Characterisation of mental health conditions in social media using Informed Deep Learning

TLDR

Mental illness prevalence is rising, increasing the burden on health and social care and reducing productivity and quality-adjusted life-years. Natural language processing of electronic health records allows mental health to be studied at scale, but clinician notes miss patients' first-hand experiences, whereas social media offers real-time, user-generated content on well-being and mental health. The study analysed Reddit posts and developed classifiers that recognise mental illness-related posts and assign them to one of 11 disorder themes, using a neural network and deep learning approach on a balanced dataset. The classifiers achieved 91.08% accuracy in detecting mental illness-related posts and 71.37% weighted average accuracy in theme classification, which the authors present as a first step toward characterising large amounts of user-generated content to support content curation and targeted interventions.

Abstract

The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients’ own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of ‘in the moment’ daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.
