Detecting Arabic Depressed Users from Twitter Data

TLDR

Depression is a widespread mental health issue that impairs daily functioning, yet early diagnosis is difficult in Arabic-speaking communities because of stigma and limited awareness of psychiatry. Social media posts offer a promising avenue for mental health surveillance, but existing studies have focused on English data. The authors describe this as, to their knowledge, the first study to investigate depressive emotions in Arabic Twitter users. They collected Arabic tweets from the Gulf region, labeled users as depressed or non-depressed based on self-declared diagnosis, and trained supervised classifiers (Random Forest, Naïve Bayes, AdaBoostM1, Liblinear) on features capturing clinical symptoms and online behaviours such as interaction with trending hashtags and emoji usage. The Liblinear classifier achieved the highest accuracy, 87.5%, in distinguishing depressed from non-depressed users.

Abstract

Depression is one of the most common health issues worldwide. People with severe depression symptoms are affected in their work, home, and social lives. Early diagnosis of mental illness is difficult, especially within Arabic culture, because of the stigma attached to mental illness and a lack of awareness of psychiatry. Meanwhile, social media posts provide a fertile new source for mental health surveillance, because people use them to express their feelings, moods, and daily activities. Detecting mental illness through social media has recently become an active research topic as social media platforms have grown in popularity, but existing studies in this area cover only English data. To our knowledge, this is the first study that has used Arabic data to explore depressive emotions in an online population. Our experiment, which is based on data collected from Twitter in the Gulf region, detects users who self-declared in their tweets that they had been diagnosed with depression. Another set of tweets from non-depressed users served as a control group, so that the resulting corpus carries ground-truth labels (depressed and non-depressed). We then built a predictive model based on supervised learning algorithms (Random Forest, Naïve Bayes, AdaBoostM1, and Liblinear) to predict whether a user's tweet indicated depression. Our predictive model leveraged an efficient feature set, extracted to cover not only the symptoms of clinical depression but also online depression-related behaviour on Twitter (e.g., interaction with trending hashtags and frequent emojis). The best accuracy, 87.5%, was obtained with the Liblinear classifier.
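
The abstract does not list the exact features, so the following is only a minimal sketch of what per-tweet feature extraction of this kind could look like. The function name `extract_features`, the tiny `SYMPTOM_TERMS` lexicon, and the emoji ranges are illustrative assumptions, not the study's actual feature set.

```python
import re

# Hypothetical symptom lexicon (sad, insomnia, lonely, tired).
# The real lexicon used in the study is not given in the abstract.
SYMPTOM_TERMS = {"حزين", "أرق", "وحيد", "تعب"}

HASHTAG_RE = re.compile(r"#\w+")
# Approximate emoji ranges; an assumption, not a full Unicode emoji definition.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
TOKEN_RE = re.compile(r"\w+")  # \w is Unicode-aware in Python 3, so it matches Arabic letters


def extract_features(tweet: str) -> dict:
    """Return simple count features for one tweet."""
    tokens = TOKEN_RE.findall(tweet)
    return {
        "n_hashtags": len(HASHTAG_RE.findall(tweet)),
        "n_emojis": len(EMOJI_RE.findall(tweet)),
        "n_symptom_terms": sum(1 for t in tokens if t in SYMPTOM_TERMS),
        "n_tokens": len(tokens),
    }


if __name__ == "__main__":
    # "I am sad and lonely" followed by an emoji and a hashtag meaning "depression".
    print(extract_features("أنا حزين و وحيد 😔 #اكتئاب"))
```

Feature dictionaries like these could then be aggregated per user and passed to any of the supervised classifiers named above; how the study actually aggregated or weighted its features is not stated here.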
