Arabic tweets sentiment analysis – a hybrid scheme

TLDR

Twitter, where people express opinions in no more than 140 characters, is one of the most prevalent social networking sites, and its popularity in Saudi Arabia makes tweets a rich source of public sentiment in a fractious region, though dialectal Arabic complicates analysis. The study proposes a hybrid sentiment analysis scheme that combines semantic orientation with machine-learning techniques to address the challenges of Arabic tweets. The approach first uses a lexical-based classifier to label the training data automatically, then trains an SVM classifier on that output. Experiments show the hybrid method improves the lexical classifier’s F-measure by 5.76% and its accuracy by 16.41%, reaching an overall F-measure of 84% and accuracy of 84.01%.

Abstract

Because people freely express their opinions and ideas in no more than 140 characters, Twitter has become one of the most prevalent social networking websites in the world. Twitter is popular in Saudi Arabia, and we believe that tweets are a good source for capturing the public’s sentiment, especially since the country is in a fractious region. Using Saudi Arabia as a basis, we go over the challenges and difficulties that Arabic tweets present and then propose our solution. A typical problem is the practice of tweeting in dialectal Arabic. Based on our observations, we recommend a hybrid approach that combines semantic orientation and machine learning techniques. In this approach, the lexical-based classifier labels the training data, a time-consuming task that is often done manually. The output of the lexical classifier is then used as training data for the SVM machine learning classifier. The experiments show that our hybrid approach improved the F-measure of the lexical classifier by 5.76%, while the accuracy jumped by 16.41%, achieving an overall F-measure and accuracy of 84% and 84.01%, respectively.
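
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the sample tweets (English stand-ins for Arabic text), the whitespace tokenizer, and the helper names `lexical_label`, `train_linear_svm`, and `predict` are all hypothetical, and the SVM is a small linear model trained by subgradient descent on the hinge loss rather than whatever SVM package the study used.

```python
from collections import Counter

# Hypothetical toy lexicon: English stand-ins for Arabic polarity entries.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def tokenize(tweet):
    """Naive whitespace tokenizer (an assumption; real Arabic text needs more)."""
    return tweet.lower().split()


def lexical_label(tweet):
    """Semantic-orientation step: sum word polarities; None when the sum is 0."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(tweet))
    if score == 0:
        return None
    return 1 if score > 0 else -1


def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01):
    """Linear SVM trained by subgradient descent on the regularised hinge loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for tweet, label in examples:
            feats = Counter(tokenize(tweet))  # bag-of-words counts
            score = sum(weights.get(f, 0.0) * v for f, v in feats.items()) + bias
            for f in weights:  # L2 regularisation shrinks every weight
                weights[f] *= 1.0 - lr * lam
            if label * score < 1:  # margin violated: take a hinge-loss step
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + lr * label * v
                bias += lr * label
    return weights, bias


def predict(model, tweet):
    """Return +1 (positive) or -1 (negative) for a tweet."""
    weights, bias = model
    feats = Counter(tokenize(tweet))
    score = sum(weights.get(f, 0.0) * v for f, v in feats.items()) + bias
    return 1 if score >= 0 else -1


# Made-up unlabeled tweets for illustration only.
unlabeled = [
    "good service today",
    "great food and great service",
    "love the new stadium",
    "bad traffic again",
    "awful delays at the airport",
    "hate the delays",
    "the match is tonight",  # no lexicon hit, so it gets no label
]

# Step 1: the lexical classifier labels the raw tweets; unscored ones are dropped.
training = [(t, lexical_label(t)) for t in unlabeled if lexical_label(t) is not None]

# Step 2: the automatically labeled tweets become the SVM's training data.
model = train_linear_svm(training)
```

In this sketch, `predict(model, "traffic delays everywhere")` can be called on a tweet that contains no lexicon word at all: the lexical step alone would leave it unlabeled, while the trained model can still score it from words that co-occurred with lexicon entries in the training tweets. This is one plausible reading of why the combination helps, not a claim taken from the study.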
