Arabic sentiment analysis: Lexicon-based and corpus-based

TLDR

Web 2.0 generates massive amounts of raw data, which makes extracting useful information challenging. Sentiment analysis extracts user opinions from this data and has been studied for English using corpus-based and lexicon-based methods. This study applies both approaches to Arabic. The authors built a manually annotated Arabic dataset, constructed a lexicon, and ran experiments at each stage of lexicon construction to measure the gains in system accuracy and compare them with the corpus-based approach. The experiments showed that the lexicon-based system achieved higher accuracy than the corpus-based approach.

Abstract

The emergence of Web 2.0 technology generated a massive amount of raw data by enabling Internet users to post their opinions, reviews, and comments on the web. Processing this raw data to extract useful information can be a very challenging task. An example of important information that can be automatically extracted from users' posts and comments is their opinions on different issues, events, services, products, etc. This problem, known as Sentiment Analysis (SA), has been well studied for the English language, and two main approaches have been devised: corpus-based and lexicon-based. This paper addresses both approaches to SA for the Arabic language. Since there are few publicly available Arabic datasets and Arabic lexicons for SA, this paper starts by building a manually annotated dataset and then takes the reader through the detailed steps of building the lexicon. Experiments are conducted throughout the different stages of this process to observe the improvements gained in the accuracy of the system and compare them to the corpus-based approach.
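
To make the lexicon-based idea concrete, here is a minimal sketch in Python. It is not the system described in the paper: the lexicon entries, the negator list, and the function names `lexicon_score` and `classify` are all hypothetical, chosen only for illustration. It assumes whitespace tokenization and a lexicon that maps each word to +1 or -1.

```python
# Toy lexicon: hypothetical entries, NOT the lexicon built in the paper.
LEXICON = {
    "ممتاز": 1,   # excellent
    "رائع": 1,    # wonderful
    "جميل": 1,    # beautiful
    "ممل": -1,    # boring
    "ضعيف": -1,   # weak
}

# Hypothetical negation words; a negator flips the next sentiment word found.
NEGATORS = {"ليس", "غير", "لا"}


def lexicon_score(text):
    """Sum the polarities of lexicon words in a whitespace-tokenized text."""
    score = 0
    negate = False
    for token in text.split():
        if token in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        if polarity != 0:
            score += -polarity if negate else polarity
            negate = False  # the negator is consumed by this sentiment word
    return score


def classify(text):
    """Map the summed score to a sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("الفيلم رائع"))      # "the film is wonderful"
print(classify("الفيلم ليس رائع"))  # "the film is not wonderful"
```

A corpus-based approach would instead train a classifier on the manually annotated dataset, so it needs labeled examples but no hand-built word list. The sketch above needs only the lexicon, which is why lexicon coverage directly limits its accuracy.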
