Publication | Open Access
BERT-Based Sentiment Analysis for Low-Resourced Languages: A Case Study of Urdu Language
Citations: 22
References: 45
Year: 2023
Engineering, Cross-lingual Representation, Deep Learning Classifier, Multimodal Sentiment Analysis, Sentiment Analysis, Language Processing, Text Mining, Low-resource Language Processing, Word Embeddings, Natural Language Processing, Applied Linguistics, Arabic, Computational Linguistics, Corpus Analysis, Language Studies, Machine Translation, NLP Task, Language Technology, Urdu Dataset, BERT-based Sentiment Analysis, Urdu Language, Case Study, Language Corpus, Linguistics
Sentiment analysis is important to research because it provides valuable insight into public opinion. However, most sentiment analysis studies focus on English, leaving a research gap for low-resourced or regional languages such as Persian, Pashto, and Urdu. Moreover, computational linguists face the challenge of developing lexical resources for these languages. In light of this, this paper presents a deep learning-based approach for Urdu Text Sentiment Analysis (USA-BERT) that leverages Bidirectional Encoder Representations from Transformers (BERT), and it introduces the Urdu Dataset for Sentiment Analysis-23 (UDSA-23). USA-BERT first preprocesses the Urdu reviews using BERT-Tokenizer. Second, it creates BERT embeddings for each Urdu review. Third, given the BERT embeddings, it fine-tunes a deep learning classifier (BERT). Finally, USA-BERT is assessed on two datasets, the state-of-the-art UCSA-21 and UDSA-23, using the Pareto principle. The assessment results demonstrate that USA-BERT significantly surpasses the existing methods, improving accuracy and F-measure by up to 26.09% and 25.87%, respectively.
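The assessment step can be illustrated with a minimal sketch. It assumes that "the Pareto principle" refers to an 80/20 train/test split, which the abstract does not state explicitly, and it computes accuracy and a binary F-measure on placeholder labels. The function names (`pareto_split`, `accuracy`, `f_measure`) and the toy data are hypothetical and are not taken from the paper; the tokenization and fine-tuning steps are not shown.

```python
import random


def pareto_split(samples, train_fraction=0.8, seed=0):
    """Shuffle samples and split them into (train, test).

    Assumption: the Pareto principle is read here as an 80/20 split.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with placeholder reviews; labels are 1 (positive) or 0 (negative).
reviews = [("review_%d" % i, i % 2) for i in range(10)]
train, test = pareto_split(reviews)

# Placeholder gold labels and predictions standing in for a classifier's output.
gold = [1, 0, 1, 1]
predicted = [1, 0, 0, 1]
print(len(train), len(test), accuracy(gold, predicted), f_measure(gold, predicted))
```

In practice the predictions would come from the fine-tuned classifier evaluated on the held-out 20%, rather than from a hard-coded list.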