
Publication | Open Access

Foreign Words and the Automatic Processing of Arabic Social Media Text Written in Roman Script

Year: 2014 | Citations: 46 | References: 21

Abstract

Arabic on social media has all the properties of any language on social media that make it tough for natural language processing, plus some specific problems. These include diglossia, the use of an alternative alphabet (Roman), and code switching with foreign languages. In this paper, we present a system which can process Arabic written in Roman alphabet (“Arabizi”). It identifies whether each word is a foreign word or one of another four categories (Arabic, name, punctuation, sound), and transliterates Arabic words and names into the Arabic alphabet. We obtain an overall system performance of 83.8% on an unseen test set.
