Concepedia

Abstract

This paper introduces an Arabic text annotation tool called Fassieh <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">reg</sup> . Via a sophisticated interactive GUI application, Fassieh <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">reg</sup> makes it easy to build structured large standard written Arabic corpora, then allows the production of fundamental linguistic analyses; i.e., language factorizations, at high coverage and accuracy rates over such corpora. Arabic morphological analysis, part-of-speech (PoS)-tagging, full phonetic transcription (diacritization), and lexical semantics analysis are the most significant Arabic language factorizations currently supported by Fassieh <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">reg</sup> . The high inherent ambiguity of these analyses is statistically resolved in Fassieh <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">reg</sup> which also affords a multitude of auxiliary features enabling a guided, normalized, and efficient proofreading of any part of the factorized corpus. The paper first reviews the highly inflective and derivative nature of Arabic language, our Arabic language factorization models, and the associated statistical disambiguation methodology. Afterwards, we present Fassieh <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">reg</sup> which is not only a text annotation tool, but is also an evaluation, demonstrative, and tutorial means of Arabic natural language processing (NLP).