Publication | Open Access
Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR
Citations: 41
References: 19
Year: 2009
Venue: unknown
Keywords: Engineering, Arabic Morphological Analysis, Spoken Language Processing, Morphology (Linguistics), Arabic Language, Corpus Linguistics, Speech Recognition, Natural Language Processing, Arabic, High Language Model, Computational Linguistics, Phonetics, Robust Speech Recognition, Grammar, Voice Recognition, Language Studies, Machine Translation, Morphological Decomposition, Morphology, Arabic LVCSR, Computer Science, Morphological Analysis, Language Recognition, Speech Processing, Speech Input, Linguistics
One of the challenges of large-vocabulary Arabic speech recognition is the rich morphology of the Arabic language, which leads to both high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Another challenge is the absence of short vowels (diacritics) from written Arabic transcripts, which causes a large difference between the spoken and written language and thus a weaker connection between the acoustic and language models. In this work, we address these two challenges by introducing both morphological decomposition and diacritization into Arabic language modeling. We obtain a relative reduction of about 3.7% in word error rate (WER) compared with a comparable non-diacritized full-word system on our test set.
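The result above is stated as a relative rather than an absolute WER reduction. A minimal Python sketch of that arithmetic follows. The function name `relative_wer_reduction` and the WER values are assumptions made for illustration; the abstract reports only the relative figure, not the underlying error rates.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative numbers only, NOT taken from the paper:
# a hypothetical baseline at 20.0% WER and a hypothetical
# improved system at 19.26% WER.
reduction = relative_wer_reduction(20.0, 19.26)
print(f"Relative WER reduction: {reduction:.1%}")
```

With these made-up inputs the absolute difference is 0.74 WER points, which corresponds to a relative reduction of about 3.7% of the baseline. The same relative figure at a different baseline would imply a different absolute gain.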