A Novel approach to improve rule based Telugu morphological analyzer

Abstract

Telugu is an Indian language spoken by more than 50 million people in India. It has a rich literature and stands to benefit from advances in computational approaches. Applications such as machine translation, speech recognition, speech synthesis and information retrieval need a powerful morphological generator to supply the morphological forms of nouns and verbs. The existing Telugu morphological analyzer (TMA) is rule based. In this work we present an unsupervised stemmer that improves its performance: the system lists the possible decompositions of a word inflected by many morphemes, and from these decompositions the root word can be extracted for words that the rule-based analyzer fails to recognize. The experiment is conducted on a Telugu text corpus from CIIL Mysore, and the improvement is measured against the rule-based morphological analyzer developed by the LTRC group, IIIT, and HCU, Hyderabad. The performance of the rule-based analyzer rises from 77% to 84.2% on a test set numbering in the hundreds of words. It could improve further with a larger corpus.
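The idea of listing possible decompositions and falling back to them when the rule-based analyzer fails can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the scoring method, so the frequency-product score below is an assumption, and the names `candidate_splits`, `train`, `guess_root`, `analyze` and the `rule_based_lookup` callable are all hypothetical. The example words are toy romanized strings, not data from the CIIL corpus.

```python
from collections import Counter


def candidate_splits(word, min_stem=2):
    """Return every (stem, suffix) split of word, including the empty suffix."""
    return [(word[:i], word[i:]) for i in range(min_stem, len(word) + 1)]


def train(corpus_words, min_stem=2):
    """Count, over unique corpus words, how often each stem and suffix occurs.

    Assumption: a stem or suffix shared by many word types is more
    likely to be a real morpheme boundary.
    """
    stem_counts, suffix_counts = Counter(), Counter()
    for word in set(corpus_words):
        for stem, suffix in candidate_splits(word, min_stem):
            stem_counts[stem] += 1
            suffix_counts[suffix] += 1
    return stem_counts, suffix_counts


def guess_root(word, stem_counts, suffix_counts, min_stem=2):
    """Pick the split whose stem count times suffix count is highest.

    The trivial split with an empty suffix is skipped. If no split has
    a positive score, the word itself is returned unchanged.
    """
    scored = [
        (stem_counts[stem] * suffix_counts[suffix], len(stem), stem)
        for stem, suffix in candidate_splits(word, min_stem)
        if suffix
    ]
    if not scored:
        return word
    score, _, stem = max(scored)  # ties go to the longer stem
    return stem if score > 0 else word


def analyze(word, rule_based_lookup, stem_counts, suffix_counts):
    """Try the rule-based analyzer first; fall back to the stemmer.

    rule_based_lookup is a hypothetical callable that returns a root
    string, or None when the word is unrecognized.
    """
    root = rule_based_lookup(word)
    if root is not None:
        return root, "rule-based"
    return guess_root(word, stem_counts, suffix_counts), "stemmer"
```

In use, one would call `train` once on the corpus word list and then pass each word through `analyze`, so the stemmer is consulted only for words the rule-based analyzer rejects. A scorer this simple over-stems short words, which is one reason a larger corpus would be expected to help.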
