Publication | Open Access
Language and task independent text categorization with simple language models
Citations: 112
References: 18
Year: 2003
Unknown Venue
Engineering, Feature Selection, Language Learning, Corpus Linguistics, Text Mining, Natural Language Processing, Applied Linguistics, Information Retrieval, Data Science, Simple Language Models, Computational Linguistics, Language Engineering, Document Classification, Task Independence, Language Studies, Content Analysis, Machine Translation, Automatic Classification, NLP Task, Language Technology, Knowledge Discovery, Author Profiling, Text Genre Classification, Intelligent Classification, Linguistics
We present a simple method for language-independent and task-independent text categorization learning, based on character-level n-gram language models. Our approach uses simple information-theoretic principles and achieves effective performance across a variety of languages and tasks without requiring feature selection or extensive pre-processing. To demonstrate the language and task independence of the proposed technique, we present experimental results on several languages (Greek, English, Chinese, and Japanese) in several text categorization problems (language identification, authorship attribution, text genre classification, and topic detection). Our experimental results show that this simple approach achieves state-of-the-art performance in each case.
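As a rough illustration of the idea in the abstract, the sketch below trains one character-level n-gram model per category and assigns a text to the category whose model gives it the lowest cross-entropy. The abstract does not specify the n-gram order, the smoothing scheme, or any implementation detail, so the trigram default, the add-one smoothing, and every name here (`CharNgramModel`, `train_models`, `classify`) are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


class CharNgramModel:
    """Character n-gram model with add-one smoothing (an assumed choice)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of n-character sequences
        self.contexts = Counter()  # counts of their (n-1)-character prefixes
        self.vocab = set()         # distinct characters seen in training

    def _pad(self, text):
        # Left-pad with spaces so the first characters also have a context.
        return " " * (self.n - 1) + text

    def train(self, text):
        padded = self._pad(text)
        self.vocab.update(text)
        for i in range(len(text)):
            gram = padded[i:i + self.n]
            self.ngrams[gram] += 1
            self.contexts[gram[:-1]] += 1

    def cross_entropy(self, text):
        """Average negative log2 probability per character of `text`."""
        padded = self._pad(text)
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        total = 0.0
        for i in range(len(text)):
            gram = padded[i:i + self.n]
            p = (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + v)
            total -= math.log2(p)
        return total / max(len(text), 1)


def train_models(labeled_texts, n=3):
    """Build one model per label from (label, text) pairs."""
    models = {}
    for label, text in labeled_texts:
        models.setdefault(label, CharNgramModel(n)).train(text)
    return models


def classify(models, text):
    """Return the label whose model yields the lowest cross-entropy."""
    return min(models, key=lambda label: models[label].cross_entropy(text))


# Toy language-identification example (hypothetical training data).
toy_models = train_models([
    ("en", "the quick brown fox jumps over the lazy dog"),
    ("el", "η γρήγορη καφέ αλεπού πηδάει πάνω από το τεμπέλικο σκυλί"),
])
print(classify(toy_models, "the dog jumps"))
print(classify(toy_models, "το σκυλί"))
```

Because the model works on raw characters, the same code applies unchanged to languages without whitespace word boundaries, and no tokenization or feature selection step is needed. Swapping the labels from languages to authors, genres, or topics changes the task without changing the code.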