Publication | Open Access
Language independent authorship attribution using character level language models
Citations: 155
References: 9
Year: 2003
Venue: Unknown
Engineering, Cross-lingual Representation, Computer-assisted Authorship Attribution, Writer Identification, Large Language Model, Corpus Linguistics, Chinese Data, Text Mining, Applied Linguistics, Natural Language Processing, Language Documentation, Information Retrieval, Data Science, Computational Linguistics, Language Engineering, Linguistic Typology, Language Studies, Machine Translation, Author Profiling, Linguistics, Language Independence
We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information-theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present experimental results on Greek, English, and Chinese data. We show that our approach achieves state-of-the-art performance in each of these cases. In particular, we obtain an 18% accuracy improvement over the best published results for a Greek data set, while using a far simpler technique than previous investigations.
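The general idea of attributing a text with character-level n-gram language models can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the n-gram order, the smoothing method, or the decision rule, so the trigram default, the add-one (Laplace) smoothing, and the "lowest cross-entropy wins" rule below are all assumptions made for illustration, and the names `CharNGramModel` and `attribute` are hypothetical.

```python
import math
from collections import Counter


class CharNGramModel:
    """Character n-gram model for one author (illustrative sketch only)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # (context, next_char) -> count
        self.context_counts = Counter()  # context -> count
        self.vocab = set()               # characters seen in training

    def _pad(self, text):
        # Pad with a start symbol so the first characters have a full context.
        return "\x02" * (self.n - 1) + text

    def train(self, text):
        padded = self._pad(text)
        self.vocab.update(text)
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            self.ngram_counts[(context, padded[i])] += 1
            self.context_counts[context] += 1

    def cross_entropy(self, text, vocab_size):
        """Average bits per character of `text` under this model.

        Assumption: add-one smoothing, chosen here only for simplicity.
        """
        padded = self._pad(text)
        total_bits = 0.0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            numerator = self.ngram_counts[(context, padded[i])] + 1
            denominator = self.context_counts[context] + vocab_size
            total_bits -= math.log2(numerator / denominator)
        return total_bits / max(len(text), 1)


def attribute(training_texts, disputed_text, n=3):
    """Return (best_author, scores); the lowest cross-entropy wins."""
    models = {}
    for author, text in training_texts.items():
        model = CharNGramModel(n)
        model.train(text)
        models[author] = model
    # Shared vocabulary across authors, plus one slot for unseen characters.
    shared_vocab = set().union(*(m.vocab for m in models.values()))
    vocab_size = len(shared_vocab) + 1
    scores = {a: m.cross_entropy(disputed_text, vocab_size)
              for a, m in models.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data, invented purely to exercise the code.
    training = {
        "a": "the cat sat on the mat. the cat ate the rat. " * 5,
        "b": "quantum flux zigzags quickly; vexing jazz. " * 5,
    }
    best, scores = attribute(training, "the rat sat on the cat.")
    print(best, scores)
```

Because the model works on raw characters, the same code applies unchanged to any language that Python can represent as a Unicode string, with no tokenization or feature selection step. That is the property the abstract refers to as language independence.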