Publication | Open Access
Polylingual topic models
Citations: 335
References: 16
Year: 2009
Unknown Venue
Engineering, Cross-lingual Representation, Polylingual Topic Models, Corpus Linguistics, Text Mining, Word Embeddings, Applied Linguistics, Natural Language Processing, Polylingual Topic Model, Information Retrieval, Data Science, Computational Linguistics, Language Studies, Machine Translation, Document Clustering, Knowledge Discovery, Terminology Extraction, Cross-language Retrieval, Topic Model, Topic Models, Linguistics, Topic Trends
Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive collections of interlinked documents in dozens of languages, such as Wikipedia, are now widely available, calling for tools that can characterize content in many languages. We introduce a polylingual topic model that discovers topics aligned across multiple languages. We explore the model's characteristics using two large corpora, each with over ten different languages, and demonstrate its usefulness in supporting machine translation and tracking topic trends across languages.