Publication | Open Access
Using LDA to detect semantically incoherent documents
Citations: 50 | References: 23 | Year: 2008
Unknown Venue
False Document, Engineering, Semantic Web, Semantics, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Text Segmentation, Computational Linguistics, Document Classification, Language Studies, Content Analysis, Document Clustering, Knowledge Discovery, Incoherent Documents, Computer Science, Information Extraction, Topic Model, Keyword Extraction, Topic Detection, Semantic Coherence, Structured Document, Linguistics
Detecting the semantic coherence of a document is a challenging task with applications in areas such as text segmentation and categorization. This paper attempts to distinguish a 'semantically coherent' true document from a 'randomly generated' false document through topic detection in the framework of latent Dirichlet allocation (LDA). Based on the premise that a true document contains only a few topics while a false document is made up of many, it is asserted that the entropy of the topic distribution will be lower for a true document than for a false one. This hypothesis is tested on several false document sets generated by various methods and is found to be useful for fake content detection applications.
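The entropy criterion in the abstract can be illustrated with a minimal sketch. It assumes a document's topic proportions have already been inferred by some LDA implementation. The function names `topic_entropy` and `looks_incoherent`, the example proportions, and the threshold value are all hypothetical and chosen for illustration; none of them are taken from the paper.

```python
import math


def topic_entropy(theta):
    """Shannon entropy (in nats) of a document's topic distribution.

    `theta` is a sequence of non-negative topic weights; it is
    normalised here so that it sums to 1 before the entropy is taken.
    """
    total = sum(theta)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    probs = [w / total for w in theta]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


def looks_incoherent(theta, threshold):
    """Flag a document whose topic entropy exceeds `threshold`.

    The threshold is an assumption of this sketch; in practice it
    would have to be tuned on held-out true and false documents.
    """
    return topic_entropy(theta) > threshold


# Hypothetical topic proportions over 8 topics (illustrative values only).
coherent = [0.85, 0.10, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]  # mass on few topics
incoherent = [1 / 8] * 8                                # mass spread evenly

print(topic_entropy(coherent) < topic_entropy(incoherent))
print(looks_incoherent(incoherent, threshold=1.0))
```

A distribution concentrated on a few topics gives low entropy, while a uniform spread over K topics gives the maximum value, log K, which is why a higher entropy is read as a sign of a randomly assembled document.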