Publication | Open Access
Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models
Citations: 10
References: 8
Year: 2013
Intelligent Information Processing, Engineering, Separable Topic Models, Random Projections, Corpus Linguistics, Text Mining, Simplicial Condition, Word Embeddings, Natural Language Processing, Information Retrieval, Data Science, Text Segmentation, Novel Word Detection, Computational Linguistics, Language Studies, Content Analysis, Statistics, Topic Estimation, Document Clustering, Knowledge Discovery, Computer Science, Algorithmic Information Theory, Distributional Semantics, Topic Model, Sufficient Conditions, Keyword Extraction, Linguistics
The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical while the practical ones need stronger conditions. In this paper, we demonstrate, for the first time, that the simplicial condition is a fundamental, algorithm-independent, information-theoretic necessary condition for consistent separable topic estimation. Furthermore, under solely the simplicial condition, we present a practical quadratic-complexity algorithm based on random projections which consistently detects all novel words of all topics using only up to second-order empirical word moments. This algorithm is amenable to distributed implementation making it attractive for 'big-data' scenarios involving a network of large distributed databases.
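The random-projection idea mentioned in the abstract can be illustrated with a small, idealized sketch. It is not the authors' implementation. It assumes a geometry the abstract does not spell out: that, after row normalization, the second-order word moments of novel words sit at the extreme points of a convex hull, while every other word is a strict convex combination of them. The matrix `X`, the function `detect_extreme_rows`, and the indices in `novel_words` are hypothetical, and the data are synthetic and noise-free.

```python
import numpy as np

def detect_extreme_rows(X, num_projections, rng):
    """Return the sorted indices of rows of X that maximize at least one
    random one-dimensional projection (candidate extreme points)."""
    # One isotropic Gaussian direction per column.
    directions = rng.standard_normal((X.shape[1], num_projections))
    projections = X @ directions              # shape: (num_words, num_projections)
    winners = np.argmax(projections, axis=0)  # maximizing row for each direction
    return sorted(set(int(i) for i in winners))

rng = np.random.default_rng(0)
num_topics, num_words = 3, 10
novel_words = [2, 5, 7]  # hypothetical: one novel word per topic

# Synthetic, noise-free stand-in for row-normalized second-order moments:
# non-novel rows are strict convex combinations of the topic vertices,
# novel rows are the vertices themselves (an assumption of this sketch).
X = rng.dirichlet(np.ones(num_topics), size=num_words)
X[novel_words] = np.eye(num_topics)

detected = detect_extreme_rows(X, num_projections=200, rng=rng)
print(detected)
```

Each projection costs one matrix-vector product, and projections are independent of one another, which is consistent with the abstract's remark that the approach suits distributed implementation. With empirical (noisy) moments, a real detector would need a more robust criterion than a plain argmax; this sketch does not attempt that.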