Publication | Open Access

Probabilistic latent semantic indexing

Citations: 3.9K

References: 6

Year: 1999

Thomas Hofmann

Unknown Venue

Abstract

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
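To make the latent class model in the abstract more concrete, here is a minimal sketch of fitting such a model to a document-term count matrix. It assumes the commonly cited aspect-model factorization P(d, w) = sum over z of P(z) P(d|z) P(w|z), and it uses plain EM rather than the generalization of EM the abstract mentions. The function name `plsa`, its parameters, and the toy data are hypothetical and not taken from the paper.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a document-term count matrix.

    Assumption: plain EM on the aspect model, not the generalized
    EM variant referred to in the abstract.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Random strictly positive vector normalized to sum to 1.
        vals = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(vals)
        return [v / total for v in vals]

    p_z = random_dist(n_topics)
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]   # p_d_z[z][d]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # p_w_z[z][w]

    for _ in range(n_iter):
        # Accumulators for expected counts under the current posterior.
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]

        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w), up to normalization.
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    r = n * joint[z] / total
                    new_z[z] += r
                    new_d[z][d] += r
                    new_w[z][w] += r

        # M-step: renormalize expected counts into probabilities.
        grand = sum(new_z)
        for z in range(n_topics):
            if new_z[z] > 0.0:
                p_d_z[z] = [v / new_z[z] for v in new_d[z]]
                p_w_z[z] = [v / new_z[z] for v in new_w[z]]
            p_z[z] = new_z[z] / grand

    return p_z, p_d_z, p_w_z


# Toy corpus: rows are documents, columns are term counts.
toy_counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
p_z, p_d_z, p_w_z = plsa(toy_counts, n_topics=2, n_iter=30)
```

Each row of `p_w_z` is a distribution over terms for one latent class, which is how a model of this kind can group synonymous terms under one class and spread a polysemous term across several.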
