Publication | Open Access
Probabilistic latent semantic indexing
Citations: 3.9K
References: 6
Year: 1999
Venue: Unknown
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
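To make the fitting step more concrete, the following is a minimal sketch of plain EM for a latent class model of document-word counts. It is an illustration under stated assumptions, not the procedure from the paper: it uses standard EM rather than the generalized variant the abstract mentions, it assumes the symmetric parameterization P(d, w) = sum over z of P(z) P(d|z) P(w|z), and the function name `plsa_em` and its parameters are hypothetical.

```python
import random


def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Fit P(d, w) = sum_z P(z) P(d|z) P(w|z) with plain EM (illustrative sketch).

    counts[d][w] is the number of occurrences of word w in document d.
    Returns (p_z, p_d_z, p_w_z), where p_d_z[z][d] = P(d|z) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(values):
        total = sum(values)
        # Leave an all-zero vector untouched rather than dividing by zero.
        return [v / total for v in values] if total > 0 else values

    def random_dist(size):
        # Strictly positive random start, normalised to a valid distribution.
        return normalize([rng.random() + 0.1 for _ in range(size)])

    p_z = random_dist(n_topics)
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iters):
        acc_z = [0.0] * n_topics
        acc_d_z = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]

        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: unnormalised posterior P(z | d, w) for each latent class.
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    # Expected count of (d, w) attributed to class z.
                    r = n * joint[z] / total
                    acc_z[z] += r
                    acc_d_z[z][d] += r
                    acc_w_z[z][w] += r

        # M-step: re-estimate each distribution from the expected counts.
        p_z = normalize(acc_z)
        p_d_z = [normalize(row) for row in acc_d_z]
        p_w_z = [normalize(row) for row in acc_w_z]

    return p_z, p_d_z, p_w_z
```

A call such as `plsa_em([[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 5, 2], [0, 0, 2, 5]], n_topics=2)` returns the class priors together with the per-class document and word distributions. The toy count matrix is made up for illustration and does not come from the test collections mentioned in the abstract.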