Publication | Closed Access
A novel word clustering algorithm based on latent semantic analysis
Citations: 101
References: 8
Year: 2002
Venue: Unknown
Engineering, Suitable Vector Space, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Novel Word, Language Studies, Document Clustering, Distance Measure, Clustering (Nuclear Physics), Knowledge Discovery, Terminology Extraction, Distributional Semantics, Latent Semantic Analysis, Vector Space Model, Clustering (Data Mining), Linguistics, Semantic Similarity
A new approach is proposed for clustering the words of a given vocabulary. The method is based on latent semantic analysis, a paradigm first formulated in the context of information retrieval. This paradigm yields a parsimonious vector representation of each word in a suitable vector space, where familiar clustering techniques can be applied. The distance measure selected in this space arises naturally from the problem formulation. Preliminary experiments indicate that the clusters produced are intuitively satisfactory. Because these clusters are semantic in nature, this approach may prove useful as a complement to conventional class-based statistical language modeling techniques.
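The pipeline outlined in the abstract (word vectors from latent semantic analysis, then a standard clustering step) can be illustrated with a small sketch. Everything specific here is an assumption rather than the paper's own setup: the toy word-document count matrix is invented, the rank is fixed at 2, the distance is taken to be the angle between scaled word vectors, and a k-means-style loop stands in for the "familiar clustering techniques". The function names are hypothetical, and a pure-Python power iteration replaces a proper SVD routine only to keep the example dependency-free.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def top_eigenpairs(G, rank, iters=500):
    """Leading eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(G)
    G = [row[:] for row in G]  # work on a copy; deflation mutates it
    pairs = []
    for k in range(rank):
        v = [1.0 + 0.1 * ((i + k) % 3) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = [dot(row, v) for row in G]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in G])
        pairs.append((lam, v))
        for i in range(n):  # remove the component just found
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_word_vectors(W, rank):
    """Rows of U*S from the rank-truncated SVD W ~ U S V^T, via eigenpairs of W W^T."""
    m = len(W)
    G = [[dot(W[i], W[j]) for j in range(m)] for i in range(m)]
    pairs = top_eigenpairs(G, rank)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(m)]

def angular_distance(x, y):
    """Angle between two vectors (assumed distance measure for this sketch)."""
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    if nx == 0.0 or ny == 0.0:
        return math.pi / 2
    return math.acos(max(-1.0, min(1.0, dot(x, y) / (nx * ny))))

def cluster_words(vectors, k, iters=10):
    """K-means-style clustering with angular distance and farthest-point seeding."""
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(
            range(len(vectors)),
            key=lambda i: min(angular_distance(vectors[i], vectors[s]) for s in seeds),
        ))
    centroids = [vectors[s][:] for s in seeds]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: angular_distance(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                # Only the direction matters for angular distance, so a sum suffices.
                centroids[c] = [sum(col) for col in zip(*members)]
    return labels

# Invented toy data: rows are words, columns are documents, entries are counts.
words = ["stock", "market", "bank", "goal", "match", "team"]
W = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

vectors = lsa_word_vectors(W, rank=2)
labels = cluster_words(vectors, k=2)
for word, label in zip(words, labels):
    print(word, label)
```

In practice a library SVD on a much larger, weighted word-document matrix would replace the power iteration, and the rank and number of clusters would be tuned to the corpus; the sketch only shows how the vector representation and the clustering step fit together.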