Publication | Closed Access
Automatically learning document taxonomies for hierarchical classification
Citations: 53
References: 3
Year: 2005
Unknown Venue
Keywords: Engineering, Suitable Hierarchical Structure, Semantic Web, Corpus Linguistics, Text Mining, Binary SVMs, Natural Language Processing, Classification Method, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, Language Studies, Hierarchical Classification, Document Taxonomies, Automatic Classification, Knowledge Discovery, Intelligent Classification, 20-Newsgroup Dataset, Classification, Linguistics
While several hierarchical classification methods have been applied to web content, such techniques invariably rely on a pre-defined taxonomy of documents. We propose a new technique that extracts a suitable hierarchical structure automatically from a corpus of labeled documents. We show that our technique groups similar classes closer together in the tree and discovers relationships among documents that are not encoded in the class labels. The learned taxonomy is then used along with binary SVMs for multi-class classification. We demonstrate the efficacy of our approach by testing it on the 20-Newsgroup dataset.
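The abstract does not say how the hierarchy is extracted, so the following is only an illustrative sketch of one way a class tree could be learned from labeled documents. It assumes bag-of-words class centroids merged greedily by cosine similarity; the function names (`centroid`, `cosine`, `merge`, `learn_taxonomy`) and the toy documents are hypothetical and are not taken from the paper. Training a binary SVM at each internal node of the resulting tree would be a separate step and is not shown.

```python
from collections import Counter, defaultdict
from math import sqrt


def centroid(docs):
    """Mean bag-of-words vector (term -> average count) over tokenized docs."""
    total = Counter()
    for tokens in docs:
        total.update(tokens)
    return {term: count / len(docs) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge(a, size_a, b, size_b):
    """Size-weighted average of two centroids."""
    n = size_a + size_b
    return {t: (a.get(t, 0.0) * size_a + b.get(t, 0.0) * size_b) / n
            for t in set(a) | set(b)}


def learn_taxonomy(labeled_docs):
    """Build a binary tree over class labels (assumed method, not the paper's).

    labeled_docs is a list of (label, tokens) pairs. The result is a nested
    tuple whose leaves are labels; similar classes end up as siblings.
    """
    by_label = defaultdict(list)
    for label, tokens in labeled_docs:
        by_label[label].append(tokens)

    # Each cluster is (subtree, centroid, number_of_documents).
    clusters = [(label, centroid(docs), len(docs))
                for label, docs in sorted(by_label.items())]

    while len(clusters) > 1:
        # Pick the pair of clusters whose centroids are most similar.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = max(pairs,
                   key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]))
        (tree_a, cen_a, n_a), (tree_b, cen_b, n_b) = clusters[i], clusters[j]
        merged = ((tree_a, tree_b), merge(cen_a, n_a, cen_b, n_b), n_a + n_b)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return clusters[0][0]


# Toy corpus with newsgroup-style labels (invented for illustration).
docs = [
    ("rec.autos", "car engine wheel".split()),
    ("rec.autos", "car brake engine".split()),
    ("rec.motorcycles", "bike engine wheel helmet".split()),
    ("sci.space", "orbit rocket launch".split()),
    ("sci.space", "rocket moon orbit".split()),
]
tree = learn_taxonomy(docs)
print(tree)
```

On this toy corpus the two vehicle classes share vocabulary and are merged first, so they become siblings under one internal node while `sci.space` sits on its own branch. A classifier built on such a tree would make one binary decision per internal node on the path from the root to a leaf.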