Publication | Closed Access
Cross-training
Citations: 50
References: 21
Year: 2003
Venue: unknown
Topics: Natural Language Processing, Engineering, Information Retrieval, Data Science, Data Mining, Machine Learning, Documents Db, Automatic Classification, Semantic Learning, Knowledge Discovery, Document Classification, Semi-supervised Learning, Supervised Learning, Text Mining
Classification is a well-established operation in text mining. Given a set of labels A and a set D_A of training documents tagged with these labels, a classifier learns to assign labels to unlabeled test documents. Suppose we also have a different set of labels B, together with a set of documents D_B marked with labels from B. If A and B have some semantic overlap, can the availability of D_B help us build a better classifier for A, and vice versa? We answer this question in the affirmative by proposing cross-training: a new approach to semi-supervised learning in the presence of multiple label sets. We give distributional and discriminative algorithms for cross-training and show, through extensive experiments, that cross-training can discover and exploit probabilistic relations between two taxonomies for more accurate classification.
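To make the idea concrete, here is a minimal sketch of one way a second label set could inform a classifier for the first. It is not the paper's distributional or discriminative algorithm, which the abstract does not specify. It assumes a small multinomial naive Bayes model as a stand-in learner and feeds the B-classifier's prediction to the A-classifier as a synthetic token. All function names and the toy documents are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
import math


def train_nb(docs):
    """Train a multinomial naive Bayes model from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the highest-scoring label; out-of-vocabulary tokens are ignored."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):  # sorted for deterministic tie-breaking
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for token in tokens:
            if token in vocab:
                # Laplace (add-one) smoothing
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def augment(tokens, other_model, prefix="B="):
    """Append the other taxonomy's predicted label as a synthetic token."""
    return list(tokens) + [prefix + predict_nb(other_model, tokens)]


def cross_train(docs_a, docs_b):
    """Assumed scheme: train on D_B, then train on D_A augmented with B predictions."""
    model_b = train_nb(docs_b)
    model_a = train_nb([(augment(t, model_b), y) for t, y in docs_a])
    return model_a, model_b


def classify_a(model_a, model_b, tokens):
    """Assign an A label to a document, using its predicted B label as a feature."""
    return predict_nb(model_a, augment(tokens, model_b))


# Toy data, invented for illustration only.
docs_a = [
    (["goal", "match", "team"], "Sports"),
    (["election", "vote", "party"], "Politics"),
]
docs_b = [
    (["goal", "league", "striker"], "Football"),
    (["vote", "senate", "bill"], "Government"),
    (["striker", "penalty"], "Football"),
    (["senate", "law"], "Government"),
]

model_a, model_b = cross_train(docs_a, docs_b)
sports_guess = classify_a(model_a, model_b, ["striker", "penalty"])
politics_guess = classify_a(model_a, model_b, ["senate", "law"])
print(sports_guess, politics_guess)
```

In this toy setup the test documents share no words with D_A, so a classifier trained on D_A alone has only the class prior to go on. The predicted B label is the only signal linking them to an A label, which is the kind of cross-taxonomy relation the abstract describes.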