Cross-training

Abstract

Classification is a well-established operation in text mining. Given a set of labels A and a set DA of training documents tagged with these labels, a classifier learns to assign labels to unlabeled test documents. Suppose we also have a different set of labels B, together with a set of documents DB marked with labels from B. If A and B have some semantic overlap, can the availability of DB help us build a better classifier for A, and vice versa? We answer this question in the affirmative by proposing cross-training: a new approach to semi-supervised learning in the presence of multiple label sets. We give distributional and discriminative algorithms for cross-training and show, through extensive experiments, that cross-training can discover and exploit probabilistic relations between two taxonomies for more accurate classification.
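
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one simple way of letting DB help a classifier for A: train a classifier on the B-labeled documents, then add its predicted B label to each A document as an extra pseudo-token. The `NaiveBayes` class, the `add_b_feature` helper, the `B=` token convention, and all toy documents and labels are hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing over token lists."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:  # ties keep the first label seen in training
                best_label, best_score = label, score
        return best_label


def add_b_feature(tokens, b_model):
    """Assumed cross-training step: append the predicted B label as a pseudo-token."""
    return tokens + ["B=" + b_model.predict(tokens)]


# Toy stand-ins for DA and DB (invented data, not from the paper).
docs_a = [(["goal", "match", "league"], "Sports"),
          (["election", "vote", "senate"], "Politics")]
docs_b = [(["goal", "striker", "keeper"], "Football"),
          (["vote", "ballot", "poll"], "Elections"),
          (["striker", "penalty", "match"], "Football"),
          (["poll", "senate", "campaign"], "Elections")]

# 1. Train a classifier for label set B on DB.
b_model = NaiveBayes().fit([d for d, _ in docs_b], [y for _, y in docs_b])

# 2. Train the classifier for A on DA documents augmented with their B label.
a_train = [add_b_feature(d, b_model) for d, _ in docs_a]
a_model = NaiveBayes().fit(a_train, [y for _, y in docs_a])

# 3. Classify a test document that shares no words with DA.
test_doc = ["ballot", "campaign"]
prediction = a_model.predict(add_b_feature(test_doc, b_model))
print(prediction)
```

In this toy setup the test document shares vocabulary only with DB, so the predicted B label is the only evidence the A classifier has about it. Swapping the roles of A and B gives the "vice versa" direction mentioned in the abstract.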
