Enhancing Supervised Learning with Unlabeled Data

Abstract

In many practical learning scenarios, there is a small amount of labeled data along with a large pool of unlabeled data. Many supervised learning algorithms have been developed and extensively studied. We present a new &quot;co-training&quot; strategy for using unlabeled data to improve the performance of standard supervised learning algorithms. Unlike much of the prior work, such as the co-training procedure of Blum and Mitchell (1998), we do not assume there are two redundant views both of which are sufficient for perfect classification. The only requirement our co-training strategy places on each supervised learning algorithm is that its hypothesis partitions the example space into a set of equivalence classes (e.g. for a decision tree each leaf defines an equivalence class). We evaluate our co-training strategy via experiments using data from the UCI repository. 1. Introduction In many practical learning scenarios, there is a small amount of labeled data along with a lar...