Supervised learning from incomplete data via an EM approach

TLDR

Real‑world learning tasks often involve high‑dimensional data sets with arbitrary patterns of missing data. This paper proposes a maximum‑likelihood density‑estimation framework that uses mixture models and EM to learn from such incomplete data. The algorithm employs EM both to estimate mixture components and to handle missing values, and it applies to a broad range of supervised and unsupervised learning problems. Classification experiments on the iris data set demonstrate the method’s performance.

Abstract

Real-world learning tasks may involve high-dimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithm: EM is used both for the estimation of mixture components and for coping with missing data. The resulting algorithm is applicable to a wide range of supervised as well as unsupervised learning problems. Results from a classification benchmark, the iris data set, are presented.
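
The two uses of EM described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes Gaussian mixture components with full covariances, missing entries encoded as NaN, at least one observed value per row, at least two feature dimensions, and an initialisation from mean-imputed data. The function name `em_gmm_missing` and its parameters are hypothetical.

```python
import numpy as np

def em_gmm_missing(X, k, n_iter=50, seed=0, reg=1e-6):
    """Fit a k-component Gaussian mixture to X, where NaN marks a missing entry.

    Illustrative sketch only; returns (pi, mu, sigma, responsibilities).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    obs = ~np.isnan(X)  # True where a value is observed

    # Initialisation from mean-imputed data (an assumption of this sketch).
    X0 = np.where(obs, X, np.nanmean(X, axis=0))
    mu = X0[rng.choice(n, size=k, replace=False)].copy()
    sigma = np.stack([np.cov(X0.T) + reg * np.eye(d) for _ in range(k)])
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities use only the observed dimensions of each row;
        # missing dimensions are replaced by their conditional expectations.
        log_r = np.zeros((n, k))
        x_hat = np.zeros((n, k, d))   # completed data, per row and component
        c = np.zeros((n, k, d, d))    # conditional covariance of missing part
        for i in range(n):
            o, m = obs[i], ~obs[i]
            for j in range(k):
                s_oo = sigma[j][np.ix_(o, o)]
                diff = X[i, o] - mu[j, o]
                sol = np.linalg.solve(s_oo, diff)
                _, logdet = np.linalg.slogdet(s_oo)
                log_r[i, j] = np.log(pi[j]) - 0.5 * (
                    diff @ sol + logdet + o.sum() * np.log(2 * np.pi)
                )
                x_hat[i, j, o] = X[i, o]
                if m.any():
                    s_mo = sigma[j][np.ix_(m, o)]
                    x_hat[i, j, m] = mu[j, m] + s_mo @ sol
                    c[i, j][np.ix_(m, m)] = (
                        sigma[j][np.ix_(m, m)]
                        - s_mo @ np.linalg.solve(s_oo, s_mo.T)
                    )
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weighted updates computed from the completed data.
        nk = r.sum(axis=0)
        pi = nk / n
        for j in range(k):
            mu[j] = (r[:, j, None] * x_hat[:, j]).sum(axis=0) / nk[j]
            dev = x_hat[:, j] - mu[j]
            scatter = np.einsum("i,ia,ib->ab", r[:, j], dev, dev)
            scatter += np.einsum("i,iab->ab", r[:, j], c[:, j])
            sigma[j] = scatter / nk[j] + reg * np.eye(d)
    return pi, mu, sigma, r
```

In this sketch the E-step does double duty: it computes the component responsibilities from the observed dimensions and fills in the missing dimensions with their conditional means and covariances, which the M-step then uses to update the mixture parameters. The small `reg` term added to each covariance is a numerical safeguard chosen for this sketch.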
