Publication | Closed Access

A Fast Algorithm for the Minimum Covariance Determinant Estimator

Citations: 1.9K

References: 18

Year: 1999

TLDR

The minimum covariance determinant (MCD) method is a highly robust estimator of multivariate location and scatter, but its application has been limited by the computational cost of existing algorithms, which could handle only a few hundred observations in a few dimensions. This study develops FAST-MCD, an algorithm designed to compute the MCD efficiently for large datasets, and introduces the distance-distance plot (D-D plot), which displays MCD-based robust distances against Mahalanobis distances. FAST-MCD relies on an inequality involving order statistics and determinants, together with techniques called "selective iteration" and "nested extensions." For small datasets it typically finds the exact MCD; for larger datasets it gives more accurate results than existing algorithms and is faster by orders of magnitude. Two applications are discussed: Philips production data (n = 677, p = 9) and an astronomical dataset (n = 137,256, p = 27). The algorithm also detects exact fits (hyperplanes containing h or more observations) and makes the MCD method a practical routine tool for multivariate data analysis.

Abstract

The minimum covariance determinant (MCD) method of Rousseeuw is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now, applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions. We discuss two important applications of larger size: one about a production process at Philips with n = 677 objects and p = 9 variables, and one about a dataset from astronomy with n = 137,256 objects and p = 27 variables. To deal with such problems we have developed a new algorithm for the MCD, called FAST-MCD. The basic ideas are an inequality involving order statistics and determinants, and techniques which we call “selective iteration” and “nested extensions.” For small datasets, FAST-MCD typically finds the exact MCD, whereas for larger datasets it gives more accurate results than existing algorithms and is faster by orders of magnitude. Moreover, FAST-MCD is able to detect an exact fit, that is, a hyperplane containing h or more observations. The new algorithm makes the MCD method available as a routine tool for analyzing multivariate data. We also propose the distance-distance plot (D-D plot), which displays MCD-based robust distances versus Mahalanobis distances, and illustrate it with some examples.
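
The abstract does not spell out the inequality it mentions. It is commonly described as a concentration step (C-step): fit a mean and covariance to a subset of h observations, compute every observation's distance to that fit, and keep the h observations with the smallest distances. The covariance determinant of the new subset is then no larger than that of the old one. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the function names (`fit`, `sq_distances`, `c_step`, `concentrate`), the random (p+1)-point starts, and the default parameters are all hypothetical choices for illustration. It leaves out selective iteration, nested extensions, and exact-fit detection, and it assumes every covariance matrix it meets is nonsingular.

```python
import numpy as np


def fit(X, idx):
    """Mean, covariance, and covariance determinant of the rows X[idx]."""
    center = X[idx].mean(axis=0)
    cov = np.atleast_2d(np.cov(X[idx], rowvar=False))
    return center, cov, np.linalg.det(cov)


def sq_distances(X, center, cov):
    """Squared Mahalanobis-type distance of every row of X to (center, cov)."""
    diff = X - center
    # Assumes cov is nonsingular; an exact fit would need separate handling.
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def c_step(X, idx, h):
    """Refit on X[idx], then keep the h rows closest to that fit."""
    center, cov, _ = fit(X, idx)
    return np.sort(np.argsort(sq_distances(X, center, cov))[:h])


def concentrate(X, h, n_starts=10, max_iter=50, seed=0):
    """Iterate C-steps from several random starts; return the best h-subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        # Illustrative start: a random (p+1)-point subset, grown to h points.
        idx = c_step(X, rng.choice(n, size=p + 1, replace=False), h)
        for _ in range(max_iter):
            new_idx = c_step(X, idx, h)
            if np.array_equal(new_idx, idx):  # subset stopped changing
                break
            idx = new_idx
        det = fit(X, idx)[2]
        if det < best_det:
            best_det, best_idx = det, idx
    return best_idx, best_det
```

Calling `concentrate(X, h)` on an (n, p) array returns the indices of the selected h observations and the determinant of their covariance matrix. Robust distances for a D-D plot could then be obtained by passing the mean and covariance of that subset to `sq_distances` and taking square roots, to be plotted against the same quantity computed from the full-sample mean and covariance.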
