Concepedia

Publication | Closed Access

Robust Estimates of Location and Dispersion for High-Dimensional Datasets

Citations: 403

References: 32

Year: 2002

Abstract

The computing times of high-breakdown point estimates of multivariate location and scatter increase rapidly with the number of variables, which makes them impractical for high-dimensional datasets, such as those used in data mining. We propose an estimator of location and scatter based on a modified version of the Gnanadesikan–Kettenring robust covariance estimate. We compare its behavior with that of the Stahel–Donoho (SD) and Rousseeuw and Van Driessen's fast MCD (FMCD) estimates. In simulations with contaminated multivariate normal data, our estimate is almost as good as SD and clearly better than FMCD. It is much faster than both, especially for large dimension. We give examples with real data with dimensions between 5 and 93, in which the proposed estimate is as good as or better than SD and FMCD at detecting outliers and other structures, with much shorter computing times.

KEY WORDS: Data mining; Minimum covariance determinant; Robust covariances; Stahel–Donoho estimate
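For readers unfamiliar with the Gnanadesikan–Kettenring (GK) estimate named in the abstract, the sketch below shows only the classical pairwise identity it rests on: cov(X, Y) = (σ(X + Y)² − σ(X − Y)²) / 4, where σ is a robust scale. The abstract does not describe the authors' modification, so none is attempted here. The choice of the normalized median absolute deviation as σ and the function names `mad_scale` and `gk_covariance` are assumptions made for illustration.

```python
import statistics


def mad_scale(values):
    """Median absolute deviation, multiplied by 1.4826 so that it is
    consistent with the standard deviation for normal data.
    (Assumed choice of robust scale; others are possible.)"""
    values = list(values)
    center = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - center) for v in values])


def gk_covariance(x, y, scale=mad_scale):
    """Classical GK pairwise covariance:
    (scale(x + y)^2 - scale(x - y)^2) / 4.
    With scale = standard deviation this equals the ordinary covariance."""
    sums = [a + b for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    return (scale(sums) ** 2 - scale(diffs) ** 2) / 4.0


if __name__ == "__main__":
    # Illustrative data only: y roughly doubles x, with one gross outlier.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, -100.0]
    print("GK covariance with MAD:", gk_covariance(x, y))
    print("Same identity with std dev:",
          gk_covariance(x, y, scale=statistics.pstdev))
```

Because each entry needs only univariate scale estimates, the cost per pair grows linearly (up to sorting) with the number of observations, which is in keeping with the abstract's emphasis on short computing times. A matrix assembled entry by entry from this identity is not guaranteed to be positive semidefinite.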
