Publication | Closed Access
Robust Estimates of Location and Dispersion for High-Dimensional Datasets
Citations: 403
References: 32
Year: 2002
Topics: Engineering; Spatial Uncertainty; Mining Methods; Localization; Data Science; Data Mining; Robust Statistic; Multivariate Location; Statistics; Knowledge Discovery; Multidimensional Analysis; Dimensionality Reduction; Fast Mcd; Robust Estimates; High-dimensional Method; Robust Modeling; Business; Multivariate Analysis; Spatial Statistics
Abstract
The computing times of high-breakdown point estimates of multivariate location and scatter increase rapidly with the number of variables, which makes them impractical for high-dimensional datasets, such as those used in data mining. We propose an estimator of location and scatter based on a modified version of the Gnanadesikan–Kettenring robust covariance estimate. We compare its behavior with that of the Stahel–Donoho (SD) and Rousseeuw and Van Driessen's fast MCD (FMCD) estimates. In simulations with contaminated multivariate normal data, our estimate is almost as good as SD and clearly better than FMCD. It is much faster than both, especially for large dimension. We give examples with real data with dimensions between 5 and 93, in which the proposed estimate is as good as or better than SD and FMCD at detecting outliers and other structures, with much shorter computing times.
KEY WORDS: Data mining; Minimum covariance determinant; Robust covariances; Stahel–Donoho estimate
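For orientation, the classical Gnanadesikan–Kettenring pairwise estimate mentioned in the abstract rests on the identity cov(X, Y) = (σ(X+Y)² − σ(X−Y)²) / 4, with the standard deviation σ replaced by a robust scale. The following is a minimal sketch of that unmodified pairwise estimate only, not of the modified estimator the paper proposes. The choice of the normalized MAD as the robust scale and the function names `mad_scale`, `gk_cov`, and `gk_cov_matrix` are assumptions made for illustration.

```python
import numpy as np


def mad_scale(x):
    """Median absolute deviation, rescaled to match the SD under normality.

    Assumption: the MAD is used here only as a convenient robust scale.
    """
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))


def gk_cov(x, y, scale=mad_scale):
    """Pairwise Gnanadesikan-Kettenring covariance of two variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.25 * (scale(x + y) ** 2 - scale(x - y) ** 2)


def gk_cov_matrix(data, scale=mad_scale):
    """Fill a p x p matrix with pairwise GK covariances (rows = observations)."""
    data = np.asarray(data, dtype=float)
    p = data.shape[1]
    cov = np.empty((p, p))
    for j in range(p):
        # Diagonal: squared robust scale of each variable.
        cov[j, j] = scale(data[:, j]) ** 2
        for k in range(j + 1, p):
            cov[j, k] = cov[k, j] = gk_cov(data[:, j], data[:, k], scale)
    return cov
```

Each entry needs only univariate scale computations, which is why pairwise estimates of this kind scale well with dimension. A matrix assembled this way is symmetric but is not guaranteed to be positive semidefinite, so it should be read as a starting point rather than a finished scatter estimate.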