Publication | Open Access
On Consistency and Sparsity for Principal Components Analysis in High Dimensions
867 Citations | 36 References | Published 2009
Topics: Principal Component Vector, Engineering, Computational Analysis, Principal Components Analysis, High Dimensions, Standard PCA, Data Science, Pattern Recognition, Multilinear Subspace Learning, Public Health, Principal Component Analysis, Statistics, Latent Variable Methods, Inverse Problems, Dimensionality Reduction, Nonlinear Dimensionality Reduction, Functional Data Analysis, Sparse Representation, High-Dimensional Methods, Statistical Inference
Principal components analysis (PCA) is a classic method for reducing the dimensionality of data given as n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with, or even much larger than, n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) that this initial reduction is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if p(n)/n → 0. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if p(n) ≫ n.
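The subset-selection idea in the abstract can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's exact procedure: here the number of retained coordinates `k` is a free tuning choice, whereas the paper specifies its own selection rule. The sketch keeps the `k` coordinates with the largest sample variances, runs PCA on that subset, and embeds the leading eigenvector back into p dimensions.

```python
import numpy as np

def sparse_pca_leading(X, k):
    """Illustrative variance-thresholded PCA (an assumed instantiation of
    the paper's subset-selection idea, with k as a hypothetical tuning
    parameter rather than the paper's threshold rule).

    X : (n, p) data matrix; returns a p-vector estimate of the leading
    principal component direction, supported on k coordinates.
    """
    n, p = X.shape
    variances = X.var(axis=0, ddof=1)        # per-coordinate sample variances
    keep = np.argsort(variances)[-k:]        # indices of the k largest variances
    S = np.cov(X[:, keep], rowvar=False)     # sample covariance on the subset
    _, eigvecs = np.linalg.eigh(S)           # eigenvectors, ascending eigenvalues
    v = np.zeros(p)
    v[keep] = eigvecs[:, -1]                 # leading eigenvector, re-embedded
    return v

# Synthetic example with a sparse leading component and p >> n:
rng = np.random.default_rng(0)
n, p = 100, 500
u = np.zeros(p)
u[:10] = 1 / np.sqrt(10)                     # true sparse direction
X = 3 * rng.standard_normal((n, 1)) @ u[None, :] + rng.standard_normal((n, p))
v_hat = sparse_pca_leading(X, k=20)
print(abs(u @ v_hat))                        # high when the signal is recovered
```

With p = 500 coordinates and only n = 100 observations, full PCA on all of X would be in the inconsistent regime the abstract describes; restricting to the high-variance subset keeps the effective dimension small relative to n.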