Publication | Closed Access
Stopping Rules in Principal Components Analysis: A Comparison of Heuristical and Statistical Approaches
Year: 1993 | Citations: 2.3K | References: 26
Engineering, Education, Principal Components Analysis, Change Analysis, Parallel Analysis, Data Science, Factor Analysis, Biostatistics, Principal Component Analysis, Statistics, Total Variance, Oceanic Systems, Latent Variable Methods, Knowledge Discovery, Multidimensional Analysis, Dimensionality Reduction, Bootstrapped λ, Functional Data Analysis, Statistical Approaches, Statistical Inference, Structural Modeling, Heuristic Procedures, Multivariate Analysis
The study compares methods for selecting the number of components to interpret in principal components analysis. The authors evaluated heuristic rules (eigenvalue > 1, bootstrapped eigenvalue > 1, scree plot, broken-stick, fixed amount of variance) and statistical tests (Bartlett’s sphericity, Bartlett’s homogeneity, Lawley’s test, and bootstrapped confidence limits on λ and on eigenvector coefficients) on simulated and ecological data sets. The broken-stick model and a combined bootstrapped λ/eigenvector approach gave the most consistent component counts, whereas the Kaiser-Guttman rules and the fixed-variance model overestimated the number of dimensions, the scree plot consistently estimated one dimension too many, and Bartlett’s sphericity test was inconsistent. Bartlett’s homogeneity test and Lawley’s test can test for only one and two dimensions, respectively.
Approaches to determining the number of components to interpret from principal components analysis were compared. Heuristic procedures included: retaining components with eigenvalues (λ) > 1 (i.e., Kaiser-Guttman criterion); components with bootstrapped λ > 1 (bootstrapped Kaiser-Guttman); the scree plot; the broken-stick model; and components with λ totalling to a fixed amount of the total variance. Statistical approaches included: Bartlett's test of sphericity; Bartlett's test of homogeneity of the correlation matrix; Lawley's test of the second λ; bootstrapped confidence limits on successive λ (i.e., significant differences between λ); and bootstrapped confidence limits on eigenvector coefficients (i.e., coefficients that differ significantly from zero). All methods were compared using simulated data matrices of uniform correlation structure, patterned matrices of varying correlation structure, and data sets of lake morphometry, water chemistry, and benthic invertebrate abundance. The most consistent results were obtained from the broken-stick model and a combined measure using bootstrapped λ and associated eigenvector coefficients. The traditional and bootstrapped Kaiser-Guttman approaches overestimated the number of nontrivial dimensions, as did the fixed-amount-of-variance model. The scree plot consistently estimated one dimension more than the number of simulated dimensions. Bartlett's test of sphericity showed inconsistent results. Both Bartlett's test of homogeneity of the correlation matrix and Lawley's test are limited to testing for only one and two dimensions, respectively.
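As an illustration of two of the heuristic rules named in the abstract, the sketch below applies the Kaiser-Guttman criterion and the broken-stick model to a set of eigenvalues. It is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the eigenvalues are assumed to come from a correlation matrix (so they sum to the number of variables p), the broken-stick expectation is taken in its usual form b_k = Σ_{i=k}^{p} 1/i on that scale, and counting stops at the first component that fails to exceed its expectation. The example eigenvalues are made up for illustration.

```python
import numpy as np


def eigenvalues_of_correlation(data):
    """Eigenvalues of the correlation matrix of `data` (rows = observations), largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]


def kaiser_guttman(eigvals):
    """Kaiser-Guttman criterion: count components with eigenvalue > 1."""
    return int(np.sum(np.asarray(eigvals) > 1.0))


def broken_stick_expectations(p):
    """Expected eigenvalues under the broken-stick model, b_k = sum_{i=k}^{p} 1/i.

    On this scale the expectations sum to p, matching correlation-matrix eigenvalues.
    """
    return np.cumsum(1.0 / np.arange(p, 0, -1))[::-1]


def broken_stick(eigvals):
    """Count leading components whose eigenvalue exceeds its broken-stick expectation.

    Assumption: stop at the first component that does not exceed its expectation.
    """
    eigvals = np.asarray(eigvals)
    exceeds = eigvals > broken_stick_expectations(len(eigvals))
    return len(eigvals) if exceeds.all() else int(np.argmin(exceeds))


# Hypothetical eigenvalues for p = 4 variables (they sum to 4).
example = np.array([2.2, 1.05, 0.5, 0.25])
n_kaiser = kaiser_guttman(example)
n_broken_stick = broken_stick(example)
```

With these made-up eigenvalues the second component passes the λ > 1 cut but falls below its broken-stick expectation (about 1.083), so the two rules disagree on how many components to retain. For real data, `eigenvalues_of_correlation` can supply the input to either rule.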