Publication | Closed Access
Partial least squares discriminant analysis: taking the magic away
Citations: 761 | References: 14 | Year: 2014
Engineering, Biometrics, Diagnosis, Partial Least Squares, Classification Method, Data Science, Data Mining, Pattern Recognition, Quantitative Analysis, Biomedical Data Science, Statistical Computing, Multilinear Subspace Learning, Biostatistics, PLS Scores Plots, Public Health, Principal Component Analysis, Statistics, Latent Variable Methods, Dimensionality Reduction, Metabolomics, Data Classification, Classification, Common Pitfalls, Biomedical Data Analysis
Partial least squares discriminant analysis has been available for nearly two decades yet remains poorly understood by most users. For two equal‑sized classes, a single PLS component yields the same classification as Euclidean distance to centroids, while all nonzero components recover linear discriminant analysis; the paper also discusses extensions to unequal class sizes, multiple classes, common pitfalls, overfitting, and score plots. The authors conclude that PLS‑DA offers no clear advantage over traditional methods and should be viewed as a step within a broader classification workflow, though its weights and loadings provide valuable exploratory insight, especially in metabolomics. © 2014 John Wiley & Sons, Ltd.
Partial least squares discriminant analysis (PLS‐DA) has been available for nearly 20 years yet is poorly understood by most users. By simple examples, it is shown graphically and algebraically that for two equal class sizes, PLS‐DA using one partial least squares (PLS) component provides equivalent classification results to Euclidean distance to centroids, and by using all nonzero components to linear discriminant analysis. Extensions where there are unequal class sizes and more than two classes are discussed, including common pitfalls and dilemmas. Finally, the problems of overfitting and PLS scores plots are discussed. It is concluded that for classification purposes, PLS‐DA has no significant advantages over traditional procedures and is an algorithm full of dangers. It should not be viewed as a single integrated method but as a step in a full classification procedure. However, despite these limitations, PLS‐DA can provide good insight into the causes of discrimination via weights and loadings, which gives it a unique role in exploratory data analysis, for example in metabolomics via visualisation of significant variables such as metabolites or spectroscopic peaks. Copyright © 2014 John Wiley & Sons, Ltd.
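The one-component equivalence stated in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes two equal-sized classes coded as +1 and -1, column-centred data, a decision threshold of zero on the predicted response, and synthetic Gaussian data. All variable names and the data-generating parameters are illustrative assumptions.

```python
import numpy as np

# Synthetic two-class data with equal class sizes (illustrative assumption).
rng = np.random.default_rng(0)
n_per_class, n_vars = 20, 5
X_a = rng.normal(loc=0.0, size=(n_per_class, n_vars))
X_b = rng.normal(loc=1.0, size=(n_per_class, n_vars))
X = np.vstack([X_a, X_b])
# Class membership coded +1 / -1; with equal class sizes y already has mean zero.
y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])

# Column-centre X before fitting.
Xc = X - X.mean(axis=0)

# One-component PLS1: weight vector, scores, and regression of y on the scores.
w = Xc.T @ y
w /= np.linalg.norm(w)
t = Xc @ w
q = (t @ y) / (t @ t)
y_hat = q * t
pls_labels = np.where(y_hat > 0, 1, -1)

# Euclidean distance to centroids: assign each sample to the nearer class mean.
centroid_a = X_a.mean(axis=0)
centroid_b = X_b.mean(axis=0)
d_a = np.linalg.norm(X - centroid_a, axis=1)
d_b = np.linalg.norm(X - centroid_b, axis=1)
edc_labels = np.where(d_a < d_b, 1, -1)

# True when both rules assign every sample identically.
agree = bool(np.array_equal(pls_labels, edc_labels))
print(agree)
```

The agreement follows because, with equal class sizes, the weight vector is proportional to the difference between the two class centroids and the overall mean lies midway between them, so both rules split the data with the same hyperplane. With unequal class sizes the overall mean shifts toward the larger class and this sketch would no longer be expected to agree in general.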