
TLDR

The study compares the variable importance in projection (VIP) and selectivity ratio (SR) methods for variable selection in partial least squares regression (PLSR). The authors applied both methods to three datasets: physicochemical water quality parameters linked to sensorial data, GC‑MS chemical profiles from fossil sea sediments related to sea surface temperature (SST), and Daphnia magna gene expression linked to offspring production. Each method was evaluated using correlation coefficients, significance levels, and interpretation of the underlying experimental phenomena. In the water quality dataset, SR was more accurate than VIP for sensorial prediction. In the climate dataset, VIP yielded more interpretable variables from raw GC‑MS chromatograms; when only selected peak areas were used, SR was more efficient for prediction, while VIP selected the variables most relevant to interpreting SST changes. In the transcriptomic dataset, SR again proved more reliable for prediction. © 2015 John Wiley & Sons, Ltd.

Abstract

This study compares the application of two variable selection methods in partial least squares regression (PLSR), the variable importance in projection (VIP) method and the selectivity ratio (SR) method. For this purpose, three different data sets were analysed: (a) physicochemical water quality parameters related to sensorial data, (b) gas chromatography–mass spectrometry (GC‐MS) chemical (organic compound) profiles from fossil sea sediment samples related to sea surface temperature (SST) changes, and (c) exposed genes of Daphnia magna female samples related to their total offspring production. Correlation coefficients (r), levels of significance (p‐value) and interpretation of the underlying experimental phenomena were used to discuss the best approach for variable selection in each case. The comparison of the two variable selection methods in the first water quality data set showed that the SR method is more accurate for sensorial prediction. For the climate data set, when raw total ion current (TIC) GC‐MS chromatograms were considered, variables selected using the VIP method were easier to interpret than those selected by the SR method. However, when only some chromatographic peak areas (concentrations) were considered, the SR method was more efficient for prediction, and the VIP method selected the most relevant variables for the interpretation of SST changes. Finally, for the transcriptomic data set, the SR method was again found to be more reliable for prediction purposes. Copyright © 2015 John Wiley & Sons, Ltd.
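
To make the two criteria concrete, the following is a minimal sketch of how VIP and SR scores might be computed from a single-response PLSR model. It is not the authors' implementation. It assumes the commonly used definitions (VIP as a weighted sum of squared PLS weights, SR as the ratio of explained to residual variance after target projection), a plain NumPy NIPALS fit, and synthetic data. The function names `pls1_nipals`, `vip_scores` and `selectivity_ratio` are hypothetical, and none of the numbers relate to the three data sets in the study.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model with NIPALS on mean-centred data."""
    Xr = X - X.mean(axis=0)          # centred copy; deflated in the loop
    yr = y - y.mean()
    n, p = Xr.shape
    W = np.zeros((p, n_components))  # unit-length weight vectors
    P = np.zeros((p, n_components))  # X loadings
    T = np.zeros((n, n_components))  # scores
    q = np.zeros(n_components)       # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p_a = Xr.T @ t / tt
        q_a = yr @ t / tt
        Xr = Xr - np.outer(t, p_a)   # deflate X
        yr = yr - q_a * t            # deflate y
        W[:, a], P[:, a], T[:, a], q[a] = w, p_a, t, q_a
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector for centred X
    return W, T, q, b

def vip_scores(W, T, q):
    """VIP: squared weights averaged over components, weighted by explained y variance."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)             # y sum of squares per component
    w_sq = (W / np.linalg.norm(W, axis=0))**2
    return np.sqrt(p * (w_sq @ ss) / ss.sum())

def selectivity_ratio(X, b):
    """SR: explained / residual variance per variable after target projection."""
    Xc = X - X.mean(axis=0)
    w_tp = b / np.linalg.norm(b)                 # target-projection weights
    t_tp = Xc @ w_tp                             # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)           # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)
    explained = np.sum(X_tp**2, axis=0)
    residual = np.sum((Xc - X_tp)**2, axis=0)
    return explained / residual

# Synthetic example (assumption): only the first two of six variables drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

W, T, q, b = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
sr = selectivity_ratio(X, b)
print("VIP:", np.round(vip, 2))
print("SR: ", np.round(sr, 2))
```

With unit-length weight vectors the squared VIP scores average to one, which is why a cut-off near 1 is a common heuristic for VIP. SR is usually thresholded separately, for example with an F-test. The cut-offs actually used in the study are not stated in the abstract.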
