Publication | Closed Access
A Combinatorial Approach to the Variable Selection in Multiple Linear Regression: Analysis of Selwood et al. Data Set – A Case Study
Citations: 60
References: 18
Year: 2003
Engineering, Feature Selection, Regression Analysis, Variable Selection, Data Science, Statistical Computing, Data Set Cp‐mlr, Cp‐mlr One, Factor Analysis, Biostatistics, Public Health, Statistics, Latent Variable Methods, Combinatorial Approach, Multidimensional Analysis, Model Comparison, Marginal Structural Models, Robust Modeling, Case Study, Statistical Inference, Combinatorial Protocol, Multivariate Analysis, Data Modeling
Abstract

A combinatorial protocol (CP) is introduced here and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP‐MLR rests primarily on restricting the entry of correlated variables into the model development stage. It has been used to analyse the Selwood et al. data set [16], and the models obtained are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set, CP‐MLR identified three highly independent models (27, 28 and 31) with Q² values in the range of 0.632–0.518. These models are also divergent and unique. Although the present study does not share any models with the GFA [8] and MUSEUM [9] results, several descriptors are common to all of these studies, including the present one. A simulation is also carried out on the same data set to explain model formation in CP‐MLR. The results indicate that the proposed method should be able to offer solutions for data sets with 50 to 60 descriptors in a reasonable time frame. By carefully selecting the inter‐parameter correlation cutoff values in CP‐MLR, one can identify divergent models and handle data sets larger than the present one without excessive computer time.
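The correlation restriction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pearson_r`, `candidate_subsets`), the toy descriptor values, and the cutoff of 0.5 are all assumptions made for illustration, and the abstract does not specify any further steps the actual CP‐MLR procedure applies. The sketch only shows how enumerating descriptor combinations while rejecting any subset containing a highly inter‐correlated pair shrinks the pool of candidates that would be passed on to MLR.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def candidate_subsets(descriptors, subset_size, r_cutoff):
    """Yield descriptor-name tuples whose pairwise |r| all stay below r_cutoff.

    descriptors: dict mapping a (hypothetical) descriptor name to its values.
    """
    names = sorted(descriptors)
    # Precompute |r| once per pair so each subset check is a dictionary lookup.
    abs_r = {
        (a, b): abs(pearson_r(descriptors[a], descriptors[b]))
        for a, b in combinations(names, 2)
    }
    for subset in combinations(names, subset_size):
        # Reject the subset if any two of its descriptors are too correlated.
        if all(abs_r[pair] < r_cutoff for pair in combinations(subset, 2)):
            yield subset


# Toy data invented for illustration: d2 is nearly collinear with d1.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "d3": [2.0, -1.0, 0.5, -1.5, 1.0],
    "d4": [0.3, 0.9, -0.4, 0.8, -0.2],
}
survivors = list(candidate_subsets(toy, subset_size=2, r_cutoff=0.5))
```

In this toy case the pair (`d1`, `d2`) is rejected before any regression would be fitted. Lowering `r_cutoff` prunes more combinations, which is consistent with the abstract's point that the choice of inter‐parameter correlation cutoff governs both model divergence and computing time.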