
TLDR

Variable selection is crucial in high-dimensional statistical modeling, but when the dimensionality p is ultrahigh, methods such as the Dantzig selector face challenges: the logarithmic factor log(p) in the risk bound can be large, and the uniform uncertainty principle can fail. The authors propose sure independence screening, a method based on correlation learning that reduces ultrahigh dimensionality to a moderate scale below the sample size. Correlation learning is shown to possess the sure screening property even when the dimensionality grows exponentially with the sample size, and an iterative variant is introduced to improve finite-sample performance. With the dimensionality accurately reduced below the sample size, variable selection by well-developed methods such as SCAD, the Dantzig selector, the lasso, or the adaptive lasso becomes both faster and more accurate; the paper also clarifies the interrelationships among these penalized least-squares methods.

Abstract

Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log(p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log(p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.
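The screening step described above — ranking predictors by their marginal correlation with the response and keeping only the top few — can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (standardized Gaussian design, the common default d = n/log(n) for the number of retained features), not the authors' implementation:

```python
import numpy as np

def sis(X, y, d=None):
    """Sure-independence-screening sketch: rank the p predictors by
    absolute marginal sample correlation with y and keep the top d."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))  # a common default choice in the SIS literature
    # standardize columns of X and the response y
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    # componentwise sample correlations omega_j = corr(X_j, y)
    omega = np.abs(Xs.T @ ys) / n
    # indices of the d predictors with the largest |correlation|
    return np.argsort(omega)[::-1][:d]

# toy example with p >> n: only the first 3 of 1000 predictors matter
rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] + 1.5 * X[:, 1] - X[:, 2] + 0.1 * rng.standard_normal(n)
selected = sis(X, y)  # dimensionality reduced from 1000 to 21 (< n)
```

After this reduction, a penalized method such as the lasso or SCAD would be run on `X[:, selected]` only; the iterative variant would repeat the screening on residuals from a fit to the selected variables.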

