Publication | Closed Access
Ridge Estimators in Logistic Regression
Citations: 1.7K
References: 9
Year: 1992
Parameter Estimation, Unknown Ridge Parameter, Engineering, Prognosis, Gynecology, Cancer Registration, Ovarian Cancer, Ridge Estimators, Biostatistics, Estimation Theory, Molecular Diagnostics, Statistics, Cancer Research, Medicine, Cervical Cancer, Logistic Regression, Statistical Inference, Ridge Regression, Oncology
The paper shows how ridge estimators can be used in logistic regression to improve parameter estimates and reduce the error of future predictions. The authors discuss ways of choosing the unknown ridge parameter, focusing on cross-validation, and consider three definitions of prediction error: classification error, squared error, and minus log-likelihood. They illustrate the approach by building a prognostic index for the two-year survival probability of ovarian cancer patients as a function of their DNA histogram. In this example the number of covariates is large relative to the number of observations, so fitting the model without restrictions on the parameters leads to overfitting. Restricting neighbouring DNA histogram intervals to differ only slightly in their influence on survival yields ridge-type estimates that can be clinically interpreted, and the resulting model predicts new observations more accurately.
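The two kinds of restriction described above can be written as penalised log-likelihoods. The notation here is an assumption made for illustration and may differ from the paper's: $\ell(\beta)$ is the ordinary logistic log-likelihood, $\lambda \ge 0$ is the ridge parameter, and $\beta_1, \dots, \beta_p$ are the coefficients of $p$ ordered histogram intervals.

```latex
% Standard ridge penalty: shrink all coefficients towards zero.
\ell^{\lambda}(\beta) = \ell(\beta) - \lambda \sum_{j=1}^{p} \beta_j^{2}

% Neighbour-difference penalty: shrink adjacent coefficients towards each other.
\ell^{\lambda}_{\mathrm{diff}}(\beta) = \ell(\beta) - \lambda \sum_{j=1}^{p-1} \left(\beta_{j+1} - \beta_j\right)^{2}
```

With $\lambda = 0$ both forms reduce to unrestricted maximum likelihood. As $\lambda$ grows, the first pulls every coefficient towards zero, while the second pulls the coefficient profile towards a constant across intervals.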
SUMMARY
In this paper it is shown how ridge estimators can be used in logistic regression to improve the parameter estimates and to diminish the error made by further predictions. Different ways to choose the unknown ridge parameter are discussed. The main attention focuses on ridge parameters obtained by cross-validation. Three different ways to define the prediction error are considered: classification error, squared error and minus log-likelihood. The use of ridge regression is illustrated by developing a prognostic index for the two-year survival probability of patients with ovarian cancer as a function of their deoxyribonucleic acid (DNA) histogram. In this example, the number of covariates is large compared with the number of observations and modelling without restrictions on the parameters leads to overfitting. Defining a restriction on the parameters, such that neighbouring intervals in the DNA histogram differ only slightly in their influence on the survival, yields ridge-type parameter estimates with reasonable values which can be clinically interpreted. Furthermore the model can predict new observations more accurately.
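As a rough illustration of choosing the ridge parameter by cross-validation under the three error definitions named in the summary, the following is a minimal sketch and not the authors' implementation. It assumes a penalty of the form `lam * sum(beta**2)` with an unpenalised intercept, a Newton-Raphson fit with step halving, K-fold rather than any particular cross-validation scheme, and synthetic data. The function names (`fit_ridge_logistic`, `prediction_errors`, `cv_errors`) are hypothetical.

```python
import numpy as np

def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

def fit_ridge_logistic(X, y, lam, n_iter=100, tol=1e-8):
    """Maximise loglik(beta) - lam * sum(beta_j^2) by Newton-Raphson.
    Assumption: column 0 of X is the intercept and is not penalised."""
    p = X.shape[1]
    pen = np.ones(p)
    pen[0] = 0.0

    def objective(b):
        eta = X @ b
        return np.sum(y * eta - np.logaddexp(0.0, eta)) - lam * np.sum(pen * b**2)

    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = sigmoid(X @ beta)
        grad = X.T @ (y - prob) - 2.0 * lam * pen * beta
        hess = X.T @ (X * (prob * (1.0 - prob))[:, None]) + 2.0 * lam * np.diag(pen)
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step while it fails to improve the penalised log-likelihood.
        while objective(beta + t * step) < objective(beta) and t > 1e-4:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

def prediction_errors(y, prob):
    """Mean classification error, squared error and minus log-likelihood."""
    prob = np.clip(prob, 1e-12, 1.0 - 1e-12)
    ce = np.mean((prob > 0.5) != (y > 0.5))
    se = np.mean((y - prob) ** 2)
    ml = -np.mean(y * np.log(prob) + (1.0 - y) * np.log(1.0 - prob))
    return ce, se, ml

def cv_errors(X, y, lam, n_folds=5, seed=0):
    """K-fold cross-validated errors for one value of the ridge parameter."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    prob = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        beta = fit_ridge_logistic(X[train_idx], y[train_idx], lam)
        prob[test_idx] = sigmoid(X[test_idx] @ beta)
    return prediction_errors(y, prob)

# Synthetic data with many covariates relative to the number of observations.
rng = np.random.default_rng(1)
n, p = 60, 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
true_beta = np.concatenate([[0.0], rng.normal(scale=0.5, size=p)])
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

for lam in (0.1, 1.0, 10.0):
    ce, se, ml = cv_errors(X, y, lam)
    print(f"lam={lam:5.1f}  CE={ce:.3f}  SE={se:.3f}  ML={ml:.3f}")
```

In practice one would evaluate a finer grid of `lam` values and pick the one that minimises the chosen error measure. The three measures need not agree on the same value.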