Publication | Closed Access
A multivariate technique for multiply imputing missing values using a sequence of regression models
Citations: 2K
References: 21
Year: 2001
Regression Models, Multivariate Technique, Engineering, Data Science, Imputation Process, Estimation Statistic, Predictive Analytics, Business, Econometrics, Biostatistics, Statistical Inference, Regression Analysis, Complex Data Structure, Multivariate Analysis, Statistics, Marginal Structural Models, Latent Variable Methods
The article describes and evaluates a procedure for imputing missing values in complex data structures when data are missing at random. Imputations are obtained by fitting a sequence of regression models (linear, logistic, Poisson, generalized logit, or a mixture of these, depending on the type of variable) and drawing from the corresponding predictive distributions. Two optional features are supported: restricting some variables to a relevant subpopulation, and enforcing logical bounds by drawing from truncated predictive distributions. The method was partly motivated by the analysis of two data sets, which serve as illustrations.
This article describes and evaluates a procedure for imputing missing values for a relatively complex data structure when the data are missing at random. The imputations are obtained by fitting a sequence of regression models and drawing values from the corresponding predictive distributions. The types of regression models used are linear, logistic, Poisson, generalized logit, or a mixture of these, depending on the type of variable being imputed. Two additional common features in the imputation process are incorporated: restriction to a relevant subpopulation for some variables, and logical bounds or constraints for the imputed values. The restrictions involve fitting the regression models only on the sample individuals who satisfy certain criteria. The bounds involve drawing values from a truncated predictive distribution. The development of this method was partly motivated by the analysis of two data sets, which are used as illustrations. The sequential regression procedure is applied to perform multiple imputation analysis for the two applied problems. The sampling properties of inferences from multiply imputed data sets created using the sequential regression method are evaluated through simulated data sets.
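The idea in the abstract, cycling through the variables, regressing each on the others, and filling its missing entries with draws from a (possibly truncated) predictive distribution, can be illustrated with a minimal sketch. This is not the article's implementation. It assumes continuous variables only, so every model is a linear regression, and it starts from a mean fill. The function names (`draw_truncated_normal`, `impute_linear`, `sequential_regression_impute`), the `bounds` dictionary, and the number of cycles are all hypothetical choices made for illustration. Subpopulation restrictions and the non-linear model types are omitted.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for inverse-CDF sampling


def draw_truncated_normal(mean, sd, lower, upper, rng):
    """Draw one value from N(mean, sd^2) restricted to [lower, upper]."""
    lo = _STD.cdf((lower - mean) / sd)
    hi = _STD.cdf((upper - mean) / sd)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf is undefined at 0 and 1
    value = mean + sd * _STD.inv_cdf(u)
    return float(np.clip(value, lower, upper))  # guard extreme-tail rounding


def impute_linear(y, X, missing, rng, lower=-np.inf, upper=np.inf):
    """Fill y[missing] with draws from a linear-regression predictive distribution."""
    Xd = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    Xo, yo = Xd[~missing], y[~missing]          # fit on observed rows only
    XtX_inv = np.linalg.inv(Xo.T @ Xo)
    beta_hat = XtX_inv @ Xo.T @ yo
    resid = yo - Xo @ beta_hat
    df = len(yo) - Xo.shape[1]
    # Draw the parameters too (assumed flat prior), so that imputations
    # reflect uncertainty in the fitted model, not only residual noise.
    sigma2 = resid @ resid / rng.chisquare(df)
    chol = np.linalg.cholesky(XtX_inv)
    beta = beta_hat + np.sqrt(sigma2) * chol @ rng.standard_normal(len(beta_hat))
    out = y.copy()
    for i in np.flatnonzero(missing):
        out[i] = draw_truncated_normal(Xd[i] @ beta, np.sqrt(sigma2), lower, upper, rng)
    return out


def sequential_regression_impute(data, bounds=None, n_cycles=5, seed=0):
    """Return one completed copy of `data` (NaN marks a missing entry)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    miss = np.isnan(data)
    bounds = bounds or {}
    filled = data.copy()
    # Starting values: observed column means (a simplification of this sketch).
    col_means = np.nanmean(data, axis=0)
    filled[miss] = np.take(col_means, np.nonzero(miss)[1])
    # Visit variables from least to most missing; skip complete columns.
    order = [int(j) for j in np.argsort(miss.sum(axis=0)) if miss[:, j].any()]
    for _ in range(n_cycles):
        for j in order:
            others = np.delete(filled, j, axis=1)
            lo, hi = bounds.get(j, (-np.inf, np.inf))
            filled[:, j] = impute_linear(filled[:, j], others, miss[:, j], rng, lo, hi)
    return filled
```

Calling `sequential_regression_impute` several times with different `seed` values would give several completed data sets, which is the multiple-imputation setting the abstract refers to. Each completed data set would then be analyzed separately and the results combined.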