Publication | Closed Access
A Comparison of Imputation Techniques for Handling Missing Data
Citations: 297
References: 18
Year: 2002
Researchers frequently encounter missing data in studies. The article aims to guide the selection and application of missing-data methods for a single variable. Starting from a complete data set of 492 cases, the study simulated missing at random (MAR) data and compared five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the EM algorithm) on descriptive statistics and correlations, both for the imputed cases alone (n = 96) and for the full sample (n = 492). Mean substitution performed worst, while regression with an error term and the EM algorithm produced estimates closest to those of the original data, though all methods had limitations.
Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a simulated yet realistic data set with missing at random (MAR) data. The authors compare and contrast five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the expectation maximization [EM] algorithm) for dealing with missing data, and compare the effects of each method on descriptive statistics and correlation coefficients for the imputed data (n = 96) and the entire sample (n = 492) when imputed data are included. All methods had limitations, although our findings suggest that mean substitution was the least effective and that regression with an error term and the EM algorithm produced estimates closest to those of the original variables.
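The comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure or data: the two variables `x` and `y`, the logistic missingness rule, and all parameter values are assumptions chosen only for illustration, and the EM algorithm is left out to keep the example short. The sketch makes `y` missing with a probability that depends only on the observed `x` (a MAR mechanism), applies the four simpler approaches, and prints the mean, standard deviation, and correlation with `x` for each.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the article's data set): y depends linearly on x.
n = 492
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)

# Assumed MAR mechanism: the chance that y is missing depends only on observed x.
p_missing = 1.0 / (1.0 + np.exp(-(x - 1.5)))
missing = rng.random(n) < p_missing
observed = ~missing

# Fit y ~ x on the complete cases (np.polyfit returns slope first for degree 1).
slope, intercept = np.polyfit(x[observed], y[observed], 1)
predicted = intercept + slope * x
residual_sd = np.std(y[observed] - predicted[observed], ddof=2)


def impute(values):
    """Return a copy of y with the missing entries replaced by `values`."""
    filled = y.copy()
    filled[missing] = values
    return filled


completed = {
    "mean substitution": impute(y[observed].mean()),
    "simple regression": impute(predicted[missing]),
    # Stochastic regression: prediction plus a random draw from the residuals.
    "regression + error": impute(
        predicted[missing] + rng.normal(scale=residual_sd, size=missing.sum())
    ),
}


def summarize(y_values, x_values):
    """Mean, sample SD, and Pearson correlation with x."""
    r = np.corrcoef(x_values, y_values)[0, 1]
    return y_values.mean(), y_values.std(ddof=1), r


results = {
    "original": summarize(y, x),
    "listwise deletion": summarize(y[observed], x[observed]),
}
results.update({name: summarize(f, x) for name, f in completed.items()})

for name, (mean, sd, r) in results.items():
    print(f"{name:20s} mean={mean:6.3f} sd={sd:5.3f} r={r:5.3f}")
```

Mean substitution leaves the complete-case mean unchanged but shrinks the standard deviation by construction, because every imputed value sits exactly at the mean. Adding a residual draw to the regression prediction is one way to restore some of the variability that deterministic regression imputation removes.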