Missing Data and Multiple Imputation Decision Tree

Abstract

Missing data are a pervasive issue in the social sciences. Failing to address them appropriately can produce biased or inefficient parameter estimates, as well as misleading confidence intervals and significance tests.

Multiple imputation is a statistical technique for handling missing data. It uses the observed data to generate multiple datasets in which each missing value is replaced by a plausible value. Each imputed value includes a random component, so the uncertainty about the missing values is preserved. Each dataset is analyzed separately with the same model, and the resulting estimates are then pooled into a single set of parameter estimates, variances, and confidence intervals.

Although this technique is widely used, there is little consensus on best practices in multiple imputation. This includes how to assess the extent of missing-data bias and how to report multiple imputation procedures in publications. This decision tree was crowdsourced at the 2021 annual meeting of the Society for the Improvement of Psychological Science (SIPS) and revised thereafter. It is intended to give researchers practical guidelines for examining their data for missingness and for deciding how to handle that missingness. Our recommendations focus primarily on multiple imputation. We also indicate where the same decision guidelines apply to other missing-data procedures, such as full information maximum likelihood (FIML).
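The pooling step described above is conventionally carried out with Rubin's rules. The following is a minimal sketch, not the decision tree's own procedure. It assumes a single scalar parameter and uses the classic large-sample degrees of freedom, without the Barnard-Rubin small-sample adjustment. The names `pool_rubin` and `PooledEstimate` are hypothetical. The function combines the point estimate and squared standard error obtained from each imputed dataset into one estimate, a total variance that adds within- and between-imputation variability, and degrees of freedom for t-based confidence intervals.

```python
import math
from dataclasses import dataclass
from statistics import fmean, variance


@dataclass
class PooledEstimate:
    estimate: float     # pooled point estimate: mean of the m estimates
    within_var: float   # U-bar: average within-imputation variance
    between_var: float  # B: variance of the estimates across imputations
    total_var: float    # T = U-bar + (1 + 1/m) * B
    std_error: float    # sqrt(T)
    df: float           # Rubin's (1987) large-sample degrees of freedom


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets (hypothetical helper).

    estimates: point estimates Q_i, one per imputed-data analysis
    variances: squared standard errors U_i from the same analyses
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need >= 2 imputations with matching estimates and variances")
    q_bar = fmean(estimates)
    u_bar = fmean(variances)
    b = variance(estimates)  # sample variance (divides by m - 1)
    t = u_bar + (1 + 1 / m) * b
    if b == 0:
        # Imputations agree exactly, so there is no extra uncertainty from imputation.
        df = math.inf
    else:
        # r: relative increase in variance due to missing data
        r = (1 + 1 / m) * b / u_bar
        df = (m - 1) * (1 + 1 / r) ** 2
    return PooledEstimate(q_bar, u_bar, b, t, math.sqrt(t), df)
```

For example, `pool_rubin([0.50, 0.55, 0.45], [0.010, 0.012, 0.011])` gives a pooled estimate of 0.50, a total variance of about 0.0143, and about 37 degrees of freedom. The total variance exceeds the average within-imputation variance (0.011) because it also reflects disagreement among the imputations.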
