Concepedia

Abstract

Incomplete or missing data are likely to be encountered in empirical software engineering data sets. The authors evaluate several methods for handling missing data. The methods are first presented and discussed in general, and then applied to effort estimation of ERP projects. We found that two sampling-based methods, mean imputation (MI) and similar response pattern imputation (SRPI), waste less information than listwise deletion (LD). However, MI may introduce more bias than SRPI. Compared to sampling-based methods, likelihood-based imputation methods require data sets too large to be realistic for use in empirical software engineering. None of the sampling-based methods, such as MI and SRPI, seems able to correct bias. So, although imputation is an attractive idea, the available methods still have severe limitations.
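To make the three sampling-based approaches named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The toy data, the column meanings, and all function names are hypothetical. The SRPI variant shown is a simplified nearest-donor scheme that uses unscaled squared distance over the observed columns and assumes at least one complete row exists; real implementations typically standardize the matching variables first.

```python
from statistics import mean

# Hypothetical toy data: each row is a project, None marks a missing value.
# Assumed columns (illustration only): users, modules, effort.
rows = [
    [10.0, 4.0, 100.0],
    [12.0, None, 130.0],
    [50.0, 9.0, 600.0],
    [48.0, 8.0, None],
]


def listwise_deletion(data):
    """LD: keep only the rows that have no missing values."""
    return [row[:] for row in data if None not in row]


def mean_imputation(data):
    """MI: replace each missing value with the mean of its column's observed values."""
    n_cols = len(data[0])
    col_means = [
        mean(row[j] for row in data if row[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in data
    ]


def similar_pattern_imputation(data):
    """Simplified SRPI: copy missing values from the closest complete row (donor).

    Closeness is the unscaled squared distance over the columns that are
    observed in the incomplete row (an assumption made for this sketch).
    """
    donors = listwise_deletion(data)
    result = []
    for row in data:
        if None not in row:
            result.append(row[:])
            continue
        observed = [j for j, value in enumerate(row) if value is not None]
        donor = min(
            donors,
            key=lambda d: sum((d[j] - row[j]) ** 2 for j in observed),
        )
        result.append(
            [donor[j] if value is None else value for j, value in enumerate(row)]
        )
    return result


ld_rows = listwise_deletion(rows)
mi_rows = mean_imputation(rows)
srpi_rows = similar_pattern_imputation(rows)
```

On this toy data, LD keeps only two of the four projects, which illustrates the information loss the abstract mentions. MI fills the missing module count with the column mean regardless of project size, whereas the donor-based variant borrows the value from the most similar complete project.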
