Abstract

Differential item functioning (DIF) assessment procedures for items with more than 2 ordered score categories were evaluated. Three descriptive statistics, namely the standardized mean difference (SMD; Dorans & Schmitt, 1991) and 2 procedures based on SIBTEST (Shealy & Stout, 1993), were considered, along with 5 inferential procedures: 2 based on SMD, 2 based on SIBTEST, and the Mantel (1963) method. A simulation showed that, when the 2 examinee groups had the same distribution, the descriptive index that performed best was the SMD. When the group means differed by 1 SD, a modified form of the SIBTEST DIF effect size measure tended to perform best. The 5 inferential procedures performed almost indistinguishably when the 2 groups had identical distributions. When the groups had different distributions and the studied item was highly discriminating, the SIBTEST procedures showed much better Type I error control than did the SMD and Mantel methods, particularly in short tests. The power ranking of the 5 procedures was inconsistent; it depended on the direction of DIF and other factors. Routine application of these polytomous DIF methods seems feasible when a reliable test is available for matching examinees. The Type I error rates of the Mantel and SMD methods may be a concern under certain conditions. The current version of SIBTEST cannot easily accommodate matching tests that do not use number-right scoring. Additional research in these areas would be useful.
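
As a rough illustration of the descriptive SMD index mentioned above, the following Python sketch computes a focal-weighted difference in mean item scores across matching-score levels. It is a minimal sketch, not the procedure as implemented in the study: the function name, the argument names, the group labels "F" and "R", the focal-minus-reference sign convention, and the choice to skip matching levels where either group is absent are all assumptions made for illustration.

```python
from collections import defaultdict


def standardized_mean_difference(item_scores, matching_scores, groups,
                                 focal="F", reference="R"):
    """Illustrative SMD: sum over matching levels k of
    w_k * (mean focal item score at k - mean reference item score at k),
    where w_k is the share of (usable) focal examinees at level k.
    All names and conventions here are hypothetical.
    """
    # Collect item scores by matching level and group.
    by_level = defaultdict(lambda: {focal: [], reference: []})
    for y, k, g in zip(item_scores, matching_scores, groups):
        if g in (focal, reference):
            by_level[k][g].append(y)

    # Assumption: only levels where both groups are present are compared.
    usable = [v for v in by_level.values() if v[focal] and v[reference]]
    n_focal = sum(len(v[focal]) for v in usable)
    if n_focal == 0:
        raise ValueError("no matching level contains both groups")

    smd = 0.0
    for v in usable:
        weight = len(v[focal]) / n_focal            # focal-group weight
        mean_focal = sum(v[focal]) / len(v[focal])
        mean_ref = sum(v[reference]) / len(v[reference])
        smd += weight * (mean_focal - mean_ref)
    return smd


# Toy data: a polytomous item scored 0-3, two matching-score levels.
item = [2, 1, 0, 1, 3, 2, 2, 3]
match = [1, 1, 1, 1, 2, 2, 2, 2]
group = ["F", "F", "R", "R", "F", "R", "F", "R"]
print(standardized_mean_difference(item, match, group))
```

Under this sign convention, a value near zero suggests that matched focal and reference examinees earn similar item scores on average, while a positive value indicates that the item favors the focal group.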
