Q-Matrix Construction: Defining the Link Between Constructs and Test Items in Large-Scale Reading and Listening Comprehension Assessments

Abstract

In cognitive diagnosis, a Q-matrix (Tatsuoka, 1983, 1990), an incidence matrix that defines the relationships between test items and the constructs of interest, has a great impact on the nature of the performance feedback that can be provided to score users. The purpose of the present study was to identify meaningful skill coding categories that reflect the core language skills and processes assessed in the Reading and Listening sections of the Test of English as a Foreign Language™ Internet-based Test (TOEFL® iBT). The study was conducted as part of a research activity exploring the possibility of developing a detailed score report for low-stakes use by taking a cognitive diagnosis approach. Content experts conducted a test content analysis to develop draft Q-matrices, while measurement experts empirically analyzed the draft Q-matrices against examinee performance data using a cognitively diagnostic psychometric model called the fusion model (DiBello, Stout, & Roussos, 1995; Hartz, 2002). The draft Q-matrices were refined by repeating the fusion model analysis and revising the skill definitions and item coding. This resulted in a set of Q-matrices that represented substantively meaningful categories of a suitable grain size for score reporting, while maintaining an acceptable level of examinee classification consistency.
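
As a rough illustration of what an incidence matrix of this kind looks like, the sketch below codes four made-up items against three made-up skills. The skill names, item labels, codings, and helper functions are all hypothetical and are not taken from the study; the sketch only shows the data structure, not the fusion model analysis.

```python
# Hypothetical skill labels and item codings, for illustration only;
# they are NOT the categories or codings developed in the study.
SKILLS = ["vocabulary", "inference", "main_idea"]

# Rows are items, columns follow the order of SKILLS.
# A 1 means the item is coded as requiring that skill, a 0 means it is not.
Q_MATRIX = {
    "item_1": [1, 0, 0],
    "item_2": [1, 1, 0],
    "item_3": [0, 1, 1],
    "item_4": [0, 0, 1],
}


def skills_for_item(item):
    """Return the names of the skills a given item is coded for."""
    return [skill for skill, flag in zip(SKILLS, Q_MATRIX[item]) if flag == 1]


def items_per_skill():
    """Count how many items are coded for each skill (column sums)."""
    return {
        skill: sum(row[j] for row in Q_MATRIX.values())
        for j, skill in enumerate(SKILLS)
    }


if __name__ == "__main__":
    for item in Q_MATRIX:
        print(item, skills_for_item(item))
    print(items_per_skill())
```

Column sums such as those returned by `items_per_skill` give one simple view of grain size: a skill coded on very few items leaves little information for reporting on that skill.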
