Sampling Variability of Performance Assessments

TLDR

The article frames performance assessments within a sampling framework. The authors use generalizability theory to analyze performance assessments as samples drawn from a complex universe of tasks, occasions, raters, and measurement methods. Task‑sampling variability is the major source of measurement error, so many tasks are needed for reliable elementary mathematics and science scores. Measurement methods do not converge, so scores depend on both the task and the method sampled.

Abstract

In this article, performance assessments are cast within a sampling framework. More specifically, a performance assessment is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, we present evidence bearing on the generalizability and convergent validity of performance assessments sampled from a range of measurement facets and measurement methods. Results at both the individual and school level indicate that task‐sampling variability is the major source of measurement error. Large numbers of tasks are needed to get a reliable measure of mathematics and science achievement at the elementary level. With respect to convergent validity, results suggest that methods do not converge. Students' performance scores, then, are dependent on both the task and method sampled.
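
The link between task-sampling variability and the number of tasks needed can be illustrated with a minimal sketch. It assumes the simplest generalizability-theory design, persons crossed with tasks, and the standard relative generalizability coefficient for that design: person variance divided by person variance plus the person-by-task (residual) variance averaged over the tasks administered. The article's own analyses involve further facets (occasions, raters, methods). The function names and the variance components below are hypothetical and chosen only for illustration; they are not values reported in the article.

```python
def g_coefficient(var_person: float, var_person_task: float, n_tasks: int) -> float:
    """Relative G coefficient for a one-facet persons x tasks design.

    var_person:      variance among persons (universe-score variance)
    var_person_task: person-by-task interaction variance, confounded with error
    n_tasks:         number of tasks averaged to form each person's score
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    relative_error = var_person_task / n_tasks
    return var_person / (var_person + relative_error)


def tasks_needed(var_person: float, var_person_task: float,
                 target: float, max_tasks: int = 10_000) -> int:
    """Smallest number of tasks whose G coefficient reaches the target."""
    if var_person <= 0 or not 0 < target < 1:
        raise ValueError("need var_person > 0 and 0 < target < 1")
    for n_tasks in range(1, max_tasks + 1):
        if g_coefficient(var_person, var_person_task, n_tasks) >= target:
            return n_tasks
    raise ValueError("target not reached within max_tasks")


if __name__ == "__main__":
    # Hypothetical variance components (NOT from the article): the
    # person-by-task component is several times the person component.
    var_p, var_pt = 1.0, 4.3
    for n in (1, 5, 10, 20):
        print(n, round(g_coefficient(var_p, var_pt, n), 3))
    print("tasks needed for 0.80:", tasks_needed(var_p, var_pt, 0.80))
```

Under these assumed components, a single task yields a low coefficient, and the coefficient rises only gradually as tasks are added. This is the pattern the abstract describes when task-sampling variability is large relative to differences among students.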
