Contrasting Automated and Human Scoring of Essays

Abstract

Essay scoring has traditionally relied on human raters, who understand both the content and the quality of writing. However, the increasing use of constructed-response items, and the large number of students who will be exposed to such items in assessments based on the Common Core State Standards (CCSS), raise questions about the viability of relying on human scoring alone. This scoring method is expensive, requires extensive logistical efforts, and depends on less-than-perfect human judgment. Testing programs are therefore tapping into the power of computers to score constructed-response items efficiently. The interest in automated scoring of essays is not new and has recently received additional attention from two federally supported consortia, PARCC and Smarter Balanced, which intend to incorporate automated scoring into their Common Core State Standards assessments planned for 2014. Nonetheless, human labor cannot simply be replaced with machines, since human scoring and automated scoring have different strengths and limitations. In this essay, the two scoring methods are compared from measurement and logistical perspectives. Conclusions are drawn from research literature, including ETS research, to summarize the current state of automated essay scoring technology. The published research contains few in-depth comparisons of the advantages and limitations of automated and human scoring. There are also debates in academia, in the media, and among the general public concerning the use of automated scoring of essays in standardized tests and in electronic learning environments used in and outside of classrooms. It is important for test developers, policymakers, and educators to have sufficient knowledge about the strengths and weaknesses of each scoring method in order to prevent misuse in a testing program.
The purpose of this essay is to contrast significant characteristics of the two scoring methods, elucidate their differences, and discuss their practical implications for testing programs.
