An Evaluation of the IntelliMetric℠ Essay Scoring System

Abstract

This report provides a two-part evaluation of the IntelliMetric℠ automated essay scoring system based on its performance in scoring essays from the Analytic Writing Assessment (AWA) of the Graduate Management Admission Test™ (GMAT™). The first evaluation uses more than 750 responses to each of six prompts and compares the IntelliMetric system's performance to that of individual human raters, a Bayesian system employing simple word counts, and a weighted probability model. The second, larger evaluation compares the IntelliMetric system's ratings to those of human raters using approximately 500 responses to each of 101 prompts. Results from both evaluations suggest that the IntelliMetric system is a consistent, reliable system for scoring AWA essays, with perfect plus adjacent agreement in 96% to 98% of instances in the first evaluation and 92% to 100% of instances in the second. The Pearson r correlations between human raters and the IntelliMetric system averaged .83 in both evaluations.

Volume 4, Number 4
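The agreement statistics named in the abstract can be illustrated with a minimal sketch. It assumes that "perfect" agreement means two identical scores and "adjacent" agreement means scores that differ by exactly one point, which is the usual reading but is not spelled out in the abstract. The function names and the score lists are hypothetical and are not data from the report.

```python
from math import sqrt


def agreement_rates(human, machine):
    """Return (perfect, adjacent, perfect_plus_adjacent) as fractions.

    Assumption: "perfect" means identical scores and "adjacent" means
    scores that differ by exactly one point.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)
    perfect = sum(h == m for h, m in zip(human, machine)) / n
    adjacent = sum(abs(h - m) == 1 for h, m in zip(human, machine)) / n
    return perfect, adjacent, perfect + adjacent


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / sqrt(var_x * var_y)


# Made-up integer essay scores for illustration only (not from the report).
human_scores = [4, 5, 3, 6, 2, 4, 5, 3]
machine_scores = [4, 4, 3, 6, 4, 4, 5, 2]

perfect, adjacent, combined = agreement_rates(human_scores, machine_scores)
r = pearson_r(human_scores, machine_scores)
print(f"perfect={perfect:.3f} adjacent={adjacent:.3f} combined={combined:.3f}")
print(f"Pearson r={r:.3f}")
```

In this toy data, five of the eight pairs match exactly and two differ by one point, so the combined perfect plus adjacent rate is 7/8. The two statistics are complementary: the agreement rate counts how often scores land close together, while Pearson r measures how closely the two sets of scores track each other linearly.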
