A New Dataset and Method for Automatically Grading ESOL Texts

TLDR

The study demonstrates how supervised discriminative machine learning can automate the grading of ESOL examination scripts. The authors apply rank preference learning to a range of extracted features, run ablation tests, compare regression and rank preference models, and use outlier texts to test the validity of the model. Experiments on the first publicly available dataset show performance close to the agreement between human examiners, and the outlier texts expose cases where model scores diverge from those of a human examiner.

Abstract

We demonstrate how supervised discriminative machine learning techniques can be used to automate the assessment of 'English as a Second or Other Language' (ESOL) examination scripts. In particular, we use rank preference learning to explicitly model the grade relationships between scripts. A number of different features are extracted and ablation tests are used to investigate their contribution to overall performance. A comparison between regression and rank preference models further supports our method. Experimental results on the first publicly available dataset show that our system can achieve levels of performance close to the upper bound for the task, as defined by the agreement between human examiners on the same corpus. Finally, using a set of 'outlier' texts, we test the validity of our model and identify cases where the model's scores diverge from those of a human examiner.
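The idea of modelling grade relationships between scripts, rather than predicting each grade in isolation, can be illustrated with a minimal sketch. The abstract does not specify the learner or the features at this point, so everything below is an assumption made for illustration: a simple pairwise perceptron stands in for the rank preference model, and the two-dimensional feature vectors, the grades, and the names `train_pairwise_ranker` and `score` are hypothetical.

```python
from itertools import combinations


def train_pairwise_ranker(features, grades, epochs=50, lr=1.0):
    """Learn weights w so that w.x_i > w.x_j whenever grade_i > grade_j.

    Illustrative pairwise perceptron; not the paper's actual learner.
    """
    w = [0.0] * len(features[0])

    # Turn graded scripts into (better, worse) preference pairs.
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        if grades[i] > grades[j]:
            pairs.append((features[i], features[j]))
        elif grades[j] > grades[i]:
            pairs.append((features[j], features[i]))
        # Scripts with equal grades express no preference and are skipped.

    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            if margin <= 0:
                # The pair is misranked (or tied): move w towards the difference.
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def score(w, x):
    """Ranking score of one script; only the ordering of scores is meaningful."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Hypothetical toy data: four scripts, two made-up features each.
features = [[1, 3], [2, 0], [3, 2], [4, 1]]
grades = [10, 20, 30, 40]

w = train_pairwise_ranker(features, grades)
scores = [score(w, x) for x in features]
print(scores)
```

Because the model is trained only on pairwise differences, its output is an ordering of scripts rather than a grade on the examination scale. A regression model, by contrast, would fit the grade values directly, which is the contrast the abstract's regression versus rank preference comparison refers to.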
