Publication | Closed Access

Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English

Citations: 346
References: 15
Year: 2013

TLDR

NUCLE has been available for almost two years, yet no reference paper has described it in detail. This paper introduces NUCLE, a large, fully annotated corpus of learner English designed to support the development and evaluation of grammatical error correction systems. It describes the annotation schema and the data collection and annotation process, reports on a previously unpublished study of annotator agreement for grammatical error correction, and presents statistics on the distribution of grammatical errors in the corpus.

Abstract

We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although NUCLE has been available for almost two years, there has been no reference paper that describes the corpus in detail. In this paper, we address this need. We describe the annotation schema and the data collection and annotation process of NUCLE. Most importantly, we report on an unpublished study of annotator agreement for grammatical error correction. Finally, we present statistics on the distribution of grammatical errors in the NUCLE corpus.
