Automatic analysis of syntactic complexity in second language writing

TLDR

The authors present a computational system that automatically computes fourteen syntactic complexity measures for second-language writing, designed with advanced second-language proficiency research in mind. The system takes a written sample as input and outputs the fourteen indices. It was developed and evaluated on college-level second-language writing data from the Written English Corpus of Chinese Learners, and an example application examines whether each measure differentiates between proficiency levels. Experimental results show that the system achieves very high reliability on unseen test data from the corpus.

Abstract

We describe a computational system for automatic analysis of syntactic complexity in second language writing using fourteen different measures that have been explored or proposed in studies of second language development. The system takes a written language sample as input and produces fourteen indices of syntactic complexity of the sample based on these measures. The system is designed with advanced second language proficiency research in mind, and is therefore developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners (Wen et al. 2005). Experimental results show that the system achieves very high reliability on unseen test data from the corpus. We illustrate how the system is used in an example application to investigate whether and to what extent each of these measures significantly differentiates between different proficiency levels.
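
To make the input-to-indices idea concrete, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not name the fourteen measures. The five ratios below are illustrative examples of the kind of index used in second language writing research, and the `UnitCounts` container, its field names, and the `complexity_indices` function are all hypothetical. The sketch assumes that an upstream parser has already counted the production units in the sample.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Hypothetical container for unit counts taken from a parsed sample."""
    words: int
    sentences: int
    t_units: int
    clauses: int
    dependent_clauses: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def complexity_indices(counts: UnitCounts) -> dict:
    """Compute a few illustrative ratio-based complexity indices.

    Assumption: each index is a simple ratio of two unit counts.
    The measure names are examples, not the paper's confirmed list.
    """
    return {
        "mean_length_of_sentence": ratio(counts.words, counts.sentences),
        "mean_length_of_t_unit": ratio(counts.words, counts.t_units),
        "mean_length_of_clause": ratio(counts.words, counts.clauses),
        "clauses_per_t_unit": ratio(counts.clauses, counts.t_units),
        "dependent_clauses_per_clause": ratio(
            counts.dependent_clauses, counts.clauses
        ),
    }


# Made-up counts for illustration only; not data from the corpus.
sample = UnitCounts(words=120, sentences=6, t_units=8,
                    clauses=12, dependent_clauses=3)
indices = complexity_indices(sample)
```

Separating the counting step from the ratio step keeps the arithmetic easy to check by hand. The hard part of such a system, identifying clauses and T-units reliably in learner text, sits entirely in the assumed upstream parser.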
