Concepedia

TLDR

Advances in NLP and computational linguistics have enabled major improvements to traditional readability formulas, and recent studies have shown that theoretically motivated linguistic features outperform classic formulas such as Flesch–Kincaid. This study aims to develop new readability models that use advanced NLP tools to assess both text comprehension and reading speed. The authors collected crowdsourced judgments of comprehension and speed across diverse topics, built models from state‑of‑the‑art NLP linguistic features, and compared their accuracy to classic readability formulas. The resulting models, based on theoretically grounded linguistic features, significantly outperformed classic readability formulas in predicting comprehension and reading speed.

Abstract

Background: Advances in natural language processing (NLP) and computational linguistics have facilitated major improvements on traditional readability formulas that aim at predicting the overall difficulty of a text. Recent studies have identified several types of linguistic features that are theoretically motivated and predictive of human judgments of text readability, which outperform predictions made by traditional readability formulas, such as Flesch–Kincaid. The purpose of this study is to develop new readability models using advanced NLP tools to measure both text comprehension and reading speed.

Methods: This study used crowdsourcing techniques to collect human judgments of text comprehension and reading speed across a diverse variety of topic domains (science, technology and history). Linguistic features taken from state-of-the-art NLP tools were used to develop models explaining human judgments of text comprehension and reading speed. The accuracy of these models was then compared with classic readability formulas.

Results: The results indicated that models employing linguistic features more theoretically related to text comprehension and reading speed outperform classic readability models.

Conclusions: This study developed new readability formulas based on advanced NLP tools for both text comprehension and reading speed. These formulas, based on linguistic features that better represent theoretical and behavioural accounts of the reading process, significantly outperformed classic readability formulas.
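For context, the Flesch–Kincaid Grade Level named above as a classic baseline relies on only two surface counts: average words per sentence and average syllables per word. The sketch below is a minimal illustration of that formula, not code from the study. The function names are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for simplicity, so its scores will differ somewhat from tools that use a pronunciation dictionary.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (assumed heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent ("made"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39 * W/S + 11.8 * Syl/W - 15.59."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    sample = "The cat sat on the mat. Readability formulas count surface features."
    print(round(flesch_kincaid_grade(sample), 2))
```

Because the formula sees only sentence length and word length, it cannot distinguish texts that differ in cohesion, syntax, or word familiarity, which is the gap the linguistic-feature models described in the abstract are meant to address.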