Publication | Closed Access

Sentence Boundary Detection: A Long Solved Problem?

Citations: 51

References: 13

Year: 2012

TLDR

The authors review the state of the art in automated sentence boundary detection (SBD) for English and call for renewed research on the task. They run a systematic empirical survey of many existing SBD approaches across diverse corpora and find that prior studies suffer from limited comparability and reproducibility and from an overly narrow framing of the task, which led to overoptimistic performance estimates. They therefore propose a generalized task definition that drops text- or language-specific assumptions about candidate boundary points. Performance degrades on informal text such as user-generated Web content, but a moderate interpretation of document structure can substantially improve it.

Abstract

We review the state of the art in automated sentence boundary detection (SBD) for English and call for a renewed research interest in this foundational first step in natural language processing. We observe severe limitations in comparability and reproducibility of earlier work and a general lack of knowledge about genre- and domain-specific variations. To overcome these barriers, we conduct a systematic empirical survey of a large number of extant approaches, across a broad range of diverse corpora. We further observe that much previous work interpreted the SBD task too narrowly, leading to overly optimistic estimates of SBD performance on running text. To better relate SBD to practical NLP use cases, we thus propose a generalized definition of the task, eliminating text- or language-specific assumptions about candidate boundary points. More specifically, we quantify degrees of variation across ‘standard’ corpora of edited, relatively formal language, as well as performance degradation when moving to less formal language, viz. various samples of user-generated Web content. For these latter types of text, we demonstrate how moderate interpretation of document structure (as is now often available more or less explicitly through mark-up) can substantially contribute to overall SBD performance.
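
The effect of a narrow task framing on reported scores can be illustrated with a small sketch. This is not the authors' evaluation setup: the toy text, the punctuation-only baseline, and the names `build`, `baseline_boundaries`, `narrow_gold`, and `recall` are all hypothetical and chosen only for illustration. The sketch assumes the narrow framing scores only those gold boundaries that directly follow a terminator (`.`, `?`, `!`), while the generalized framing treats every character offset as a candidate boundary.

```python
import re

TERMINATORS = ".?!"  # assumption: the narrow framing only considers these characters


def build(sentences, separators):
    """Join sentences and record the gold boundary offset after each one."""
    text, gold = "", set()
    for sent, sep in zip(sentences, separators + [""]):
        text += sent
        gold.add(len(text))  # boundary = offset just past the sentence
        text += sep
    return text, gold


def baseline_boundaries(text):
    """Hypothetical baseline: a terminator followed by whitespace or end of text."""
    return {m.end() for m in re.finditer(r"[.?!](?=\s|$)", text)}


def narrow_gold(text, gold):
    """Keep only gold boundaries that directly follow a terminator."""
    return {i for i in gold if i > 0 and text[i - 1] in TERMINATORS}


def recall(predicted, gold):
    """Fraction of gold boundaries that the system found."""
    return len(predicted & gold) / len(gold) if gold else 1.0


# Invented toy sample in the style of user-generated text:
# a title line and a final sentence, both without closing punctuation.
sentences = ["Great phone", "Battery lasts two days.", "Camera is ok"]
text, gold = build(sentences, ["\n", " "])

predicted = baseline_boundaries(text)
narrow_recall = recall(predicted, narrow_gold(text, gold))
general_recall = recall(predicted, gold)

print(f"narrow framing recall:      {narrow_recall:.2f}")
print(f"generalized framing recall: {general_recall:.2f}")
```

On this toy sample the baseline finds every boundary that the narrow framing scores, yet only one of the three boundaries counted under the generalized framing. The two unpunctuated boundaries are the kind of case where structural cues such as line breaks or mark-up could help.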
