Publication | Open Access
Grammar as a Foreign Language
Citations: 402
References: 29
Year: 2014
Syntactic Parsing, Engineering, Multilingualism, Syntactic Structure, Corpus Linguistics, Text Mining, Natural Language Processing, Syntax, Computational Linguistics, Attention Mechanism, Grammar, Language Studies, Syntactic Constituency Parsing, Machine Translation, Syntactic Constituency, Grammatical Formalism, Grammar Induction, Semantic Parsing, Shallow Parsing, Parsing, Treebanks, Foreign Language, Linguistics, POS Tagging
Syntactic constituency parsing is a fundamental NLP problem that has driven decades of intensive research, yet the most accurate parsers remain domain-specific, complex, and inefficient. The study shows that a domain-agnostic, attention-enhanced sequence-to-sequence model attains state-of-the-art results on the most widely used constituency parsing dataset when trained on a large synthetic corpus annotated by existing parsers. Trained only on a small human-annotated dataset, the model matches the performance of standard parsers, which indicates high data efficiency relative to sequence-to-sequence models without attention. It processes over a hundred sentences per second with an unoptimized CPU implementation.
Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
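To treat parsing as a sequence-to-sequence task, the parse tree has to be written as a flat token sequence that a decoder can emit. The abstract does not state the encoding used, so the following is only a minimal sketch under assumptions: a depth-first bracket encoding, with hypothetical helper names `linearize` and `delinearize`, and words assumed not to begin with `(` or `)`.

```python
from typing import List

# Assumed tree representation (not from the paper): a leaf is a word (str),
# an internal node is a (label, children) tuple where children is a list.

def linearize(tree) -> List[str]:
    """Flatten a tree depth-first into labeled open/close bracket tokens."""
    if isinstance(tree, str):
        return [tree]
    label, children = tree
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens

def delinearize(tokens: List[str]):
    """Rebuild the tree from bracket tokens; raise ValueError if malformed."""
    stack = [("ROOT", [])]  # sentinel node that collects the top-level tree
    for tok in tokens:
        if tok.startswith("("):
            stack.append((tok[1:], []))
        elif tok.startswith(")"):
            if len(stack) < 2:
                raise ValueError("closing bracket without an open node")
            label, children = stack.pop()
            if label != tok[1:]:
                raise ValueError("mismatched bracket labels")
            stack[-1][1].append((label, children))
        else:
            stack[-1][1].append(tok)
    if len(stack) != 1 or len(stack[0][1]) != 1:
        raise ValueError("sequence does not encode exactly one tree")
    return stack[0][1][0]

example = ("S", [("NP", ["John"]), ("VP", [("V", ["sleeps"])])])
target_sequence = linearize(example)
```

In this framing the sentence is the source sequence and `target_sequence` is what the model would be trained to produce; `delinearize` turns a predicted sequence back into a tree and rejects outputs whose brackets do not balance.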