Publication | Closed Access
Neural Text Segmentation and its Application to Sentiment Analysis
Citations: 47
References: 56
Year: 2020
Engineering, Sentiment Analysis, Language Processing, Text Mining, Natural Language Processing, Text Segmentation, Computational Linguistics, Word Segmentation (Natural Language Processing), Recurrent Neural Networks, Language Studies, Word Segmentation (Phonological Awareness), NLP Task, Language Modeling (Natural Language Processing), Computer Science, Information Extraction, Language Modeling (Theoretical Linguistics), Text Processing, Linguistics, Chunking, Neural Text Segmentation
Text segmentation is a fundamental task in natural language processing. Depending on the level of granularity, the task can be defined as segmenting a document into topical segments, or segmenting a sentence into elementary discourse units (EDUs). Traditional solutions to the two tasks rely heavily on carefully designed features. Recently proposed neural models do not need manual feature engineering, but they either suffer from sparse boundary tags or cannot efficiently handle a variable-size output vocabulary. In light of these limitations, we propose a generic end-to-end segmentation model, namely SegBot, which first uses a bidirectional recurrent neural network to encode an input text sequence. SegBot then uses another recurrent neural network, together with a pointer network, to select text boundaries in the input sequence. In this way, SegBot does not require any hand-crafted features. More importantly, SegBot inherently handles both the variable-size output vocabulary and the sparse boundary tags. In our experiments, SegBot outperforms state-of-the-art models on two tasks: document-level topic segmentation and sentence-level EDU segmentation.
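The pointer-based boundary selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder and decoder states are hand-written toy vectors rather than learned RNN outputs, dot-product scoring stands in for whatever attention function the paper uses, and the names point_to_boundary, segment, and decoder_state_for are hypothetical. The point it shows is that the set of candidate boundaries at each step is simply the remaining input positions, so the output vocabulary varies with the input length.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def point_to_boundary(decoder_state, encoder_states, start):
    """Pick the end of the segment that begins at `start`.

    Candidates are the input positions start..n-1, so the number of
    choices shrinks as segmentation proceeds (variable-size output).
    Dot-product scoring is an assumption made for this sketch.
    """
    candidates = encoder_states[start:]
    probs = softmax([dot(decoder_state, h) for h in candidates])
    best = max(range(len(probs)), key=probs.__getitem__)
    return start + best, probs


def segment(encoder_states, decoder_state_for):
    """Greedy decoding: each chosen boundary fixes the next segment start."""
    boundaries = []
    start = 0
    while start < len(encoder_states):
        state = decoder_state_for(start)  # hypothetical decoder step
        end, _ = point_to_boundary(state, encoder_states, start)
        boundaries.append(end)
        start = end + 1
    return boundaries


# Toy encoder states for six units; positions 2 and 5 "look like" boundaries.
toy_states = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0],
              [0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
toy_boundaries = segment(toy_states, lambda start: [1.0, 0.0])
```

With these toy states the decoder emits one boundary per segment rather than one tag per unit, which is how a pointer formulation sidesteps sparse boundary tags.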
As a downstream application, we further propose a hierarchical attention model for sentence-level sentiment analysis based on the outcomes of SegBot. The hierarchical model can make full use of both word-level and EDU-level information simultaneously for sentence-level sentiment analysis. In particular, it can effectively exploit EDU-level information, such as the inner properties of EDUs, which cannot be fully encoded in word-level features. Experimental results show that our hierarchical model achieves new state-of-the-art results on the Movie Review and Stanford Sentiment Treebank benchmarks.
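A rough sketch of the two-level idea follows, assuming simple dot-product attention with fixed query vectors. The abstract does not specify the architecture, so this is only an illustration of combining word-level and EDU-level information; attention_pool, sentence_vector, word_query, and edu_query are hypothetical names, and the vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weighted average of `vectors`, weights from dot products with `query`."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors))
              for i in range(dim)]
    return pooled, weights


def sentence_vector(edus, word_query, edu_query):
    """Pool words into EDU vectors, then pool EDU vectors into one sentence vector.

    `edus` is a list of EDUs, each a list of word vectors (for example,
    the spans produced by a segmenter).
    """
    edu_vectors = [attention_pool(words, word_query)[0] for words in edus]
    return attention_pool(edu_vectors, edu_query)


# Toy sentence with two EDUs of two-dimensional word vectors.
toy_edus = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5]],
]
toy_sentence, toy_edu_weights = sentence_vector(
    toy_edus, word_query=[1.0, 0.0], edu_query=[0.0, 1.0]
)
```

The returned EDU weights show how much each discourse unit contributes to the sentence representation, which a downstream sentiment classifier would then consume.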