Publication | Closed Access
A Shallow Text Processing Core Engine
Citations: 53
References: 20
Year: 2002
Engineering, Corpus Linguistics, Language Processing, Text Mining, Natural Language Processing, Intelligent Extraction, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Word Segmentation (Natural Language Processing), Present Smes–sppc, Chunk Parsing, Language Studies, Named-entity Recognition, Machine Translation, Word Segmentation (Phonological Awareness), Computer Science, Information Extraction, Shallow Parsing, Text Processing, Linguistics, Chunking
The article introduces SMES–SPPC, a high‑performance system for extracting structured data from free text documents. SMES–SPPC is built from domain‑adaptive shallow core components implemented as cascaded weighted finite‑state machines and dynamic tries, providing German morphological analysis, compound parsing, POS filtering, named‑entity recognition, and a novel divide‑and‑conquer chunk parser. The system proved effective for free‑word‑order languages, processing German at over 6000 words per second with high linguistic coverage and an 87.14 % F‑measure on unseen data.
In this article we present SMES–SPPC, a high-performance system for intelligent extraction of structured data from free-text documents. SMES–SPPC consists of a set of domain-adaptive shallow core components that are realized by means of cascaded weighted finite-state machines and generic dynamic tries. The system has been fully implemented for German; it includes morphological and on-line compound analysis, efficient POS filtering, high-performance named-entity recognition, and chunk parsing based on a novel divide-and-conquer strategy. The whole approach proved to be very useful for processing free word order languages such as German. SMES–SPPC is fast (more than 6000 words per second on standard PC environments) and achieves high linguistic coverage, especially for the divide-and-conquer parsing strategy, where we obtained an F-measure of 87.14% on unseen data.
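To make the role of a trie in compound analysis more concrete, here is a minimal sketch of a trie-backed lexicon used to split a German compound into known parts. It is an illustration under stated assumptions, not the implementation described in the article: the names `Trie`, `prefixes`, and `split_compound`, the toy lexicon, and the backtracking strategy that tries longer prefixes first are all hypothetical, and linking elements (such as the German Fugen-s) and weights are ignored.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.value: Optional[str] = None  # payload stored where a key ends


class Trie:
    """Hypothetical lexicon trie; entries can be added at any time."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str, value: str) -> None:
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def prefixes(self, text: str, start: int = 0) -> List[int]:
        """Return every end index e such that text[start:e] is a stored key."""
        ends: List[int] = []
        node = self.root
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.value is not None:
                ends.append(i + 1)
        return ends


def split_compound(trie: Trie, word: str) -> Optional[List[str]]:
    """Return one segmentation of word into lexicon entries, or None.

    Longer prefixes are tried first; the search backtracks on failure.
    """
    word = word.lower()

    def solve(start: int) -> Optional[List[str]]:
        if start == len(word):
            return []
        for end in reversed(trie.prefixes(word, start)):
            rest = solve(end)
            if rest is not None:
                return [word[start:end]] + rest
        return None

    return solve(0)


# Toy lexicon (assumed for illustration only).
lexicon = Trie()
for stem in ("daten", "bank", "system"):
    lexicon.insert(stem, "NOUN")

print(split_compound(lexicon, "Datenbanksystem"))  # ['daten', 'bank', 'system']
```

Because a single walk down the trie reports every lexicon entry that starts at a given position, the splitter never rescans the same characters for each candidate stem, which is one reason tries suit this kind of lookup.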