TLDR

The article introduces SMES–SPPC, a high-performance system for extracting structured data from free-text documents. SMES–SPPC is built from domain-adaptive shallow core components implemented as cascaded weighted finite-state machines and generic dynamic tries. It provides German morphological analysis, on-line compound analysis, POS filtering, named-entity recognition, and a novel divide-and-conquer chunk parser. The approach proved effective for free-word-order languages such as German: the system processes more than 6000 words per second on standard PC environments, with high linguistic coverage and an F-measure of 87.14% on unseen data for the divide-and-conquer parsing strategy.

Abstract

In this article we present SMES–SPPC, a high-performance system for intelligent extraction of structured data from free-text documents. SMES–SPPC consists of a set of domain-adaptive shallow core components that are realized by means of cascaded weighted finite-state machines and generic dynamic tries. The system has been fully implemented for German; it includes morphological and on-line compound analysis, efficient POS filtering, high-performance named-entity recognition, and chunk parsing based on a novel divide-and-conquer strategy. The whole approach proved to be very useful for processing free-word-order languages such as German. SMES–SPPC is fast (more than 6000 words per second on standard PC environments) and achieves high linguistic coverage, especially for the divide-and-conquer parsing strategy, where we obtained an F-measure of 87.14% on unseen data.
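
To make the idea of trie-backed on-line compound analysis more concrete, the following is a minimal sketch and not the SMES–SPPC implementation. It assumes a toy lexicon, a single German linking element ("s"), and a longest-match-first search; the names `Trie`, `prefixes`, and `split_compound` are hypothetical and chosen only for illustration. The trie is "dynamic" only in the sense that entries can be inserted at run time.

```python
class Trie:
    """A small dynamic trie: words can be inserted at any time."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes(self, text, start=0):
        """Yield every end index e such that text[start:e] is a stored word."""
        node = self
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                return
            if node.is_word:
                yield i + 1


def split_compound(word, lexicon, linkers=("s",)):
    """Return one segmentation of `word` into lexicon entries, or None.

    Assumption: a linking element may follow any non-final segment.
    """
    word = word.lower()

    def solve(start):
        if start == len(word):
            return []
        # Prefer the longest lexicon entry that starts at `start`.
        for end in sorted(lexicon.prefixes(word, start), reverse=True):
            # Continue directly, or skip over a linking element.
            for skip in ("",) + tuple(linkers):
                if not word.startswith(skip, end):
                    continue
                if skip and end + len(skip) == len(word):
                    continue  # a linking element cannot end the word
                rest = solve(end + len(skip))
                if rest is not None:
                    return [word[start:end]] + rest
        return None

    return solve(0)


# Toy lexicon (illustrative entries only).
lexicon = Trie()
for entry in ("arbeit", "markt", "haus", "tür"):
    lexicon.insert(entry)

print(split_compound("Arbeitsmarkt", lexicon))  # ['arbeit', 'markt']
```

A real analyzer would also need to handle further linking elements, umlaut alternations, and ambiguous segmentations, none of which this sketch attempts.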
