Publication | Closed Access
Field Extraction from Administrative Documents by Incremental Structural Templates
Citations: 66
References: 9
Year: 2013
Unknown Venue
Engineering, Incremental Framework, Corpus Linguistics, Text Mining, Field Extraction, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, Data Integration, Language Studies, Content Analysis, Digital Mail-room Scenario, Knowledge Discovery, Information Extraction, Data Extraction, Text Processing, Structured Document, Document Processing, Field Information
In this paper we present an incremental framework for extracting field information from administrative document images in the context of a Digital Mail-room scenario. Given a single training sample in which the user has marked the fields to be extracted from a particular document class, a document model representing structural relationships among words is built. This model is incrementally refined as the system processes more documents from the same class. A reformulation of the tf-idf statistic makes it possible to adjust the importance weights of the structural relationships among words. The experimental section reports results obtained on a large dataset of real invoices.
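The abstract does not give the reformulated tf-idf formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a structural relationship can be represented as a hashable tuple such as `(word_a, word_b, direction)`, that "term frequency" is the fraction of documents of the target class containing the relation, and that "inverse document frequency" is a smoothed log ratio over all documents seen so far. The class name `RelationWeights` and its methods are hypothetical.

```python
import math
from collections import Counter


class RelationWeights:
    """Incrementally tracks how often each structural relation occurs,
    inside one document class and across all documents (assumed scheme)."""

    def __init__(self):
        self.class_docs = 0            # documents seen from the target class
        self.total_docs = 0            # documents seen from any class
        self.class_count = Counter()   # relation -> class documents containing it
        self.total_count = Counter()   # relation -> documents (any class) containing it

    def add_document(self, relations, in_class):
        """Update the counts with one processed document."""
        unique = set(relations)        # count each relation once per document
        self.total_docs += 1
        self.total_count.update(unique)
        if in_class:
            self.class_docs += 1
            self.class_count.update(unique)

    def weight(self, relation):
        """tf-idf-like weight: stable within the class, rare elsewhere."""
        if self.class_docs == 0:
            return 0.0
        tf = self.class_count[relation] / self.class_docs
        idf = math.log((1 + self.total_docs) / (1 + self.total_count[relation])) + 1.0
        return tf * idf


# Hypothetical relations: (word, neighbouring word, relative position).
invoice_no = ("Invoice", "No.", "right")
total_eur = ("Total", "EUR", "right")

model = RelationWeights()
model.add_document([invoice_no, total_eur], in_class=True)   # initial training sample
model.add_document([invoice_no], in_class=True)              # later document, same class
model.add_document([total_eur], in_class=False)              # document from another class
```

In this sketch, a relation that recurs across documents of the class receives a higher weight than one that appears only occasionally, and the weights change as each new document is added, which mirrors the incremental refinement described above.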