Publication | Closed Access
A document classification and extraction system with learning ability
Citations: 17
References: 5
Year: 1999
Document Image Processing, Engineering, Structural Pattern Recognition, Logical Closeness, Document Image Analysis, OCR Phase, Text Mining, Natural Language Processing, Image Analysis, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Extraction System, Text Recognition, Document Understanding, Document Classification, Character Recognition, Optical Character Recognition, Knowledge Discovery, Computer Science, Information Extraction, Computer Vision, Data Extraction, Document Processing
Document image processing begins at the OCR phase, where the main difficulty is automatic document analysis and understanding. Most existing systems perform well only in their specific application domains. In this paper, we describe a domain-independent automatic document image understanding system with learning ability. A segmentation method based on "logical closeness" is proposed. A novel and natural representation of document layout structure, a directed weight graph (DWG), is described. To classify a given document, a string representation matching algorithm is applied first, instead of comparing the document against all the sample graphs. A frame template and a document type hierarchy (DTH) are used to represent the document's logical structure and the hierarchical relationships among these frame templates, respectively. Two learning methodologies are applied: learning from experience and an enhanced perceptron learning algorithm.
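The abstract's idea of matching string representations before comparing full graphs can be illustrated with a small sketch. This is not the paper's algorithm: the one-letter block encoding, the sample strings, the `shortlist` function, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy encoding (an assumption, not from the paper): each layout
# block becomes one letter, read top to bottom.
# T = title, P = paragraph, F = figure, B = table.
SAMPLES = {
    "letter": "TPPP",
    "report": "TPFPB",
    "form": "TBBB",
}


def shortlist(query, samples, k=2):
    """Return the k sample types whose layout strings best match `query`.

    Only these candidates would then need the more expensive graph comparison.
    """
    ranked = sorted(
        samples,
        key=lambda name: SequenceMatcher(None, query, samples[name]).ratio(),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Rank the stored document types against a new layout string.
    print(shortlist("TPFP", SAMPLES))
```

The point of the sketch is the ordering of work: a cheap string comparison narrows the candidate document types, so a costlier structural comparison runs on a few samples rather than on every stored graph.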