Publication | Closed Access
Machine learning methods for automatically processing historical documents: from paper acquisition to XML transformation
Citations: 24
References: 13
Year: 2004
Venue: Unknown
EU Project COLLATE, Engineering, Machine Learning, Knowledge Extraction, Paper Acquisition, Semantic Web, Document Processing System, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Digitized Cultural Material, Computational Linguistics, Document Engineering, Document Classification, Historical Documents, Knowledge Discovery, Computer Science, Information Extraction, Automated Machine Learning, Structured Document, Document Processing, Data Modeling
One of the aims of the EU project COLLATE is to design and implement a Web-based collaboratory for archives, scientists and end-users working with digitized cultural material. Since the originals of such material are often unique and scattered across various archives, making them widely accessible poses severe problems. One solution is to develop intelligent document processing tools that automatically transform printed documents into a Web-accessible form such as XML. Here, we propose the use of a document processing system, WISDOM++, which relies heavily on machine learning techniques to perform this task, and we report promising results obtained in preliminary experiments.