Publication | Closed Access
Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing
Citations: 66
References: 24
Year: 2019
Unknown Venue
Image Analysis, Machine Learning, Data Science, OCR Errors, Pattern Recognition, OCR Documents, Engineering, Text Recognition, Text Processing, Optical Character Recognition, Computer Science, Character Recognition, Deep Learning, Deep Statistical Analysis, Document Processing
Post-OCR processing is an important step that follows optical character recognition (OCR) and aims to improve the quality of OCR documents by detecting and correcting residual errors. This paper presents a statistical analysis of OCR errors on four document collections. Five aspects of general OCR errors are studied and compared with human-generated misspellings: edit operations, length effects, erroneous character positions, real-word vs. non-word errors, and word boundaries. Based on the observations from the analysis, we give several suggestions for the design and implementation of effective OCR post-processing approaches.
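To make two of the studied aspects concrete (edit operations, and real-word vs. non-word errors), here is a minimal Python sketch. It is not the paper's method: the function names `edit_operations` and `error_type`, the toy lexicon, and the use of `difflib.SequenceMatcher` for alignment are all assumptions made for illustration. `SequenceMatcher` does not guarantee a minimum-edit-distance alignment, so the counts are approximate.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_operations(truth: str, ocr: str) -> Counter:
    """Approximate count of character edits that turn `truth` into `ocr`.

    Illustrative helper (hypothetical name). The alignment comes from
    difflib, which is not a true Levenshtein alignment.
    """
    ops = Counter()
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed, added = i2 - i1, j2 - j1
        if tag == "replace":
            # Pair characters up as substitutions; any surplus on either
            # side is counted as deletions or insertions.
            paired = min(removed, added)
            ops["substitution"] += paired
            ops["deletion"] += removed - paired
            ops["insertion"] += added - paired
        elif tag == "delete":
            ops["deletion"] += removed
        elif tag == "insert":
            ops["insertion"] += added
    return ops


def error_type(truth: str, ocr: str, lexicon: set) -> str:
    """Label an OCR token as correct, a real-word error, or a non-word error.

    `lexicon` is an assumed word list; a real system would use a much
    larger dictionary.
    """
    if ocr == truth:
        return "correct"
    return "real-word" if ocr in lexicon else "non-word"


# Toy lexicon, assumed for this example only.
LEXICON = {"modern", "modem", "house"}

if __name__ == "__main__":
    for truth, ocr in [("house", "hcuse"), ("modern", "modem")]:
        print(truth, ocr, dict(edit_operations(truth, ocr)),
              error_type(truth, ocr, LEXICON))
```

Run over aligned ground-truth and OCR token pairs from a collection, counters like these could be aggregated into the kind of per-aspect statistics the abstract describes. The token alignment itself is assumed to be given.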