Publication | Closed Access
Multifont OCR Postprocessing System
Year: 1975
Citations: 29
References: 7
Topics: Engineering, Vector Processing, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Information Retrieval, Data Science, Pattern Recognition, Text Recognition, Computational Linguistics, OCR Data, Language Studies, Character Recognition, Software Simulator, Optical Character Recognition, Computer Science, Text Processing, Linguistics, Document Processing
A series of techniques is being developed to postprocess noisy, multifont, nonformatted OCR data on a word basis to 1) determine whether a field is alphabetic or numeric; 2) verify that an alphabetic word is legitimate; 3) fetch from a dictionary a set of candidate entries, using the garbled word as a key; and 4) error-correct the garbled word by selecting the most likely dictionary word. Four algorithms were developed using a technique called vector processing (representing alphabetic words as numeric vectors) together with Bayes maximum likelihood solutions to correct the OCR output. The result was a software simulator that processed sequential fields generated by the Advanced Optical Character Reader (in use by the U.S. Postal Service in New York City), performed the four functions listed above, and selected the correct alphabetic word from a dictionary of 62,000 entries.
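The four functions listed in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the vector encoding, the confusion statistics, or the fetch rule, so everything below is an assumption made for illustration: the toy `DICTIONARY`, the `CONFUSION` probabilities, the (length, letter-rank sum) vector in `word_vector`, the `tolerance` used in `fetch_candidates`, and the restriction to same-length substitution errors on uppercase input. It is not the paper's algorithm, only one plausible shape for such a pipeline with equal prior probability for every dictionary word.

```python
import math

# Hypothetical toy dictionary (the paper's dictionary held 62,000 entries).
DICTIONARY = ["BROADWAY", "BROOKLYN", "MADISON", "PARK", "YORK"]

# Hypothetical OCR confusion probabilities, keyed as (true char, observed char).
CONFUSION = {("O", "0"): 0.05, ("I", "1"): 0.05, ("A", "4"): 0.02, ("S", "5"): 0.03}
P_MATCH = 0.9    # assumed probability that a character is read correctly
P_OTHER = 0.001  # assumed probability of any confusion not listed above


def is_alphabetic(field):
    """Function 1: call a field alphabetic if letters are at least as common as digits."""
    letters = sum(c.isalpha() for c in field)
    digits = sum(c.isdigit() for c in field)
    return letters >= digits


def word_vector(word):
    """Assumed encoding: (length, sum of letter ranks with A=1 ... Z=26)."""
    magnitude = sum(ord(c) - 64 for c in word if "A" <= c <= "Z")
    return (len(word), magnitude)


def is_legitimate(word, dictionary=DICTIONARY):
    """Function 2: a word is legitimate if it appears in the dictionary."""
    return word in dictionary


def fetch_candidates(garbled, dictionary=DICTIONARY, tolerance=26):
    """Function 3: fetch entries whose vector is close to the garbled word's vector."""
    length, magnitude = word_vector(garbled)
    return [
        w for w in dictionary
        if len(w) == length and abs(word_vector(w)[1] - magnitude) <= tolerance
    ]


def log_likelihood(observed, true):
    """Log of P(observed | true), treating character errors as independent."""
    total = 0.0
    for o, t in zip(observed, true):
        p = P_MATCH if o == t else CONFUSION.get((t, o), P_OTHER)
        total += math.log(p)
    return total


def correct_word(garbled, dictionary=DICTIONARY):
    """Function 4: return the most likely dictionary word, or None if none is fetched."""
    if is_legitimate(garbled, dictionary):
        return garbled
    candidates = fetch_candidates(garbled, dictionary)
    if not candidates:
        return None
    return max(candidates, key=lambda w: log_likelihood(garbled, w))
```

For example, under these assumptions `correct_word("BR0ADWAY")` fetches only `BROADWAY` (the one same-length entry within the magnitude tolerance) and returns it, while `correct_word("P4RK")` fetches both `PARK` and `YORK` and prefers `PARK` because the A-to-4 confusion is far more probable than two unlisted substitutions.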