Publication | Closed Access
A Chinese OCR spelling check approach based on statistical language models
Citations: 30
References: 12
Year: 2005
Unknown Venue
Keywords: Engineering, Statistical Language Model, Check Approach, Chinese OCR, Natural Language Processing, Speech Recognition, Language Documentation, Information Retrieval, Pattern Recognition, Text Recognition, Computational Linguistics, Language Studies, Character Recognition, Machine Translation, Statistical Language Models, Optical Character Recognition, East Asian Languages, Text Processing, Linguistics, Document Processing
This work describes an effective spelling-check approach for Chinese OCR built on a new multi-knowledge statistical language model. The model combines the conventional n-gram language model with a new LSA (latent semantic analysis) language model, so that both local (syntactic) and global (semantic) information are used. In addition, visually similar Chinese characters are used during the Viterbi search to expand the candidate list, which raises the chance that the correct character is among the candidates. With this approach, the best recognition accuracy rises from 79.3% to 91.9%. The error rate therefore falls from 20.7% to 8.1%, a 60.9% relative error reduction.
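
The candidate expansion and Viterbi search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it scores paths with a plain bigram model only (the LSA component is omitted), and the similar-character table, the probabilities, the penalty for added look-alikes, and the names `expand` and `viterbi` are all hypothetical values chosen for illustration.

```python
import math

# All constants and tables below are illustrative assumptions, not values from the paper.
UNSEEN_LOGP = math.log(1e-4)     # assumed floor for bigrams missing from the model
SIMILAR_PENALTY = math.log(0.5)  # assumed discount for a look-alike added to the list

def expand(candidates, similar):
    """Add visually similar characters to each position's candidate list."""
    expanded = []
    for position in candidates:
        scores = dict(position)
        for char, logp in position.items():
            for alt in similar.get(char, ()):
                # Keep the better score if the character is already a candidate.
                scores[alt] = max(scores.get(alt, -math.inf), logp + SIMILAR_PENALTY)
        expanded.append(scores)
    return expanded

def viterbi(candidates, bigram_logp, lm_weight=1.0):
    """Return the best character sequence under OCR scores plus a bigram model."""
    # best[char] = (log score of the best path ending in char, that path)
    best = {"<s>": (0.0, [])}
    for position in candidates:
        nxt = {}
        for char, ocr_logp in position.items():
            score, path = max(
                (prev_score + lm_weight * bigram_logp.get((prev, char), UNSEEN_LOGP),
                 prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            nxt[char] = (score + ocr_logp, path + [char])
        best = nxt
    return "".join(max(best.values())[1])

# Toy example: the OCR engine misreads 未 as the look-alike 末.
similar = {"末": ["未"], "未": ["末"]}
bigram_logp = {
    ("<s>", "未"): math.log(0.1),
    ("未", "来"): math.log(0.5),
    ("<s>", "末"): math.log(0.05),
}
ocr_output = [{"末": math.log(0.9)}, {"来": math.log(0.9)}]

print(viterbi(ocr_output, bigram_logp))                   # 末来 (error kept)
print(viterbi(expand(ocr_output, similar), bigram_logp))  # 未来 (error corrected)
```

Without expansion the correct character is not in the lattice, so no language model can recover it. Once the look-alike is added, the bigram score for 未来 outweighs the penalty on the added candidate. A combined model along the lines of the abstract would replace the bigram lookup with a score that also includes a semantic (LSA) term.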