Publication | Closed Access

Using SMT for OCR error correction of historical texts

Year: 2016 | Citations: 40 | References: 16

Abstract

With the advent of digital optical scanners, a trend toward digitizing historical paper-based archives has emerged in recent years. Many
paper-based books, textbooks, magazines, articles, and documents are being converted into electronic versions that can be manipulated
by a computer. For this purpose, Optical Character Recognition (OCR) systems have been developed to transform scanned digital
text into editable computer text. The output of OCR systems, however, contains various kinds of errors. Automatic error-correction
tools can help improve the quality of electronic texts by cleaning them and removing noise. In this paper, we perform a
qualitative and quantitative comparison of several error-correction techniques for historical French documents. Experiments show
that our Machine Translation for Error Correction method outperforms other Language Modelling correction techniques, with nearly
13% relative improvement over the initial baseline.

