Concepedia

Abstract

The automatic transcription of historical documents is vital for the creation of digital libraries. In order to make images of valuable old documents amenable to browsing, a transcription of high accuracy is needed. In this paper, two state-of-the art recognizers originally developed for modern scripts are applied to medieval documents. The first is based on Hidden Markov Models and the second uses a Neural Network with a bidirectional Long Short-Term Memory. On a dataset of word images extracted from a medieval manuscript of the 13th century, written in Middle High German by several writers, it is demonstrated that a word accuracy of 93.32% is achievable. This is far above the word accuracy of 77.12% achieved with the same recognizers for unconstrained modern scripts written in English. These results encourage the development of real world systems for automatic transcription of historical documents with a view to image and text browsing in digital libraries.

References

YearCitations

Page 1