Concepedia

Publication | Closed Access

DIVA-HisDB: A Precisely Annotated Large Dataset of Challenging Medieval Manuscripts

104

Citations

15

References

2016

Year

Abstract

This paper introduces a publicly available historical manuscript database DIVA-HisDB for the evaluation of several Document Image Analysis (DIA) tasks. The database consists of 150 annotated pages of three different medieval manuscripts with challenging layouts. Furthermore, we provide a layout analysis ground-truth which has been iterated on, reviewed, and refined by an expert in medieval studies. DIVA-HisDB and the ground truth can be used for training and evaluating DIA tasks, such as layout analysis, text line segmentation, binarization and writer identification. Layout analysis results of several representative baseline technologies are also presented in order to help researchers evaluate their methods and advance the frontiers of complex historical manuscripts analysis. An optimized state-of-the-art Convolutional Auto-Encoder (CAE) performs with around 95% accuracy, demonstrating that for this challenging layout there is much room for improvement. Finally, we show that existing text line segmentation methods fail due to interlinear and marginal text elements.

References

YearCitations

Page 1