Manga109 dataset and creation of metadata

TLDR

Manga109 is a collection of 109 Japanese comic books that is publicly available for academic use. It offers many comic images but lacks the element annotations needed for machine-learning use or method evaluation. The authors describe an ongoing project to build metadata for Manga109: they define the metadata in terms of frames, texts, and characters, present a web-based tool for efficiently creating ground-truth annotations, and provide annotation guidelines to improve metadata quality.

Abstract

We have created Manga109, a dataset of 109 varied Japanese comic books that is publicly available for academic use. The dataset provides numerous comic images but lacks the annotations of comic elements that machine learning algorithms and method evaluation require. In this paper, we present our ongoing project to build metadata for Manga109. We first define the metadata in terms of frames, texts, and characters. We then present our web-based software for efficiently creating the ground truth for these images. In addition, we provide an annotation guideline intended to improve the quality of the metadata.
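
To make the three metadata categories concrete, the following is a minimal sketch of how per-page annotations for frames, texts, and characters might be represented. The abstract names only the three categories; the rectangular regions, the class names, and every field below (`content`, `name`, `book_title`, `page_index`) are illustrative assumptions, not the authors' actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    """Axis-aligned rectangle in pixel coordinates (an assumed representation)."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int

    def area(self) -> int:
        # Degenerate or inverted rectangles count as zero area.
        return max(0, self.xmax - self.xmin) * max(0, self.ymax - self.ymin)


@dataclass
class Frame:
    region: Region


@dataclass
class Text:
    region: Region
    content: str  # hypothetical field: transcription of the text


@dataclass
class Character:
    region: Region
    name: str  # hypothetical field: character identity


@dataclass
class PageAnnotation:
    """All annotations for one page of one book (hypothetical container)."""
    book_title: str
    page_index: int
    frames: List[Frame] = field(default_factory=list)
    texts: List[Text] = field(default_factory=list)
    characters: List[Character] = field(default_factory=list)

    def counts(self) -> Dict[str, int]:
        return {
            "frames": len(self.frames),
            "texts": len(self.texts),
            "characters": len(self.characters),
        }


# Usage: annotate one made-up page with one element of each kind.
page = PageAnnotation(book_title="ExampleBook", page_index=3)
page.frames.append(Frame(Region(0, 0, 400, 300)))
page.texts.append(Text(Region(20, 30, 80, 120), content="Hello"))
page.characters.append(Character(Region(100, 50, 250, 280), name="Hero"))
print(page.counts())
```

Keeping the three element types as separate classes over a shared region type means that type-specific attributes can be added later without touching the others.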
