Publication | Open Access
Deep Learning for Classical Japanese Literature
Citations: 484 | References: 13 | Year: 2018
Topics: Engineering, Machine Learning, Benchmark Tasks, Cross-lingual Representation, Writer Identification, Multilingual Pretraining, Large Language Model, Corpus Linguistics, Text Mining, Natural Language Processing, Data Science, Computational Linguistics, Japan Study, Language Studies, Machine Translation, Large AI Model, Benchmark Datasets, Knowledge Discovery, East Asian Languages, Deep Learning, Classical Japanese Literature, Cultural Relevance, Cursive Japanese, Linguistics
Machine learning research often prioritizes benchmark performance, yet there is a growing call to align benchmark tasks with socially or culturally relevant problems. The authors introduce three datasets of handwritten cursive Japanese (Kuzushiji) characters, Kuzushiji-MNIST, Kuzushiji-49, and Kuzushiji-Kanji, to encourage the ML community to explore classical Japanese literature. The datasets are available at https://github.com/rois-codh/kmnist.
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the perspective of ML researchers, the content of the task itself is largely irrelevant, and thus there have increasingly been calls for benchmark tasks to focus more heavily on problems which are of social or cultural relevance. In this work, we introduce Kuzushiji-MNIST, a dataset which focuses on Kuzushiji (cursive Japanese), as well as two larger, more challenging datasets, Kuzushiji-49 and Kuzushiji-Kanji. Through these datasets, we wish to engage the machine learning community with the world of classical Japanese literature. The datasets are available at https://github.com/rois-codh/kmnist.