Concepedia

Publication | Closed Access

Best practices for convolutional neural networks applied to visual document analysis

Citations: 2.8K

References: 7

Year: 2005

TLDR

Neural networks are a powerful tool for classifying visual inputs from documents, but the literature and industry use a confusing variety of methods. This paper sets out concrete best practices for document analysis with neural networks. The most important is to make the training set as large as possible, which the authors do by adding a new form of distorted data. The next is to prefer convolutional neural networks over fully connected networks for visual document tasks. The authors propose a simple, do-it-yourself convolutional implementation with a flexible architecture that needs no complex techniques such as momentum or weight decay, and they illustrate on the MNIST set of English digit images that it can yield state-of-the-art performance.

Abstract

Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple do-it-yourself implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even finely-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.
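
The abstract does not spell out how the distorted data is produced, so the following is only a minimal sketch of one way to expand a training set with distorted copies. It assumes an elastic-style distortion: a random displacement field is smoothed with a Gaussian, scaled, and used to resample the image with bilinear interpolation. The function names (`elastic_distort`, `augment`) and the parameters `alpha` (displacement strength) and `sigma` (smoothing width) are illustrative choices, not values taken from this page.

```python
import math
import random


def gaussian_kernel(sigma):
    """1-D Gaussian kernel, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(field, sigma):
    """Separable Gaussian blur of a 2-D list; values outside the grid count as 0."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(field), len(field[0])
    # Horizontal pass.
    tmp = [[sum(kernel[k + radius] * field[y][x + k]
                for k in range(-radius, radius + 1) if 0 <= x + k < w)
            for x in range(w)] for y in range(h)]
    # Vertical pass.
    return [[sum(kernel[k + radius] * tmp[y + k][x]
                 for k in range(-radius, radius + 1) if 0 <= y + k < h)
             for x in range(w)] for y in range(h)]


def bilinear(image, y, x):
    """Sample the image at a real-valued (y, x); pixels outside the image are 0."""
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0

    def px(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    top = (1 - fx) * px(y0, x0) + fx * px(y0, x0 + 1)
    bottom = (1 - fx) * px(y0 + 1, x0) + fx * px(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bottom


def elastic_distort(image, alpha=8.0, sigma=4.0, rng=None):
    """Return a distorted copy of a 2-D image (list of rows of floats)."""
    if rng is None:
        rng = random.Random()
    h, w = len(image), len(image[0])
    # Random per-pixel displacements in [-1, 1], smoothed so neighbours move together.
    dx = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], sigma)
    dy = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], sigma)
    # Each output pixel is read from its displaced source location.
    return [[bilinear(image, y + alpha * dy[y][x], x + alpha * dx[y][x])
             for x in range(w)] for y in range(h)]


def augment(dataset, copies=1, **kwargs):
    """Append `copies` distorted versions of every (image, label) pair."""
    out = list(dataset)
    for image, label in dataset:
        for _ in range(copies):
            out.append((elastic_distort(image, **kwargs), label))
    return out
```

Calling `augment(train_pairs, copies=4, rng=random.Random(0))` on a hypothetical list of `(image, label)` pairs would return the original pairs followed by four distorted copies of each, with labels unchanged. Setting `alpha=0` leaves an image untouched, which is a convenient sanity check before tuning the distortion strength.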
