Deep Sparse Rectifier Neural Networks

Abstract

While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks, in spite of their hard non-linearity and non-differentiability at zero. They create sparse representations with true zeros, which seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra unlabeled data, deep rectifier networks can reach their best performance without requiring any unsupervised pre-training on purely supervised tasks with large labeled datasets. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty in training deep but purely supervised neural networks, and at closing the performance gap between neural networks learnt with and without unsupervised pre-training.
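
The claim about "sparse representations with true zeros" can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the common definition of the rectifier as max(0, x), and it feeds both activations with zero-mean Gaussian pre-activations chosen purely for illustration. The names `rectifier` and `fraction_exact_zeros` are hypothetical helpers.

```python
import math
import random


def rectifier(x):
    """Rectifier activation max(0, x): exactly 0.0 for any non-positive input."""
    return max(0.0, x)


def fraction_exact_zeros(values):
    """Share of activations that are exactly 0.0 (a simple sparsity measure)."""
    return sum(1 for v in values if v == 0.0) / len(values)


# Assumption: pre-activations are drawn from a standard Gaussian.
# The seed is fixed only to make the illustration repeatable.
rng = random.Random(0)
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

relu_out = [rectifier(a) for a in pre_activations]
tanh_out = [math.tanh(a) for a in pre_activations]

relu_sparsity = fraction_exact_zeros(relu_out)
tanh_sparsity = fraction_exact_zeros(tanh_out)

print(f"rectifier: {relu_sparsity:.1%} of units are exactly zero")
print(f"tanh:      {tanh_sparsity:.1%} of units are exactly zero")
```

Under this assumption roughly half of the rectifier outputs are exactly zero, while tanh outputs are small but essentially never exactly zero. The sparsity level in a trained network depends on its learned weights and biases, so the figure here should not be read as a result from the paper.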
