Deep Learning Face Attributes in the Wild

TLDR

Predicting face attributes in the wild is challenging because of complex face variations. The authors propose a deep learning framework that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently: LNet on general object categories for face localization, and ANet on massive face identities for attribute prediction. The framework outperforms state-of-the-art methods and supports three findings: (1) distinct pre-training strategies improve localization and attribute prediction; (2) LNet, although fine-tuned only with image-level attribute tags, produces response maps that strongly indicate face locations, so it can be trained without face bounding boxes or landmarks; and (3) ANet's high-level neurons automatically discover semantic concepts, which fine-tuning enriches and which sparsely explain each attribute.

Abstract

Predicting face attributes in the wild is challenging due to complex face variations. We propose a novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently. LNet is pre-trained on massive general object categories for face localization, while ANet is pre-trained on massive face identities for attribute prediction. This framework not only outperforms the state-of-the-art by a large margin, but also reveals valuable facts on learning face representation. (1) It shows how the performance of face localization (LNet) and attribute prediction (ANet) can be improved by different pre-training strategies. (2) It reveals that although the filters of LNet are fine-tuned only with image-level attribute tags, their response maps over entire images strongly indicate face locations. This fact enables training LNet for face localization with only image-level annotations, without the face bounding boxes or landmarks that are required by all attribute recognition works. (3) It also demonstrates that the high-level hidden neurons of ANet automatically discover semantic concepts after pre-training with massive face identities, and such concepts are significantly enriched after fine-tuning with attribute tags. Each attribute can be well explained with a sparse linear combination of these concepts.
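
Finding (2) can be made concrete with a small sketch. The abstract does not say how a face location is read off LNet's response maps, so the procedure below is an assumption made purely for illustration: it takes a single 2-D response map (for example, an average over filter responses), keeps the cells whose activation is at least a fraction of the peak, and returns the tight box around them. The function name `box_from_response_map` and the parameter `rel_threshold` are hypothetical and do not come from the paper.

```python
import numpy as np


def box_from_response_map(response, rel_threshold=0.5):
    """Tight box around strongly activated cells of a 2-D response map.

    Illustrative assumption, not the paper's procedure. Returns
    (top, left, bottom, right) with bottom/right exclusive, or None
    when the map has no positive activation.
    """
    response = np.asarray(response, dtype=float)
    if response.ndim != 2:
        raise ValueError("expected a 2-D response map")

    peak = response.max()
    if peak <= 0:
        return None

    # Keep cells whose activation reaches a fraction of the peak.
    mask = response >= rel_threshold * peak

    # Rows and columns that contain at least one kept cell.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1


# Toy 6x8 map: a strong block plus one weak, isolated response.
toy = np.zeros((6, 8))
toy[2:4, 3:6] = 1.0
toy[0, 0] = 0.2
print(box_from_response_map(toy))
```

On the toy map the weak response at the corner falls below the threshold, so only the strong block determines the box. In a real pipeline the box would still have to be rescaled from response-map coordinates to image coordinates before the crop is passed to the attribute network.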
