Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks

Abstract

Fashion landmarks are functional key points defined on clothes, such as the corners of the neckline, hemline, and cuffs. They were recently introduced [18] as an effective visual representation for fashion image understanding. However, detecting fashion landmarks is challenging because of background clutter and variations in human pose and scale. To reduce these variations, previous works usually assumed that clothing bounding boxes are provided as additional annotations in both training and test; such annotations are expensive to obtain and unavailable in practical applications. This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are provided in neither training nor test. To this end, we present a novel Deep LAndmark Network (DLAN), in which bounding boxes and landmarks are jointly estimated and trained iteratively in an end-to-end manner. DLAN contains two dedicated modules: a Selective Dilated Convolution for handling scale discrepancies, and a Hierarchical Recurrent Spatial Transformer for handling background clutter. To evaluate DLAN, we present a large-scale fashion landmark dataset, the Unconstrained Landmark Database (ULD), consisting of 30K images. Statistics show that ULD is more challenging than existing datasets in terms of image scale, background clutter, and human pose. Extensive experiments demonstrate that DLAN outperforms state-of-the-art methods. DLAN also generalizes well across different clothing categories and modalities, making it well suited for real-world fashion analysis.
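The abstract names a Selective Dilated Convolution for handling scale discrepancies but does not describe how the selection is made. As a minimal sketch, assuming the module applies one 3×3 kernel at several dilation rates and keeps, at each pixel, the branch with the strongest response (a hard max selection; the actual module may use learned gating or soft weighting instead), the idea can be illustrated in plain Python. The function names `dilated_conv2d` and `selective_dilated_conv` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """Zero-padded, same-size 2D cross-correlation with a dilated kernel."""
    h, w = len(x), len(x[0])
    k = len(kernel)  # assumes a square, odd-sized kernel
    r = (k // 2) * dilation  # half-width of the dilated footprint
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    ii = i - r + a * dilation
                    jj = j - r + b * dilation
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def selective_dilated_conv(
    x: Grid, kernel: Grid, dilations: List[int]
) -> Tuple[Grid, List[List[int]]]:
    """Hypothetical selection rule: per pixel, keep the dilation branch with
    the largest response (ties go to the earliest dilation in the list)."""
    branches = [dilated_conv2d(x, kernel, d) for d in dilations]
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    choice = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            responses = [br[i][j] for br in branches]
            best = max(range(len(dilations)), key=lambda n: responses[n])
            out[i][j] = responses[best]
            choice[i][j] = dilations[best]  # record which scale was selected
    return out, choice


# A horizontal "pair detector": responds to two points on either side.
pair_kernel = [[0, 0, 0], [1, 0, 1], [0, 0, 0]]

near = [[0.0] * 5 for _ in range(5)]
near[2][1] = near[2][3] = 1.0  # pair 2 pixels apart
far = [[0.0] * 5 for _ in range(5)]
far[2][0] = far[2][4] = 1.0  # same pattern, 4 pixels apart

_, near_choice = selective_dilated_conv(near, pair_kernel, [1, 2])
_, far_choice = selective_dilated_conv(far, pair_kernel, [1, 2])
print(near_choice[2][2], far_choice[2][2])
```

In this toy example the same kernel detects the closely spaced pair through the dilation-1 branch and the widely spaced pair through the dilation-2 branch, which is the kind of scale adaptivity such a module is meant to provide without resizing the input.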

References

Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv, 2014.
Jia Deng, Wei Dong, Richard Socher, et al. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
Jonathan Long, Evan Shelhamer, Trevor Darrell. Fully convolutional networks for semantic segmentation. 2015.
Ross Girshick. Fast R-CNN. 2015.
Yann LeCun, Bernhard E. Boser, J. S. Denker, et al. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1989.
Alexander Toshev, Christian Szegedy. DeepPose: Human Pose Estimation via Deep Neural Networks. 2014.
Xiangxin Zhu, Deva Ramanan. Face detection, pose estimation, and landmark localization in the wild. 2012.
Ziwei Liu, Ping Luo, Shi Qiu, et al. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations. 2016.
Fisher Yu, Vladlen Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv, 2015.
Jonathan Tompson, Arjun Jain, Yann LeCun, et al. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation. arXiv, 2014.

Page 1