Publication | Closed Access
HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition
Citations: 922
References: 31
Year: 2015
Venue: Unknown
Keywords: Convolutional Neural Network, Engineering, Machine Learning, Image Classification, Image Analysis, Data Science, Pattern Recognition, Video Transformer, Vision Recognition, Machine Vision, Feature Learning, Object Detection, Computer Science, Deep Learning, Computer Vision, Hierarchical Structure, Categorization, Object Recognition, Visual Separability
Image classification suffers from uneven visual separability, with some categories hard to distinguish, and current flat CNNs do not exploit category hierarchies. The paper proposes HD-CNNs, which embed deep CNNs into a two-level category hierarchy. An HD-CNN uses a coarse classifier to separate easy classes and fine classifiers for difficult ones; training consists of component-wise pretraining followed by global fine-tuning with a multinomial logistic loss regularized by a coarse consistency term, and conditional execution of fine classifiers plus layer compression keep the model scalable. The approach achieves state-of-the-art accuracy on CIFAR-100 and ImageNet, reducing the top-1 error of standard CNNs by 2.65%, 3.1%, and 1.1% across three HD-CNN variants.
In image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNN) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of categories. In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a two-level category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers. During HD-CNN training, component-wise pretraining is followed by global fine-tuning with a multinomial logistic loss regularized by a coarse category consistency term. In addition, conditional executions of fine category classifiers and layer parameter compression make HD-CNNs scalable for large-scale visual recognition. We achieve state-of-the-art results on both CIFAR-100 and large-scale ImageNet 1000-class benchmark datasets. In our experiments, we build up three different two-level HD-CNNs, and they lower the top-1 error of the standard CNNs by 2.65%, 3.1%, and 1.1%.
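The two-level inference described in the abstract can be illustrated with a minimal sketch: the coarse classifier's probabilities both gate which fine classifiers run (conditional execution) and weight their outputs into a final prediction. The classifier callables, the threshold value, and the function name below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def hd_cnn_predict(x, coarse_clf, fine_clfs, threshold=0.1):
    """Sketch of two-level HD-CNN inference.

    coarse_clf(x) -> probabilities over K coarse categories.
    fine_clfs[k](x) -> probabilities over all fine classes.
    Fine classifiers whose coarse weight falls below `threshold`
    are skipped (conditional execution); the rest are combined
    as a coarse-probability-weighted average.
    """
    coarse_p = coarse_clf(x)                      # shape (K,)
    final_p = np.zeros_like(fine_clfs[0](x))      # accumulator over fine classes
    for k, fine in enumerate(fine_clfs):
        if coarse_p[k] < threshold:
            continue                              # skip low-weight branch
        final_p += coarse_p[k] * fine(x)          # weight fine predictions
    return final_p / final_p.sum()                # renormalize after skipping
```

Skipping low-probability branches is what makes the scheme tractable at ImageNet scale: most inputs activate only one or two fine classifiers.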