Learning Deep Features for Scene Recognition using Places Database

TLDR

Scene recognition is a fundamental computer-vision task that defines context for object recognition, yet it has not seen the gains that large datasets such as ImageNet and CNNs brought to object recognition, possibly because ImageNet-trained deep features are not competitive enough for scene-centric tasks. The authors introduce Places, a new scene-centric database containing over 7 million labeled images. They propose methods to compare dataset density and diversity, showing that Places is as dense as other scene datasets and more diverse, and train convolutional neural networks on Places to learn deep scene features. These learned features achieve state-of-the-art results on several scene-centric datasets, and visualizations of CNN layer responses reveal differences between the internal representations of object-centric and scene-centric networks.

Abstract

Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success. This may be because current deep features trained from ImageNet are not competitive enough for such tasks. Here, we introduce a new scene-centric database called Places with over 7 million labeled pictures of scenes. We propose new methods to compare the density and diversity of image datasets and show that Places is as dense as other scene datasets and has more diversity. Using CNN, we learn deep features for scene recognition tasks, and establish new state-of-the-art results on several scene-centric datasets. A visualization of the CNN layers' responses allows us to show differences in the internal representations of object-centric and scene-centric networks.
