SUN RGB-D: A RGB-D scene understanding benchmark suite

TLDR

RGB-D sensors have enabled breakthroughs in tasks such as 3D reconstruction, yet high-level scene understanding remains less successful, largely because no large-scale benchmark with 3D annotations and metrics exists. The paper introduces an RGB-D benchmark suite designed to advance state-of-the-art performance across major scene-understanding tasks. The suite contains 10,335 RGB-D images from four sensors, densely annotated with 146,617 2D polygons, 64,595 3D bounding boxes with accurate orientations, plus 3D room layouts and scene categories, at a scale comparable to PASCAL VOC. With this dataset, algorithms can be trained on abundant data, evaluated using meaningful 3D metrics, avoid overfitting to small test sets, and study cross-sensor bias.

Abstract

Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not attained the same level of success in high-level scene understanding. Perhaps one of the main reasons is the lack of a large-scale benchmark with 3D annotations and 3D evaluation metrics. In this paper, we introduce an RGB-D benchmark suite with the goal of advancing the state of the art in all major scene understanding tasks. Our dataset is captured by four different sensors and contains 10,335 RGB-D images, at a scale similar to PASCAL VOC. The whole dataset is densely annotated and includes 146,617 2D polygons and 64,595 3D bounding boxes with accurate object orientations, as well as a 3D room layout and scene category for each image. This dataset enables us to train data-hungry algorithms for scene-understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.
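To make "oriented 3D bounding boxes" and "3D metrics" concrete, the sketch below computes volumetric intersection-over-union (IoU) between two oriented boxes, a common way to score 3D detections. It is not the benchmark's official evaluation code. It assumes (hypothetically) that each box is a tuple `(cx, cy, cz, length, width, height, yaw)` with full side lengths, that z points up, and that boxes rotate only about that vertical axis. Under that assumption the overlap factors into a floor-plan polygon intersection (computed here with Sutherland-Hodgman clipping) times a 1D vertical overlap.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
# Hypothetical box format: (cx, cy, cz, length, width, height, yaw), z is up.
Box = Tuple[float, float, float, float, float, float, float]


def footprint(cx: float, cy: float, length: float, width: float, yaw: float) -> List[Point]:
    """Counter-clockwise floor-plan corners of a box rotated by yaw about z."""
    c, s = math.cos(yaw), math.sin(yaw)
    local = [(length / 2, width / 2), (-length / 2, width / 2),
             (-length / 2, -width / 2), (length / 2, -width / 2)]
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in local]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0 for degenerate or empty polygons."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: intersect a polygon with a convex CCW polygon."""
    def inside(p: Point, a: Point, b: Point) -> bool:
        # Left of (or on) the directed edge a->b means inside a CCW polygon.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Point where segment p->q meets the infinite line through a and b.
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        candidates, output = output, []
        if not candidates:
            break
        prev = candidates[-1]
        for cur in candidates:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(crossing(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(crossing(prev, cur, a, b))
            prev = cur
    return output


def iou_3d(box_a: Box, box_b: Box) -> float:
    """Volumetric IoU of two boxes that are rotated only about the vertical axis."""
    ax, ay, az, al, aw, ah, ayaw = box_a
    bx, by, bz, bl, bw, bh, byaw = box_b
    floor_overlap = polygon_area(clip(footprint(ax, ay, al, aw, ayaw),
                                      footprint(bx, by, bl, bw, byaw)))
    z_overlap = max(0.0, min(az + ah / 2, bz + bh / 2) - max(az - ah / 2, bz - bh / 2))
    inter = floor_overlap * z_overlap
    union = al * aw * ah + bl * bw * bh - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2 x 2 x 2 cubes offset by 1 along x share half their volume, so `iou_3d((0, 0, 0, 2, 2, 2, 0), (1, 0, 0, 2, 2, 2, 0))` is 1/3. Because yaw enters the footprint directly, a detector that gets an object's orientation wrong loses IoU even when its center and size are correct; an axis-aligned metric would not capture that.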
