R-FCN: Object Detection via Region-based Fully Convolutional Networks

TLDR

Region-based detectors such as Fast/Faster R-CNN apply a costly per-region subnetwork hundreds of times per image. R-FCN is instead fully convolutional and shares almost all computation across the entire image. Its key component is the position-sensitive score map, which reconciles the translation invariance wanted in image classification with the translation variance needed in object detection, and which lets fully convolutional classifier backbones such as ResNets be used directly for detection. With a 101-layer ResNet, R-FCN reaches 83.6% mAP on the PASCAL VOC 2007 set at a test-time speed of 170 ms per image, 2.5-20× faster than its Faster R-CNN counterpart. Code is publicly available at https://github.com/daijifeng001/r-fcn.

Abstract

We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20 times faster than the Faster R-CNN counterpart. Code is made publicly available at: https://github.com/daijifeng001/r-fcn.
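The abstract names position-sensitive score maps without spelling out how a region is scored from them. The following is a minimal sketch of that pooling-and-voting step, not the authors' implementation. It assumes each region is split into a k × k grid, that every grid cell reads from its own dedicated score-map channel for each class, and that cell responses are averaged into one score per class. The function name `ps_roi_pool`, the channel layout, and the toy data are all hypothetical choices made for illustration.

```python
import math

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Score one region from position-sensitive score maps (illustrative sketch).

    score_maps: nested list [k*k*num_classes][H][W]. Assumed channel layout:
                index (i*k + j)*num_classes + c holds class c at grid cell (i, j).
    roi: (x0, y0, x1, y1) in feature-map cells, end-exclusive, with positive
         size and lying inside the map.
    Returns one averaged score per class.
    """
    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / k
    bin_h = (y1 - y0) / k
    scores = []
    for c in range(num_classes):
        bin_means = []
        for i in range(k):          # grid row inside the region
            for j in range(k):      # grid column inside the region
                # Each cell looks only at the channel dedicated to its position.
                channel = score_maps[(i * k + j) * num_classes + c]
                ys = range(math.floor(y0 + i * bin_h), math.ceil(y0 + (i + 1) * bin_h))
                xs = range(math.floor(x0 + j * bin_w), math.ceil(x0 + (j + 1) * bin_w))
                values = [channel[y][x] for y in ys for x in xs]
                bin_means.append(sum(values) / len(values))
        # Voting: average the k*k cell responses into a single class score.
        scores.append(sum(bin_means) / len(bin_means))
    return scores

def make_toy_maps(k=2, num_classes=2, size=4):
    """Toy maps: class 1 fires in quadrant (i, j) of the channel for cell (i, j)."""
    half = size // k
    maps = []
    for i in range(k):
        for j in range(k):
            for c in range(num_classes):
                maps.append([[1.0 if c == 1 and y // half == i and x // half == j else 0.0
                              for x in range(size)] for y in range(size)])
    return maps

toy_maps = make_toy_maps()
aligned = ps_roi_pool(toy_maps, (0, 0, 4, 4), k=2, num_classes=2)
shifted = ps_roi_pool(toy_maps, (2, 2, 4, 4), k=2, num_classes=2)
print(aligned)  # [0.0, 1.0]
print(shifted)  # [0.0, 0.25]
```

In this toy setup, the box that lines up with the object's parts gets the full class score, while the shifted box gets only a quarter of it. That is the translation-variant behavior detection needs, and it is obtained without any learned per-region layers: the pooling and voting have no parameters, so the convolutional work is shared across all regions of the image.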