Improving Real-Time Pedestrian Detectors with RGB+Depth Fusion

Abstract

In this paper we investigate the benefit of using depth information on top of normal RGB for camera-based pedestrian detection. Indeed, depth sensing is easily acquired using depth cameras such as a Kinect or stereo setups. We investigate the best way to perform this sensor fusion with a special focus on lightweight single-pass CNN architectures, enabling real-time processing on limited hardware. We implement different network architectures, each fusing depth at different layers of our network. Our experiments show that midway fusion performs the best, outperforming a regular RGB detector substantially in accuracy. Moreover, we prove that our fusion network is better at detecting individuals in a crowd, by demonstrating that it has both a better localization of pedestrians and is better at handling occluded persons. The resulting network is computationally efficient and achieves real-time performance on both desktop and embedded GPUs.

References

Page 1

	Year	Citations

Page 1