D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

TLDR

Robust point cloud registration relies on discriminative 3D local features. Learning-based descriptors have advanced quickly, but learned detectors, and especially joint detection and description, have received little attention. This work introduces a 3D fully convolutional network that jointly learns a dense detection score and a descriptor for every point in a point cloud. The method uses a keypoint selection strategy that is robust to density variations and a self-supervised detector loss driven by on-the-fly feature matching during training. It achieves state-of-the-art performance on 3DMatch and KITTI, generalizes to ETH, and shows that a reliable detector enables accurate, fast alignment with fewer sampled features.

Abstract

A successful point cloud registration often relies on the robust establishment of sparse matches through discriminative 3D local features. Despite the fast evolution of learning-based 3D feature descriptors, little attention has been paid to learning 3D feature detectors, and even less to the joint learning of the two tasks. In this paper, we leverage a 3D fully convolutional network for 3D point clouds and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. In particular, we propose a keypoint selection strategy that overcomes the inherent density variations of 3D point clouds, and a self-supervised detector loss guided by on-the-fly feature matching results during training. Finally, our method achieves state-of-the-art results in both indoor and outdoor scenarios, evaluated on the 3DMatch and KITTI datasets, and shows strong generalization ability on the ETH dataset. Towards practical use, we show that with a reliable feature detector, sampling a smaller number of features is sufficient to achieve accurate and fast point cloud alignment.
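The abstract names a density-robust keypoint score and a matching-driven detector loss but gives no formulas. Below is a minimal NumPy sketch of one plausible way to realize both. The score formula, the loss form, and every function name (`detection_scores`, `detector_loss`, `select_keypoints`) are assumptions for illustration, not the paper's published code. The score compares each feature channel against the *mean* over a point's neighbors, so adding more neighbors (higher density) does not by itself inflate it. The loss rewards high scores on points whose descriptors already match their ground-truth partner better than the hardest non-matching descriptor.

```python
import numpy as np

# Hypothetical sketch: formulas below are assumptions, not taken from the abstract.


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def detection_scores(features, neighbors, eps=1e-12):
    """Dense per-point detection scores (assumed form).

    features:  (N, C) non-negative feature map, e.g. the output of a ReLU.
    neighbors: (N, K) integer indices of each point's neighbors.
    Using the neighborhood mean (not a sum) keeps the score from growing
    with the neighbor count, i.e. with local point density.
    """
    neigh_mean = features[neighbors].mean(axis=1)              # (N, C)
    local_score = softplus(features - neigh_mean)               # salient w.r.t. neighbors
    channel_score = features / (features.max(axis=1, keepdims=True) + eps)  # salient across channels
    return (local_score * channel_score).max(axis=1)            # (N,)


def detector_loss(desc_a, desc_b, score_a, score_b):
    """Self-supervised detector loss over N ground-truth correspondences (assumed form).

    desc_a[i] and desc_b[i] describe the same physical point. The loss is
    lower when points whose positive distance is below the hardest negative
    distance receive high scores, and vice versa.
    """
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)  # (N, N)
    d_pos = np.diag(dist)
    masked = dist + np.eye(len(dist)) * 1e6                     # exclude the true matches
    # Hardest negative searched in both directions (a -> b and b -> a).
    d_neg = np.minimum(masked.min(axis=1), masked.min(axis=0))
    return float(np.mean((d_pos - d_neg) * (score_a + score_b)))


def select_keypoints(scores, k):
    # Indices of the k highest-scoring points, best first.
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(size=(200, 3))
    feats = np.maximum(rng.normal(size=(200, 32)), 0.0)
    # Brute-force 8-nearest-neighbor indices (each point includes itself).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1)[:, :8]
    scores = detection_scores(feats, nbrs)
    print("top-5 keypoints:", select_keypoints(scores, 5))
```

Keeping only the top-scoring points via `select_keypoints` reflects the abstract's final claim: if the score is reliable, a small top-k subset can replace a larger random sample when estimating the alignment. The hardest-negative search here ignores any spatial safety radius around the true match, a simplification a real training loop would likely need to address.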
