Publication | Closed Access
X<sup>3</sup>KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
34
Citations
44
References
2023
Year
Unknown Venue
Engineering, Machine Learning, 3D Computer Vision, Image Analysis, Data Science, Pattern Recognition, Image-based Modeling, Cross-task Distillation, Computational Imaging, Machine Vision, Geometric Feature Modeling, Object Detection, Multi-camera 3D, Computer Science, Deep Learning, 3D Object Recognition, Computer Vision, 3D Vision, Knowledge Distillation, Object Recognition, Scene Modeling
Recent advances in 3D object detection (3DOD) have obtained remarkably strong results for LiDAR-based models. In contrast, surround-view 3DOD models based on multiple camera images underperform due to the necessary view transformation of features from perspective view (PV) to a 3D world representation which is ambiguous due to missing depth information. This paper introduces X<sup>3</sup>KD, a comprehensive knowledge distillation framework across different modalities, tasks, and stages for multi-camera 3DOD. Specifically, we propose cross-task distillation from an instance segmentation teacher (X-IS) in the PV feature extraction stage providing supervision without ambiguous error backpropagation through the view transformation. After the transformation, we apply cross-modal feature distillation (X-FD) and adversarial training (X-AT) to improve the 3D world representation of multi-camera features through the information contained in a LiDAR-based 3DOD teacher. Finally, we also employ this teacher for cross-modal output distillation (X-OD), providing dense supervision at the prediction stage. We perform extensive ablations of knowledge distillation at different stages of multi-camera 3DOD. Our final X<sup>3</sup>KD model outperforms previous state-of-the-art approaches on the nuScenes and Waymo datasets and generalizes to RADAR-based 3DOD. Qualitative results video at https://youtu.be/1do9DPFmr38.
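To make the idea of distilling at several stages concrete, the following is a minimal sketch of how a feature-level term (in the spirit of X-FD) and an output-level term (in the spirit of X-OD) could be added to a student's detection loss. The abstract does not state the loss functions, so the mean squared error and KL divergence used here are common distillation choices assumed for illustration, not the paper's formulation. All function names and the weights `w_feat` and `w_out` are hypothetical, feature maps are represented as flat lists of floats, and the X-IS and X-AT components are omitted.

```python
import math


def feature_distillation_loss(student_feats, teacher_feats):
    """Mean squared error between flattened student and teacher feature maps.

    Assumption: MSE stands in for the feature distillation loss; the source
    abstract does not specify the actual form.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature maps must have the same number of elements")
    squared = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(squared) / len(squared)


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_distillation_loss(student_logits, teacher_logits):
    """KL(teacher || student) over the class scores of one output location.

    Assumption: KL divergence stands in for the output distillation loss.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_loss(task_loss, feat_loss, out_loss, w_feat=1.0, w_out=1.0):
    """Weighted sum of the detection loss and the two distillation terms.

    w_feat and w_out are hypothetical weighting hyperparameters.
    """
    return task_loss + w_feat * feat_loss + w_out * out_loss
```

In this sketch the teacher outputs would come from a frozen LiDAR-based detector and the student outputs from the multi-camera model. Both distillation terms vanish when the student matches the teacher exactly, so only the ordinary detection loss remains.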