Publication | Closed Access
LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection
Citations: 34
References: 31
Year: 2022
Engineering, Machine Learning, Challenging Nuscenes, Point Cloud Processing, 3D Computer Vision, Image Analysis, Data Science, Pattern Recognition, Multimodal Sensor Fusion, Robot Learning, Sensor Fusion, Machine Vision, Object Detection, Computer Science, Deep Learning, 3D Object Recognition, Computer Vision, 3D Vision, Sequential Cross-sensor Data
LiDAR and camera are two common sensors that collect data over time for 3D object detection in autonomous driving. Although the complementary information across sensors and time has great potential to benefit 3D perception, taking full advantage of sequential cross-sensor data remains challenging. In this paper, we propose a novel LiDAR Image Fusion Transformer (LIFT) to model the mutual interaction of cross-sensor data over time. LIFT learns to align the input 4D sequential cross-sensor data to achieve multi-frame, multi-modal information aggregation. To reduce the computational load, we project both point clouds and images into bird's-eye-view maps and compute sparse grid-wise self-attention. LIFT also benefits from a cross-sensor and cross-time data augmentation scheme. We evaluate the proposed approach on the challenging nuScenes and Waymo datasets, where LIFT outperforms state-of-the-art methods and strong baselines.
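The abstract's idea of projecting data into a bird's-eye-view (BEV) grid and attending only over occupied cells can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `points_to_bev` and `sparse_self_attention`, the grid ranges, the cell size, the mean pooling, and the single-head attention with identity projections are all assumptions chosen for illustration.

```python
import numpy as np


def points_to_bev(points, feats, x_range=(-4.0, 4.0), y_range=(-4.0, 4.0), cell=1.0):
    """Mean-pool per-point features into BEV cells (hypothetical helper).

    Returns the flat indices of occupied cells, one pooled feature row per
    occupied cell, and the grid shape (nx, ny).
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)  # drop out-of-range points
    flat = ix[keep] * ny + iy[keep]
    cells, inverse = np.unique(flat, return_inverse=True)
    inverse = inverse.reshape(-1)
    pooled = np.zeros((cells.size, feats.shape[1]), dtype=float)
    np.add.at(pooled, inverse, feats[keep])  # sum features per occupied cell
    counts = np.bincount(inverse, minlength=cells.size)
    pooled /= counts[:, None]  # sum -> mean
    return cells, pooled, (nx, ny)


def sparse_self_attention(tokens):
    """Single-head scaled dot-product self-attention over occupied cells only.

    Assumption: queries, keys, and values are the tokens themselves
    (no learned projections), to keep the sketch short.
    """
    n, d = tokens.shape
    if n == 0:
        return tokens.copy(), np.zeros((0, 0))
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens, weights


# Toy usage: two synthetic "frames" stand in for sequential sensor data.
rng = np.random.default_rng(0)
frame_tokens = []
for _ in range(2):
    pts = rng.uniform(-4.0, 4.0, size=(50, 3))
    f = rng.normal(size=(50, 8))
    _, pooled, _ = points_to_bev(pts, f)
    frame_tokens.append(pooled)
tokens = np.concatenate(frame_tokens, axis=0)  # tokens from both frames attend jointly
fused, attn = sparse_self_attention(tokens)
```

Because empty cells never become tokens, the attention matrix scales with the number of occupied cells rather than with the full grid size, which is the source of the savings in this sketch.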