Publication | Closed Access
Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
Citations: 128
References: 40
Year: 2023
Unknown Venue
Geometric Modeling, 3D Computer Vision, Machine Vision, Image Analysis, Engineering, Explicit View Transformation, Pattern Recognition, Object Detection, Robust 3D, Natural Sciences, 3D Vision, Point Cloud Processing, Multi-view Geometry, Deep Learning, Computational Geometry, Cross Modal Transformer, 3D Object Recognition, Computer Vision
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end multi-modal 3D detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of the multi-modal tokens is performed by encoding the 3D points into the multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. It achieves 74.1% NDS (state of the art for a single model) on the nuScenes test set while maintaining fast inference. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code is released at https://github.com/junjie18/CMT.
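The abstract does not spell out how 3D points are encoded into the tokens, so the following NumPy sketch is only one plausible reading, not the released implementation. It assumes that each image token is tied to a few 3D points sampled along its camera ray, that each point cloud token is tied to a few points sampled above its bird's-eye-view cell, and that a small MLP turns those coordinates into a position embedding added to the token features. All function names, shapes, and the random weights are hypothetical.

```python
import numpy as np

def image_token_points(h, w, K, depths, R=np.eye(3), t=np.zeros(3)):
    """3D points along the camera ray of each image token (assumed scheme).

    K is a 3x3 pinhole intrinsic matrix; R, t map camera coordinates into the
    shared reference frame (identity by default, an assumption of this sketch).
    Returns an array of shape (h*w, len(depths)*3).
    """
    us, vs = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)  # pixel centres
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T                # camera-frame rays with z = 1
    d = np.asarray(depths, dtype=float)
    pts = rays[:, None, :] * d[None, :, None]      # (h*w, D, 3)
    pts = pts @ R.T + t                            # move into the shared frame
    return pts.reshape(h * w, -1)

def bev_token_points(nx, ny, cell, heights):
    """3D points above each bird's-eye-view cell centre (assumed scheme).

    The grid is centred on the origin; returns shape (nx*ny, len(heights)*3).
    """
    z = np.asarray(heights, dtype=float)
    xs = (np.arange(nx) + 0.5 - nx / 2) * cell
    ys = (np.arange(ny) + 0.5 - ny / 2) * cell
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    xy = np.stack([gx, gy], axis=-1).reshape(-1, 1, 2)
    xy = np.broadcast_to(xy, (nx * ny, len(z), 2))
    zz = np.broadcast_to(z[None, :, None], (nx * ny, len(z), 1))
    return np.concatenate([xy, zz], axis=-1).reshape(nx * ny, -1)

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random two-layer MLP weights, standing in for learned parameters."""
    w1 = rng.normal(scale=d_in ** -0.5, size=(d_in, d_hidden))
    w2 = rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_out))
    return w1, w2

def mlp(x, w1, w2):
    """Two-layer perceptron with a ReLU in between."""
    return np.maximum(x @ w1, 0.0) @ w2

def fuse_tokens(img_feats, pc_feats, img_pts, pc_pts, img_mlp, pc_mlp):
    """Add a coordinate-based embedding to each modality, then concatenate.

    The result is a single token sequence that a transformer decoder could
    attend over, with no explicit image-to-BEV view transformation.
    """
    img_tokens = img_feats + mlp(img_pts, *img_mlp)
    pc_tokens = pc_feats + mlp(pc_pts, *pc_mlp)
    return np.concatenate([img_tokens, pc_tokens], axis=0)
```

Under these assumptions, both modalities are described by coordinates in the same 3D frame, which is what lets one set of queries attend to image and point cloud tokens together. Dropping the point cloud rows from the concatenation leaves a valid image-only token sequence, which is one way to picture how such a detector could still run without LiDAR.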