Publication | Open Access
ConceptFusion: Open-set multimodal 3D mapping
Citations: 168
References: 73
Year: 2023
Unknown Venue
3D Computer Vision, Natural Language, Machine Vision, Image Analysis, Data Science, Machine Learning, Pixel-aligned Open-set Features, Engineering, 3D Vision, Scene Understanding, Computer Science, Open-set Multimodal 3D, Robot Learning, Deep Learning, Computer Vision, Foundation Models
modalities such as natural language, images, and audio. We demonstrate that pixel-aligned open-set features can be fused into 3D maps via traditional SLAM and multi-view fusion approaches. This enables effective zero-shot spatial reasoning without any additional training or finetuning, and retains long-tailed concepts better than supervised approaches, outperforming them by a margin of more than 40% on 3D IoU. We extensively evaluate ConceptFusion on a number of real-world datasets, simulated home environments, a real-world tabletop manipulation task, and an autonomous driving platform. We showcase new avenues for blending foundation models with 3D open-set multimodal mapping. We encourage the reader to view the demos on our project page: https://concept-fusion.github.io/
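
The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_features` and `query_map` are hypothetical, the pixel-to-point associations are assumed to be supplied by a SLAM front end, and a plain unweighted average stands in for whatever fusion rule the paper uses. Each map point accumulates the feature vectors of the pixels that observe it across frames, and the map is then queried by cosine similarity against an embedding in the same feature space.

```python
import numpy as np


def fuse_features(point_ids, pixel_feats, num_points):
    """Average per-pixel features into per-point map features.

    point_ids:   list of (N_i,) int arrays, one per frame; entry k is the
                 index of the map point observed by pixel k (assumed given).
    pixel_feats: list of (N_i, D) float arrays of pixel-aligned features.
    Returns a (num_points, D) array of unit-norm features; points never
    observed keep an all-zero feature.
    """
    dim = pixel_feats[0].shape[1]
    fused = np.zeros((num_points, dim))
    counts = np.zeros(num_points)
    for ids, feats in zip(point_ids, pixel_feats):
        # np.add.at accumulates correctly even when ids repeat in a frame.
        np.add.at(fused, ids, feats)
        np.add.at(counts, ids, 1.0)
    seen = counts > 0
    fused[seen] /= counts[seen, None]
    # Normalize so that a dot product with a unit query is cosine similarity.
    norms = np.linalg.norm(fused, axis=1, keepdims=True)
    return np.divide(fused, norms, out=np.zeros_like(fused), where=norms > 0)


def query_map(fused, query):
    """Cosine similarity between every map point and a query embedding."""
    q = np.asarray(query, dtype=float)
    return fused @ (q / np.linalg.norm(q))


# Toy usage: two frames, three map points, 2-D features.
ids = [np.array([0, 1]), np.array([0])]
feats = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[1.0, 0.0]])]
fmap = fuse_features(ids, feats, num_points=3)
scores = query_map(fmap, [1.0, 0.0])
```

In the toy usage, point 0 is seen in both frames, point 1 once, and point 2 never, so `scores` ranks point 0 highest for the query. In a real pipeline the query embedding would come from the same foundation model that produced the pixel features, so that text, image, or audio queries share a space with the map.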