Robust Multi-Modality Multi-Object Tracking

TLDR

Multi‑sensor perception is essential for autonomous driving, yet existing multi‑sensor multi‑object tracking methods either rely too heavily on a single source or fuse sensor outputs only in post‑processing, compromising reliability and accuracy. This study introduces a generic, sensor‑agnostic multi‑modality MOT framework (mmMOT) that lets each modality operate independently for reliability while enhancing accuracy through a novel fusion module. mmMOT is trained end‑to‑end, jointly optimizing each modality’s feature extractor and a cross‑modality adjacency estimator, and it uniquely incorporates deep point‑cloud representations into the data‑association process. Experiments on the KITTI benchmark demonstrate that mmMOT achieves state‑of‑the‑art performance. Code and models are available at https://github.com/ZwwWayne/mmMOT.

Abstract

Multi-sensor perception is crucial for ensuring reliability and accuracy in autonomous driving systems, and multi-object tracking (MOT) improves both by tracing the sequential movement of dynamic objects. Most current approaches to multi-sensor multi-object tracking either lack reliability because they rely tightly on a single input source (e.g., the center camera), or lack accuracy because they fuse the results from multiple sensors in post-processing without fully exploiting the inherent information. In this study, we design a generic sensor-agnostic multi-modality MOT framework (mmMOT), in which each modality (i.e., sensor) is capable of performing its role independently to preserve reliability, and can further improve its accuracy through a novel multi-modality fusion module. Our mmMOT can be trained in an end-to-end manner, enabling joint optimization of the base feature extractor of each modality and a cross-modality adjacency estimator. Our mmMOT also makes the first attempt to encode a deep representation of point clouds in the data association process in MOT. We conduct extensive experiments to evaluate the effectiveness of the proposed framework on the challenging KITTI benchmark and report state-of-the-art performance. Code and models are available at https://github.com/ZwwWayne/mmMOT.
