VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem

TLDR

VINet is, to the authors' knowledge, the first end-to-end trainable visual-inertial odometry method that fuses camera and IMU data at an intermediate feature-representation level. It treats motion estimation as a sequence-to-sequence learning problem on the manifold of poses. The method removes the need for manual camera-IMU synchronization and calibration, and it incorporates domain-specific information to reduce drift. It is competitive with state-of-the-art traditional methods when accurate calibration data is available, and it can be trained to outperform them when calibration or synchronization errors are present.

Abstract

In this paper we present an on-manifold sequence-to-sequence learning approach to motion estimation using visual and inertial sensors. It is, to the best of our knowledge, the first end-to-end trainable method for visual-inertial odometry that performs fusion of the data at an intermediate feature-representation level. Our method has numerous advantages over traditional approaches. Specifically, it eliminates the need for tedious manual synchronization of the camera and IMU, as well as the need for manual calibration between the two. A further advantage is that our model naturally and elegantly incorporates domain-specific information, which significantly mitigates drift. We show that our approach is competitive with state-of-the-art traditional methods when accurate calibration data is available and can be trained to outperform them in the presence of calibration and synchronization errors.
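The abstract does not spell out what "on-manifold" means in practice. One common reading is that a model predicts small frame-to-frame motions as 6-vectors in se(3), and these are mapped to rigid-body transforms and chained on SE(3) rather than being summed as plain vectors. The sketch below illustrates only that composition step with NumPy. It is not the authors' implementation: the function names (hat, se3_exp, compose_trajectory) and the twist layout (translation part first, rotation part second) are assumptions made for illustration.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector (hypothetical helper name)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map from a twist xi = (v, w) in se(3) to a 4x4 SE(3) matrix.

    Assumed layout: xi[:3] is the translational part, xi[3:] the rotational part.
    """
    xi = np.asarray(xi, dtype=float)
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # First-order approximation for very small rotations.
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * (W @ W)   # Rodrigues' formula
        V = np.eye(3) + B * W + C * (W @ W)   # couples rotation into translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def compose_trajectory(twists):
    """Chain per-step twists into a list of global poses, starting at identity."""
    T = np.eye(4)
    poses = [T]
    for xi in twists:
        T = T @ se3_exp(xi)   # composition stays on SE(3)
        poses.append(T)
    return poses

# Usage: four identical steps, each a quarter turn about z with forward motion,
# trace a closed circle and end back at the starting pose.
step = [1.0, 0.0, 0.0, 0.0, 0.0, np.pi / 2]
poses = compose_trajectory([step] * 4)
```

Composing transforms this way keeps every intermediate pose a valid rotation plus translation, which is one reason pose estimates are usually accumulated on the manifold instead of in a flat vector space.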
