Jointly Trained Variational Autoencoder for Multi-Modal Sensor Fusion

Abstract

This work presents a novel multi-modal Variational Autoencoder, the $\mathbf{M}^{\mathbf{2}}\mathbf{VAE}$, which is derived from the complete marginal joint log-likelihood. This derivation allows end-to-end training of Bayesian information fusion on raw data for all subsets of a sensor setup. Furthermore, we introduce the concept of in-place fusion, applicable to distributed sensing, where latent embeddings of observations need to be fused with new data. To facilitate in-place fusion even on raw data, we introduce a re-encoding loss that stabilizes the decoding and makes visualization of latent statistics possible. We also show that the $\mathbf{M}^{\mathbf{2}}\mathbf{VAE}$ finds a coherent latent embedding, such that a single naïve Bayes classifier performs equally well on all permutations of a bi-modal Mixture-of-Gaussians signal. Finally, we show that our approach outperforms current VAE approaches on a bi-modal MNIST and Fashion-MNIST data set and works sufficiently well as a preprocessing step on a tri-modal simulated camera and LiDAR data set from the Gazebo simulator.
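To make the notion of fusing a latent embedding with new data concrete, the following minimal sketch shows the textbook product-of-Gaussians rule for two diagonal Gaussian beliefs over the same latent variable. It is an illustration under stated assumptions (independent beliefs, flat prior), not the fusion learned by the encoder networks in this work; the function name `fuse_gaussians` and the example values are hypothetical.

```python
from typing import List, Tuple

# (mean, variance) per latent dimension of a diagonal Gaussian
Gaussian = Tuple[List[float], List[float]]


def fuse_gaussians(a: Gaussian, b: Gaussian) -> Gaussian:
    """Fuse two diagonal Gaussian beliefs over the same latent variable.

    Product-of-Gaussians rule: precisions (1 / variance) add up, and the
    fused mean is the precision-weighted average of the input means.
    Assumes independent beliefs and a flat prior (illustrative only).
    """
    mean_a, var_a = a
    mean_b, var_b = b
    fused_mean, fused_var = [], []
    for m_a, v_a, m_b, v_b in zip(mean_a, var_a, mean_b, var_b):
        precision = 1.0 / v_a + 1.0 / v_b
        fused_var.append(1.0 / precision)
        fused_mean.append((m_a / v_a + m_b / v_b) / precision)
    return fused_mean, fused_var


# Hypothetical 2-D embeddings from two sensors (values made up for illustration).
camera = ([0.0, 2.0], [1.0, 4.0])
lidar = ([2.0, 2.0], [1.0, 4.0])
fused_mean, fused_var = fuse_gaussians(camera, lidar)
```

In this sketch the fused variance is never larger than either input variance, which reflects the intuition that adding a second observation should not reduce certainty about the latent state.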
