On Robustness of Multi-Modal Fusion—Robotics Perspective

TLDR

Robotic perception requires robust multi-modal fusion of heterogeneous sensor streams, but the growing volume and dimensionality of sensory data make such systems difficult to design by hand. Current multi-modal learning research focuses mainly on vision and audio, although haptic feedback is essential for executing robotic tasks. This work compares four learning-based fusion methods on three publicly available datasets containing haptic signals, images, and robot poses. The methods are evaluated on grasp outcome classification, texture recognition, and multi-label haptic adjective classification, with an emphasis on robustness to sensory data degradation, an aspect rarely examined in prior studies, and on whether data augmentation improves that robustness.

Abstract

The efficient multi-modal fusion of data streams from different sensors is a crucial ability that a robotic perception system should exhibit to ensure robustness against disturbances. However, as the volume and dimensionality of sensory feedback increase, it becomes difficult to manually design a multi-modal data fusion system that can handle heterogeneous data. Multi-modal machine learning is an emerging field, with research focused mainly on analyzing vision and audio information. From the robotics perspective, however, haptic sensations experienced during interaction with an environment are essential to successfully execute useful tasks. In our work, we compared four learning-based multi-modal fusion methods on three publicly available datasets containing haptic signals, images, and robot poses. We considered three tasks involving such data: grasp outcome classification, texture recognition, and, most challenging, multi-label classification of haptic adjectives based on haptic and visual data. The experiments focused not only on verifying the performance of each method but mainly on their robustness against data degradation. We focused on this aspect of multi-modal fusion because it has rarely been considered in prior research, and such degradation of sensory feedback might occur while a robot interacts with its environment. Additionally, we verified the usefulness of data augmentation for increasing the robustness of these data fusion methods.
