Publication | Open Access
On Robustness of Multi-Modal Fusion—Robotics Perspective
Year: 2020 | Citations: 33 | References: 31
Robot Kinematics, Robotic Systems, Engineering, Machine Learning, Multimodal Learning, Multi-modal Fusion—robotics Perspective, Image Analysis, Data Science, Pattern Recognition, Fusion Learning, Haptic Sensations, Multimodal Sensor Fusion, Systems Engineering, Multimodal Processing, Haptic Adjectives, Robot Learning, Robotics Perception, Multimodal Perception, Machine Vision, Multimodal Signal Processing, Feature Fusion, Computer Vision, Multimodal Sensing, Multimodal Imaging, Robotics, Data Streams, Multimodal Analytics
Robotic perception requires robust multi-modal fusion of heterogeneous sensor streams, yet growing data volume and dimensionality make such fusion systems difficult to design by hand. Current multi-modal learning research focuses mainly on vision and audio, although haptic feedback is essential for task execution in robotics. This work compares four learning-based fusion methods on three publicly available datasets containing haptic, visual, and pose signals. The authors evaluated the methods on grasp outcome classification, texture recognition, and multi-label haptic adjective classification. The evaluation concentrates on robustness to degraded sensor data, an aspect rarely examined in prior studies, and tests whether data augmentation improves that robustness.
The efficient multi-modal fusion of data streams from different sensors is a crucial ability that a robotic perception system should exhibit to ensure robustness against disturbances. However, as the volume and dimensionality of sensory feedback increase, it may become difficult to manually design a multi-modal data fusion system that can handle heterogeneous data. Multi-modal machine learning is an emerging field, with research focused mainly on analyzing vision and audio information. From the robotics perspective, however, haptic sensations experienced through interaction with the environment are essential for successfully executing useful tasks. In our work, we compared four learning-based multi-modal fusion methods on three publicly available datasets containing haptic signals, images, and robots' poses. We considered three tasks involving such data, namely grasp outcome classification, texture recognition, and, most challenging, multi-label classification of haptic adjectives based on haptic and visual data. The experiments focused not only on verifying the performance of each method but mainly on their robustness against data degradation. We focused on this aspect of multi-modal fusion because it has rarely been considered in research papers, and such degradation of sensory feedback might occur during a robot's interaction with its environment. Additionally, we verified the usefulness of data augmentation for increasing the robustness of these data fusion methods.
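To make the notion of sensory degradation and augmentation concrete, the following is a minimal sketch, not the authors' procedure. The abstract does not say which degradations were applied, so additive Gaussian noise and full modality dropout are assumptions chosen for illustration. The sample layout (a dictionary mapping a modality name to a flat feature vector) and the names `degrade` and `augment` are likewise hypothetical.

```python
import random
from typing import Dict, List, Optional

# Hypothetical sample layout: modality name -> flat feature vector.
Sample = Dict[str, List[float]]


def degrade(sample: Sample, modality: str, mode: str = "noise",
            noise_std: float = 0.1,
            rng: Optional[random.Random] = None) -> Sample:
    """Return a copy of `sample` with a single modality degraded.

    Assumed degradation modes (not taken from the paper):
      "noise"   - add zero-mean Gaussian noise to every value
      "dropout" - replace the whole modality with zeros
    """
    if modality not in sample:
        raise KeyError(f"unknown modality: {modality}")
    rng = rng or random.Random()
    # Copy every vector so the original sample is left untouched.
    out = {name: list(values) for name, values in sample.items()}
    if mode == "noise":
        out[modality] = [v + rng.gauss(0.0, noise_std) for v in out[modality]]
    elif mode == "dropout":
        out[modality] = [0.0] * len(out[modality])
    else:
        raise ValueError(f"unknown degradation mode: {mode}")
    return out


def augment(dataset: List[Sample], modalities: List[str], copies: int = 1,
            noise_std: float = 0.1, seed: int = 0) -> List[Sample]:
    """Extend `dataset` with `copies` randomly degraded variants per sample."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for sample in dataset:
        for _ in range(copies):
            modality = rng.choice(modalities)
            mode = rng.choice(["noise", "dropout"])
            augmented.append(degrade(sample, modality, mode, noise_std, rng))
    return augmented
```

Under these assumptions, a robustness check would train a fusion model on clean (or augmented) data, then compare its test accuracy on clean samples with its accuracy on samples passed through `degrade` for each modality in turn.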