Towards Robust Human-Robot Collaborative Manufacturing: Multimodal Fusion

TLDR

Multimodal robot control is essential for intuitive human–robot collaboration in manufacturing, yet existing methods are unreliable because they lack shared feature representations across modalities. The study proposes a deep learning-based multimodal fusion architecture to achieve robust multimodal human–robot collaboration in manufacturing. The architecture trains separate unimodal models for speech, hand motion, and body motion, then fuses their extracted features to share representations across modalities. Experiments demonstrate that the multimodal fusion model outperforms each unimodal model and suggest strong potential for robust human–robot collaboration in manufacturing.

Abstract

Intuitive and robust multimodal robot control is key to human–robot collaboration (HRC) in manufacturing systems. Previous studies introduced multimodal robot control methods that allow human operators to control robots intuitively without programming brand-specific code. However, most of these methods are unreliable because feature representations are not shared across modalities. To address this problem, this paper proposes a deep learning-based multimodal fusion architecture for robust multimodal HRC manufacturing systems. The proposed architecture covers three modalities: speech command, hand motion, and body motion. Three unimodal models are first trained to extract features, which are then fused for representation sharing. Experiments show that the proposed multimodal fusion model outperforms the three unimodal models. This paper indicates strong potential for applying the proposed multimodal fusion architecture to robust HRC manufacturing systems.
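
The extract-then-fuse pipeline described above can be illustrated with a minimal sketch. The abstract does not state the fusion operator, layer sizes, or input formats, so everything here is an assumption for illustration only: fusion is taken to be simple concatenation of the three feature vectors, and the names `UnimodalEncoder` and `FusionClassifier`, the input dimensions (13, 21, 17), the feature size (8), and the four command classes are all hypothetical. The encoders use random weights as stand-ins for trained unimodal models.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases for a dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class UnimodalEncoder:
    """Hypothetical stand-in for a trained unimodal feature extractor."""

    def __init__(self, n_in, n_feat, rng):
        self.weights, self.bias = init_layer(n_in, n_feat, rng)

    def extract(self, x):
        # Dense layer followed by ReLU yields the modality's feature vector.
        return [max(0.0, h) for h in linear(x, self.weights, self.bias)]


class FusionClassifier:
    """Concatenates unimodal features (an assumed fusion rule) and classifies."""

    def __init__(self, encoders, n_classes, rng):
        self.encoders = encoders
        n_fused = sum(len(enc.bias) for enc in encoders.values())
        self.weights, self.bias = init_layer(n_fused, n_classes, rng)

    def fuse(self, inputs):
        # Features are joined in the encoders' insertion order.
        fused = []
        for name, enc in self.encoders.items():
            fused.extend(enc.extract(inputs[name]))
        return fused

    def predict_proba(self, inputs):
        return softmax(linear(self.fuse(inputs), self.weights, self.bias))


rng = random.Random(0)
encoders = {
    "speech": UnimodalEncoder(n_in=13, n_feat=8, rng=rng),  # assumed input size
    "hand": UnimodalEncoder(n_in=21, n_feat=8, rng=rng),    # assumed input size
    "body": UnimodalEncoder(n_in=17, n_feat=8, rng=rng),    # assumed input size
}
model = FusionClassifier(encoders, n_classes=4, rng=rng)

# One synthetic sample per modality, sized to match each encoder's input.
sample = {
    name: [rng.random() for _ in range(len(enc.weights[0]))]
    for name, enc in encoders.items()
}
probs = model.predict_proba(sample)
```

In this sketch the classifier sees a single 24-dimensional vector built from all three modalities, which is one simple way a shared representation could be formed; the paper's actual fusion layers may differ.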
