Unsupervised multimodal fusion of in-process sensor data for advanced manufacturing process monitoring

Abstract

Effective monitoring of manufacturing processes is crucial for maintaining product quality and operational efficiency. Modern manufacturing environments often generate vast amounts of complementary multimodal data, including visual imagery from various perspectives and resolutions, hyperspectral data, and machine health monitoring information such as actuator positions, accelerometer readings, and temperature measurements. However, fusing and interpreting this complex, high-dimensional data presents significant challenges, particularly when labeled datasets are unavailable or impractical to obtain. This paper presents a novel approach to multimodal sensor data fusion in manufacturing processes, inspired by the Contrastive Language-Image Pre-training (CLIP) model. We leverage contrastive learning techniques to correlate different data modalities without the need for labeled data, overcoming limitations of traditional supervised machine learning methods in manufacturing contexts. Our proposed method demonstrates the ability to handle and learn encoders for five distinct modalities: visual imagery, audio signals, laser position (x and y coordinates), and laser power measurements. By compressing these high-dimensional datasets into low-dimensional representational spaces, our approach facilitates downstream tasks such as process control, anomaly detection, and quality assurance. The unsupervised nature of our method makes it broadly applicable across various manufacturing domains, where large volumes of unlabeled sensor data are common. We evaluate the effectiveness of our approach through a series of experiments, demonstrating its potential to enhance process monitoring capabilities in advanced manufacturing systems. This research contributes to the field of smart manufacturing by providing a flexible, scalable framework for multimodal data fusion that can adapt to diverse manufacturing environments and sensor configurations. 
The proposed method paves the way for more robust, data-driven decision-making in complex manufacturing processes.

Figure: (left) We use contrastive loss to train encoders for each modality. Contrastive loss pushes corresponding vectors closer together in latent space (the blue diagonal shows I_i · T_j where i = j), while dissimilar vectors (I_i · T_j where i ≠ j) are pushed apart. (right) We use the encoders for inference over the data to identify clusters and anomalies. The dots on the 2D scatter plot are data tuples from a nominal print (red) and a purposefully off-nominal print (blue). Each dot represents an individual part at a unique layer, and each group of red and blue dots represents a distinct part on the build plates. Nine parts were built. The nominal (red) and off-nominal (blue) dots are generally distinguishable from one another.

• A novel unsupervised multimodal data fusion approach for manufacturing process monitoring is presented.
• Contrastive learning techniques correlate diverse sensor data without requiring labeled datasets.
• High-dimensional manufacturing data is compressed into low-dimensional representational spaces.
• This innovation facilitates data-driven decision-making for improved quality control and operational efficiency.
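
The contrastive objective described in the figure caption can be illustrated with a minimal sketch of a CLIP-style symmetric loss for two modalities. This is not the authors' implementation: the function names, the NumPy formulation, and the temperature value of 0.07 are assumptions made for illustration, and the abstract does not state how the five modalities are combined during training. The sketch takes one batch of embeddings per modality, where row i of each array comes from the same data tuple, and scores the matching pairs on the diagonal of the similarity matrix against the mismatched pairs off the diagonal.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss between two batches of embeddings.

    emb_a, emb_b: arrays of shape (batch, dim); row i of each array is
    assumed to come from the same data tuple (a positive pair).
    temperature: hypothetical value chosen for illustration only.
    """
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    # Entry (i, j) is the scaled similarity between a_i and b_j.
    logits = a @ b.T / temperature
    idx = np.arange(logits.shape[0])
    # Cross-entropy with the diagonal as the correct class, in both directions.
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))
    noisy_copy = x + 0.01 * rng.normal(size=x.shape)
    print("aligned pairs:   ", clip_style_loss(x, noisy_copy))
    print("misaligned pairs:", clip_style_loss(x, np.roll(x, 1, axis=0)))
```

In this sketch, correctly paired batches yield a lower loss than batches whose rows have been shifted out of alignment, which is the behavior the diagonal in the figure is meant to convey. Extending it to more than two modalities (for example, by summing the loss over modality pairs) would be a further design assumption.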
