JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment

Joao Sousa, Roya Darabi, Armando Sousa, Frank Brueckner, Luís Paulo Reis, Ana Reis

TL;DR

This work introduces JEMA (Joint Embedding with Multimodal Alignment), a novel co-learning framework tailored for laser metal deposition (LMD), a pivotal process in metal additive manufacturing.

Abstract

This work introduces JEMA (Joint Embedding with Multimodal Alignment), a novel co-learning framework tailored for laser metal deposition (LMD), a pivotal process in metal additive manufacturing. As Industry 5.0 gains traction in industrial applications, efficient process monitoring becomes increasingly crucial. However, limited data and the opaque nature of AI present challenges for its application in an industrial setting. JEMA addresses these challenges by leveraging multimodal data, including multi-view images and metadata such as process parameters, to learn transferable semantic representations. By applying a supervised contrastive loss function, JEMA enables robust learning and subsequent process monitoring using only the primary modality, which simplifies hardware requirements and reduces computational overhead. We investigate the effectiveness of JEMA in LMD process monitoring, focusing specifically on its generalization to downstream tasks such as melt pool geometry prediction, achieved without extensive fine-tuning. Our empirical evaluation demonstrates the high scalability and performance of JEMA, particularly when combined with Vision Transformer models. We report an 8% increase in performance in multimodal settings and a 1% improvement in unimodal settings compared to supervised contrastive learning. Additionally, the learned embedding representation enables the prediction of metadata, which enhances interpretability and makes it possible to assess the contribution of the added metadata. Our framework lays the foundation for integrating multisensor data with metadata, enabling diverse downstream tasks within the LMD domain and beyond.
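
For readers unfamiliar with the supervised contrastive baseline that the abstract compares against, the following is a minimal sketch of the generic supervised contrastive (SupCon) loss. It is not JEMA's objective, which this page does not specify. The function name, the `temperature` default, and the toy data are assumptions made for illustration, and the sketch assumes every embedding is a non-zero vector.

```python
import math

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon loss over one batch (illustrative, not JEMA's exact loss)."""
    # L2-normalize each embedding so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(v * v for v in vec))
        normed.append([v / norm for v in vec])

    n = len(normed)
    losses = []
    for i in range(n):
        # Temperature-scaled similarity from anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Positives are the other samples that share the anchor's label.
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Numerically stable log-sum-exp over all non-anchor samples.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        # Negative mean log-probability of the positives for this anchor.
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))

    if not losses:
        raise ValueError("need at least one pair of samples sharing a label")
    return sum(losses) / len(losses)

# Hypothetical 2-D embeddings: two tight clusters.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(points, [0, 0, 1, 1])     # labels follow clusters
mismatched = supervised_contrastive_loss(points, [0, 1, 0, 1])  # labels cut across clusters
```

The loss is small when samples sharing a label sit close together in the embedding space and large when they do not, so `aligned` comes out lower than `mismatched` on this toy data.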

Paper Structure

This paper contains 26 sections, 15 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Diagram of a Laser Metal Deposition (LMD) system with an on-axis and off-axis monitoring solution. It shows the influence of process parameters for multi-modality representation for monitoring LMD.
  • Figure 2: Integration of multiple heterogeneous data sources, such as PLC and image-based data, with ROS 2.
  • Figure 3: Images from the same timestamp: (a) on-axis camera with isothermal line; (b) definition of melt pool Length (L) and Height (H); and (c) off-axis camera.
  • Figure 4: Melt pool lengths (L) and heights (H) in pixels for all design of experiments are presented in (a), while the mean values for each parameter set are shown in (b).
  • Figure 5: Diagram of the components for laser metal deposition monitoring in a multimodal setting, where $x_{a,b}$ are the multimodal data and $u_{a,b}$ are the process parameters that can be used to build a latent representation $S_{a,b}$.
  • ...and 9 more figures