Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process

Yigitcan Comlek, Sandipp Krishnan Ravi, Piyush Pandita, Sayan Ghosh, Liping Wang, Wei Chen

TL;DR

The paper tackles the problem of heterogeneity across data sources with distinct input parameter spaces, proposing a two-stage framework that first maps all inputs to a common reference via Input Mapping Calibration (IMC) and then fuses the sources with a Latent Variable Gaussian Process (LVGP). The IMC stage uses a linear calibration $g(x; A, b)=Ax+b$ to align sources, while LVGP learns a low-dimensional latent representation for each source, yielding a source-aware surrogate with a dissimilarity metric that reveals cross-source relationships. Across three engineering case studies—cantilever beam, ellipsoidal voids, and Ti6Al4V manufacturing—the LVGP-based multi-source fusion consistently outperforms single-source GP and source-unaware baselines, particularly for data-scarce sources, and provides interpretable latent-space structure. The framework thus enables robust, interpretable data fusion in heterogeneous input settings, with potential extensions to digital twins, multi-task, and federated learning, and future work including non-linear mappings and cost-aware adaptive sampling.
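To make the linear calibration $g(x; A, b)=Ax+b$ concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes a scalar input (so $A$ and $b$ are scalars), a hypothetical strictly increasing `reference_model` standing in for a surrogate trained on the reference source, and a simple derivative-free compass search swapped in for whatever optimizer the paper actually uses. The function names and the synthetic data are illustrative assumptions.

```python
def reference_model(u):
    """Hypothetical stand-in for a surrogate trained on the reference source."""
    return u ** 3 + u


def fit_linear_input_map(f_ref, xs, ys, a0=1.0, b0=0.0,
                         step=1.0, tol=1e-8, max_iter=10000):
    """Find scalars (a, b) that minimize sum_i (f_ref(a * x_i + b) - y_i)^2.

    Uses a compass (pattern) search: try a move of size `step` along each
    axis, accept the first improvement, otherwise halve the step.
    """
    def loss(a, b):
        return sum((f_ref(a * x + b) - y) ** 2 for x, y in zip(xs, ys))

    a, b = a0, b0
    best = loss(a, b)
    for _ in range(max_iter):
        if step < tol:
            break
        for da, db in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = loss(a + da, b + db)
            if trial < best:
                a, b, best = a + da, b + db, trial
                break
        else:
            # No axis move improved the fit, so refine the step size.
            step *= 0.5
    return a, b, best


# Synthetic source data (assumption): the source responds like the reference
# model evaluated at 0.5 * x + 2, so the calibration should recover a = 0.5, b = 2.
xs = [-2.0 + 0.5 * i for i in range(9)]
ys = [reference_model(0.5 * x + 2.0) for x in xs]

a_hat, b_hat, residual = fit_linear_input_map(reference_model, xs, ys)
print(a_hat, b_hat, residual)
```

In the multi-dimensional setting described in the paper, $A$ would be a matrix and $b$ a vector, and a gradient-based or global optimizer would likely be a better fit than this toy search.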

Abstract

Artificial intelligence and machine learning frameworks have served as computationally efficient mappings between inputs and outputs for engineering problems. These mappings have enabled optimization and analysis routines that have led to superior designs, ingenious material systems, and optimized manufacturing processes. A common occurrence in such modeling endeavors is the existence of multiple sources of data, each differentiated by fidelity, operating conditions, experimental conditions, and more. Data fusion frameworks have opened the possibility of combining such differentiated sources into single unified models, enabling improved accuracy and knowledge transfer. However, these frameworks encounter limitations when the different sources are heterogeneous in nature, i.e., when they do not share the same input parameter space. These heterogeneous input scenarios can occur when domains differentiated by complexity, scale, and fidelity require different parametrizations. To address this gap, a heterogeneous multi-source data fusion framework is proposed based on input mapping calibration (IMC) and latent variable Gaussian process (LVGP). In the first stage, the IMC algorithm is used to transform the heterogeneous input parameter spaces into a unified reference parameter space. In the second stage, a multi-source data fusion model enabled by LVGP is used to build a single source-aware surrogate model on the transformed reference space. The proposed framework is demonstrated and analyzed on three engineering case studies (design of a cantilever beam, design of an ellipsoidal void, and modeling the properties of Ti6Al4V alloy). The results indicate that the proposed framework provides improved predictive accuracy over a single-source model and over a transformed but source-unaware model.
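The second stage can be illustrated with a small sketch of a source-aware correlation function. It assumes the commonly used LVGP form, in which each source label is mapped to a point in a two-dimensional latent space and the Gaussian correlation combines a weighted squared distance between the (already mapped) inputs with the squared distance between latent points. The latent coordinates, the weights, and the function names below are hypothetical values chosen for illustration; in practice they would be estimated from data.

```python
import math


def lvgp_correlation(x, s, x_prime, s_prime, weights, latent):
    """Gaussian correlation over quantitative inputs plus source latent points.

    x, x_prime : sequences of inputs already mapped to the reference space
    s, s_prime : source labels (keys of `latent`)
    weights    : per-dimension roughness weights for the quantitative inputs
    latent     : dict mapping each source label to its 2D latent position
    """
    quantitative = sum(w * (xi - xj) ** 2
                       for w, xi, xj in zip(weights, x, x_prime))
    qualitative = sum((zi - zj) ** 2
                      for zi, zj in zip(latent[s], latent[s_prime]))
    return math.exp(-quantitative - qualitative)


def latent_distance(s, s_prime, latent):
    """Euclidean distance between two sources in the latent space."""
    return math.dist(latent[s], latent[s_prime])


# Hypothetical latent positions (assumption): the first source sits at the origin.
latent = {
    "rectangular": (0.0, 0.0),
    "hollow_rectangular": (0.3, 0.0),
    "hollow_circular": (1.2, 0.5),
}
weights = [1.0, 0.5]
x = [0.2, 0.7]

same = lvgp_correlation(x, "rectangular", x, "rectangular", weights, latent)
near = lvgp_correlation(x, "rectangular", x, "hollow_rectangular", weights, latent)
far = lvgp_correlation(x, "rectangular", x, "hollow_circular", weights, latent)
print(same, near, far)
```

Under this form, sources whose latent points lie close together are treated as strongly correlated at the same input, which is what makes the latent-space distance usable as a dissimilarity measure between sources.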

Paper Structure (25 sections, 12 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: The heterogeneous multi-source data fusion framework
  • Figure 2: Different cantilever beam designs (sources) with varying parametrization
  • Figure 3: Predictions based on mapped (a) Hollow Rectangular and (b) Hollow Circular beams using the reference GP model built on Rectangular Beam source
  • Figure 4: The results of the cantilever beam design study. Predictions on (a) all sources using heterogeneous multi-source GP, (b) sources using heterogeneous multi-source LVGP, (c) hollow circular beam using single-source GP built on original input space, (d) hollow circular beam using heterogeneous multi-source GP, (e) hollow circular beam using heterogeneous multi-source LVGP. (f) The latent space obtained by the LVGP model
  • Figure 5: Ellipsoidal void sources with varying design complexity (2D & 3D), and fidelity (elastic & plastic)
  • ...and 4 more figures