Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference

Marvin Schmitt, Leona Odole, Stefan T. Radev, Paul-Christian Bürkner

TL;DR

Multimodal neural posterior estimation (MultiNPE) not only outperforms single-source baselines on a reference task, but also achieves superior inference on scientific models from cognitive neuroscience and cardiology.

Abstract

We present multimodal neural posterior estimation (MultiNPE), a method to integrate heterogeneous data from different sources in simulation-based inference with neural networks. Inspired by advances in deep fusion, it allows researchers to analyze data from different domains and infer the parameters of complex mathematical models with increased accuracy. We consider three fusion approaches for MultiNPE (early, late, hybrid) and evaluate their performance in three challenging experiments. MultiNPE not only outperforms single-source baselines on a reference task, but also achieves superior inference on scientific models from cognitive neuroscience and cardiology. We systematically investigate the impact of partially missing data on the different fusion strategies. Across our experiments, late and hybrid fusion techniques emerge as the methods of choice for practical applications of multimodal simulation-based inference.
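For orientation, the standard neural posterior estimation objective can be written with a fused embedding in place of a single-source summary. This is a sketch in generic notation, assuming a parametric density estimator $q_\phi$ (e.g., a normalizing flow) and a fusion network $h_\psi$ that maps both sources to one conditioning vector; the paper's own formulation may differ in detail.

```latex
\hat{\phi}, \hat{\psi}
  = \arg\min_{\phi, \psi}\;
    \mathbb{E}_{p(\boldsymbol{\theta})\, p(\mathbf{X}, \mathbf{Y} \mid \boldsymbol{\theta})}
    \Big[ -\log q_{\phi}\big(\boldsymbol{\theta} \mid h_{\psi}(\mathbf{X}, \mathbf{Y})\big) \Big]
```

Under late fusion, for instance, $h_\psi(\mathbf{X}, \mathbf{Y})$ would combine the separate embeddings $s_x(\mathbf{X})$ and $s_y(\mathbf{Y})$. The expectation is approximated with simulated triples $(\boldsymbol{\theta}, \mathbf{X}, \mathbf{Y})$, and evaluating the same quantity on held-out simulations yields a negative log posterior of the kind reported in Figure 3.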

Paper Structure

This paper contains 21 sections, 14 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: We present a set of deep fusion methods that equip simulation-based inference (SBI) with the ability to integrate information from multiple heterogeneous data sources. Early fusion (into $\mathbf{X}$) uses multi-head attention with $\mathbf{X}$ as query and $\mathbf{Y}$ as both key and value, yielding the cross-informed representation $\tilde{\mathbf{X}}$, which is then passed to a summary network $s_x$. In contrast, late fusion learns separate embeddings $s_x(\mathbf{X})$ and $s_y(\mathbf{Y})$ and then fuses the embeddings. Hybrid fusion combines both approaches: it applies cross-shaped multi-head attention as in early fusion, followed by separate embeddings and a late fusion step. See the Methods section for a more formal specification.
  • Figure 2: Experiment 1: Simplified 2D visualization of the experimental setup. The actual experiment is implemented in 10-dimensional spaces for both the parameters $\boldsymbol{\theta}$ and the observed measurement variables $\mathbf{X}, \mathbf{Y}$.
  • Figure 3: Experiment 1: Two of our multimodal schemes (late fusion and hybrid fusion) outperform single-source architectures (only $\mathbf{X}$/$\mathbf{Y}$), as indicated by a lower (better) negative log posterior on held-out data across ten repetitions with different random seeds.
  • Figure 4: Experiment 2: Overview of the experimental setup. A human's neurocognitive attributes parameterize the simulation programs for centro-parietal positivity (CPP) and reaction times (DDM).
  • Figure 5: Experiment 2: Hybrid fusion and late fusion consistently achieve better accuracy (RMSE averaged over all parameters) than the default (direct concatenation). Recall that training uses 10% missing data (top row, dotted line), so missingness beyond 10% is a substantial extrapolation. Calibration (ECE) of the shared parameter $\mu$ does not clearly differ between the methods.
  • ...and 5 more figures
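The three fusion schemes described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: all weights are random, the summary networks $s_x$ and $s_y$ are replaced by a hypothetical mean-pooling stand-in (`MeanPoolSummary`), late fusion combines the embeddings by plain concatenation, and the "cross-shaped" attention of hybrid fusion is assumed to mean cross-attention in both directions ($\mathbf{X}$ attending to $\mathbf{Y}$ and vice versa). All class and function names are illustrative.

```python
import numpy as np

# Illustrative sketch only: random, untrained weights; all names are hypothetical.
rng = np.random.default_rng(0)


def dense(d_in, d_out):
    """Random weight matrix standing in for a learned linear layer."""
    return rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class CrossAttention:
    """Multi-head attention: queries from one source, keys/values from the other."""

    def __init__(self, d_q, d_kv, d_model=16, n_heads=4):
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.W_q, self.W_k, self.W_v = dense(d_q, d_model), dense(d_kv, d_model), dense(d_kv, d_model)
        self.W_o = dense(d_model, d_model)

    def _split(self, M):
        # (tokens, d_model) -> (heads, tokens, d_head)
        return M.reshape(M.shape[0], self.n_heads, self.d_head).transpose(1, 0, 2)

    def __call__(self, query, key_value):
        Q = self._split(query @ self.W_q)
        K = self._split(key_value @ self.W_k)
        V = self._split(key_value @ self.W_v)
        A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(self.d_head))  # (heads, n_q, n_kv)
        heads = (A @ V).transpose(1, 0, 2).reshape(query.shape[0], -1)  # (n_q, d_model)
        return heads @ self.W_o


class MeanPoolSummary:
    """Hypothetical stand-in summary network: average over observations, linear map, tanh."""

    def __init__(self, d_in, d_out=8):
        self.W = dense(d_in, d_out)

    def __call__(self, M):
        return np.tanh(M.mean(axis=0) @ self.W)


d_x, d_y, d_model = 3, 5, 16

# Early fusion (into X): X is the query, Y is key and value; one summary network s_x follows.
attn_xy = CrossAttention(d_x, d_y, d_model)
s_x_early = MeanPoolSummary(d_model)

def early_fusion(X, Y):
    return s_x_early(attn_xy(X, Y))

# Late fusion: separate embeddings s_x(X) and s_y(Y), fused here by concatenation.
s_x_late, s_y_late = MeanPoolSummary(d_x), MeanPoolSummary(d_y)

def late_fusion(X, Y):
    return np.concatenate([s_x_late(X), s_y_late(Y)])

# Hybrid fusion: cross-attention in both directions, then separate embeddings and late fusion.
attn_yx = CrossAttention(d_y, d_x, d_model)
s_x_hyb, s_y_hyb = MeanPoolSummary(d_model), MeanPoolSummary(d_model)

def hybrid_fusion(X, Y):
    X_tilde, Y_tilde = attn_xy(X, Y), attn_yx(Y, X)
    return np.concatenate([s_x_hyb(X_tilde), s_y_hyb(Y_tilde)])


X = rng.normal(size=(20, d_x))  # 20 observations from source X
Y = rng.normal(size=(30, d_y))  # 30 observations from source Y
for fuse in (early_fusion, late_fusion, hybrid_fusion):
    print(fuse.__name__, fuse(X, Y).shape)
# early_fusion (8,)
# late_fusion (16,)
# hybrid_fusion (16,)
```

Because the attention weights are normalized over the key/value rows and the stand-in summary averages over observations, each fused embedding is invariant to the order of observations within a source. In a full neural posterior estimation pipeline, the fused vector would serve as the conditioning input of the neural density estimator, and all weights would be trained rather than random.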