CoVAE: correlated multimodal generative modeling

Federico Caretti, Guido Sanguinetti

TL;DR

This work introduces Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities, and tests CoVAE on a number of real and synthetic data sets, demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.

Abstract

Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure of the multimodal data, with profound implications for generation and uncertainty quantification. In this work, we introduce Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities. We test CoVAE on a number of real and synthetic data sets, demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.
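To make the uncertainty argument concrete, here is a toy numerical sketch. It assumes one-dimensional Gaussian latents with standard-normal marginals, a hypothetical encoder output $\mathcal{N}(1.0, 0.1)$ for the only observed modality, and $\rho=0.5$ (the value quoted in Figure 1). It contrasts a Product-of-Experts fusion, where one shared latent is decoded into both modalities, with a correlated-latent model in which the missing modality's latent is predicted through the correlation. The helper names are hypothetical, and this illustrates only the general statistical point; it is not the CoVAE architecture or training objective.

```python
import numpy as np

def poe_posterior(mus, variances):
    # Product of Gaussian experts: precisions add, mean is precision-weighted.
    prec = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / prec.sum()
    mu = var * np.sum(prec * np.asarray(mus, dtype=float))
    return mu, var

def correlated_predictive(mu1, var1, rho):
    # Assumption: z1, z2 have standard-normal marginals and correlation rho,
    # i.e. z2 = rho * z1 + sqrt(1 - rho**2) * eps with eps ~ N(0, 1).
    # Propagating q(z1) = N(mu1, var1) through this relation gives q(z2).
    mu2 = rho * mu1
    var2 = rho**2 * var1 + (1.0 - rho**2)
    return mu2, var2

rho = 0.5
enc_mu, enc_var = 1.0, 0.1  # hypothetical encoder output for the observed modality

# PoE: prior expert N(0, 1) times the single available expert. The shared z
# is decoded into both modalities, so both inherit the same variance.
mu_shared, var_shared = poe_posterior([0.0, enc_mu], [1.0, enc_var])

# Correlated latents (simplified: the observed modality keeps its encoder
# posterior as-is); the missing modality is predicted via the correlation.
mu_missing, var_missing = correlated_predictive(enc_mu, enc_var, rho)

print(f"PoE shared latent:  mean = {mu_shared:.4f}, var = {var_shared:.4f}")
print(f"Correlated missing: mean = {mu_missing:.4f}, var = {var_missing:.4f}")
```

Under these assumptions the shared PoE posterior has variance $1/11 \approx 0.091$ for both modalities, while the missing modality's predictive variance is $\rho^2 \cdot 0.1 + (1-\rho^2) = 0.775$, reflecting that a correlation of 0.5 leaves most of its variability unexplained.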

Paper Structure (35 sections, 17 equations, 17 figures, 3 tables, 1 algorithm)

Figures (17)

  • Figure 1: When only one modality is available, common methods such as Product-of-Experts [wu2018multimodal] (left) erroneously assign the same uncertainty to both modalities. By contrast, CoVAE (right) correctly assigns a wider posterior to the missing modality. In this case, the two modalities have a linear correlation coefficient $\rho=0.5$.
  • Figure 2: Graphical representation of CoVAE: in the case of two modalities, during training
  • Figure 3: Schematic of the data-generation process, assuming classes drawn from MNIST
  • Figure 4: Examples of synthetic datasets in which both modalities are from the MNIST dataset and latent dimensions $D_1=D_2=10$. From top to bottom, $\rho=0.99, 0.7, 0.05$
  • Figure 5: Linear correlations measured at the decoder inputs, for both joint and conditional reconstructions. The intervals around the measured values represent $\pm1\sigma$
  • ...and 12 more figures
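Figures 3 to 5 describe synthetic data whose modalities are linked by a latent correlation $\rho$ (with $D_1=D_2=10$) and report measured linear correlations with $\pm1\sigma$ intervals. The snippet below is a hypothetical sketch of one way to draw latent pairs with a prescribed per-dimension correlation and to summarize the measured Pearson correlation as mean $\pm$ standard deviation across dimensions. The construction $z_2 = \rho z_1 + \sqrt{1-\rho^2}\,\epsilon$ and the function names are assumptions for illustration, not the paper's generation procedure or evaluation code.

```python
import numpy as np

def sample_correlated_latents(n, dim, rho, rng):
    # Hypothetical construction: each coordinate pair (z1[:, d], z2[:, d]) is
    # bivariate standard normal with correlation rho; coordinates are independent.
    z1 = rng.standard_normal((n, dim))
    eps = rng.standard_normal((n, dim))
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * eps
    return z1, z2

def per_dim_correlation(a, b):
    # Pearson correlation between matching columns of a and b.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))

rng = np.random.default_rng(0)
for rho in (0.99, 0.7, 0.05):  # the values listed for Figure 4
    z1, z2 = sample_correlated_latents(n=5000, dim=10, rho=rho, rng=rng)
    r = per_dim_correlation(z1, z2)
    # Summarize across the 10 latent dimensions as mean +/- 1 standard deviation.
    print(f"rho={rho:.2f}  measured={r.mean():.3f} +/- {r.std():.3f}")
```

With a few thousand samples the measured correlations should sit close to the prescribed $\rho$; in a trained model the same summary could be applied to the latent codes fed to the decoders.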