CoVAE: correlated multimodal generative modeling

Federico Caretti; Guido Sanguinetti

TL;DR

This work introduces Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities, and tests it on a number of real and synthetic data sets, demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.

Abstract

Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure of the multimodal data, with profound implications for generation and uncertainty quantification. In this work, we introduce Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities. We test CoVAE on a number of real and synthetic data sets, demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.

Paper Structure (35 sections, 17 equations, 17 figures, 3 tables, 1 algorithm)

Introduction
Related work
Methods
Variational Autoencoders and their multimodal extensions
Correlated VAEs (CoVAE)
CoVAE architecture
Training CoVAEs
Generating from CoVAE
Experiments
Synthetic datasets
Results
Biomedical dataset
Setup description
Results
Discussion
...and 20 more sections

Figures (17)

Figure 1: When only one modality is available, common methods such as Product-of-Experts (Wu and Goodman, 2018) (left) erroneously assign the same uncertainty to both modalities. In contrast, CoVAE (right) correctly assigns a wider posterior to the missing modality. In this case, the two modalities have a linear correlation coefficient $\rho=0.5$.
Figure 2: Graphical representation of CoVAE: in the case of two modalities, during training
Figure 3: Schematic process of the data generation, assuming classes coming from MNIST
Figure 4: Examples of synthetic datasets in which both modalities are from the MNIST dataset and latent dimensions $D_1=D_2=10$. From top to bottom, $\rho=0.99, 0.7, 0.05$
Figure 5: Linear correlations measured at the level of the input to the decoders, both in the case of joint and conditional reconstructions. The intervals around the measured values represent $\pm1\sigma$
...and 12 more figures
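
The captions of Figures 1 and 4 rest on a simple statistical fact: when two latent variables are correlated with coefficient $\rho$, observing one of them only partially determines the other. The sketch below illustrates this with textbook bivariate Gaussian conditioning. It is not the CoVAE model or the authors' data-generation code; the function names, the unit-variance assumption, and the one-dimensional latents are all illustrative choices made here.

```python
import math
import random


def conditional_gaussian(z1, rho):
    """Mean and std of z2 given z1 for a bivariate Gaussian with zero means,
    unit variances and correlation rho (illustrative assumption)."""
    mean = rho * z1
    std = math.sqrt(1.0 - rho ** 2)
    return mean, std


def sample_correlated_pair(rho, rng):
    """Draw (z1, z2) with unit variances and linear correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    eps = rng.gauss(0.0, 1.0)
    # Mix z1 with independent noise so that corr(z1, z2) = rho.
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * eps
    return z1, z2


def empirical_correlation(pairs):
    """Pearson correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(v1 * v2)


if __name__ == "__main__":
    rng = random.Random(0)
    # Correlation values taken from the captions of Figures 1 and 4.
    for rho in (0.99, 0.7, 0.5, 0.05):
        _, std = conditional_gaussian(1.0, rho)
        pairs = [sample_correlated_pair(rho, rng) for _ in range(20000)]
        print(f"rho={rho}: conditional std={std:.3f}, "
              f"empirical rho={empirical_correlation(pairs):.3f}")
```

Under these assumptions, the conditional standard deviation of the unobserved variable is $\sqrt{1-\rho^2}$. For $\rho=0.5$ this is about 0.866, close to the prior width, so a model that assigns the missing modality the same tight posterior as the observed one understates its uncertainty. For $\rho=0.99$ the observed modality is highly informative and the conditional distribution becomes narrow.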
