A tutorial on multi-view autoencoders using the multi-view-AE library

Ana Lawry Aguila, Andre Altmann

TL;DR

This work aims to establish a cohesive foundation for multi-modal modelling, serving as a valuable educational resource in the field and extending the documentation and functionality of the previously introduced multi-view-AE library.

Abstract

There has been growing interest in recent years in modelling multiple modalities (or views) of data, for example to understand the relationship between modalities or to generate missing data. Multi-view autoencoders have gained significant traction for their adaptability and versatility in modelling multi-modal data, as they can be tailored to suit the characteristics of the data at hand. However, the notation used across multi-view autoencoder papers is inconsistent, and the models are often implemented in different coding frameworks. To address this, we present a unified mathematical framework for multi-view autoencoders, consolidating their formulations. Moreover, we offer insights into the motivation and theoretical advantages of each model. To facilitate accessibility and practical use, we extend the documentation and functionality of the previously introduced multi-view-AE library. This library offers Python implementations of numerous multi-view autoencoder models, presented within a user-friendly framework. Through benchmarking experiments, we evaluate our implementations against previous ones, demonstrating comparable or superior performance. This work aims to establish a cohesive foundation for multi-modal modelling, serving as a valuable educational resource in the field.

Paper Structure (12 sections, 39 equations, 7 figures, 2 tables)

This paper contains 12 sections, 39 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Single-view autoencoder frameworks: (a) vanilla autoencoder, (b) adversarial autoencoder, (c) variational autoencoder.
  • Figure 2: Latent variable models for two input views, in which data $\textbf{X}_1$ and $\textbf{X}_2$ (a) share an underlying latent factor $\textbf{z}$, (b) have associated latent factors $\textbf{z}_1$ and $\textbf{z}_2$, and (c) share an underlying latent factor as well as view-specific private latent variables.
  • Figure 3: Example frameworks of a two-view autoencoder for data $\textbf{X}_1$ and $\textbf{X}_2$: (a) a joint model, where the individual latent spaces are combined and the reconstruction is carried out from the joint latent space, (b) a coordinated model, where the latent representations are coordinated by an additional loss term for association between the latent variables, and (c) a joint model with shared and private latent variables.
  • Figure 4: Schematic diagram of the multi-view-AE package.
  • Figure 5: Classes implemented in multi-view-AE in addition to the model classes, grouped by broad category.
  • ...and 2 more figures
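
The difference between the joint and coordinated frameworks described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the multi-view-AE API: the function names, the linear encoders and decoders, the use of a simple mean to combine latents, and the squared-distance association term are all assumptions chosen for illustration.

```python
# Illustrative only: plain-Python linear "encoders"/"decoders",
# not the multi-view-AE API. All names here are hypothetical.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sq_err(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def joint_loss(x1, x2, enc1, enc2, dec1, dec2):
    """Joint model: combine per-view latents, reconstruct both views from the joint latent."""
    z1, z2 = matvec(enc1, x1), matvec(enc2, x2)
    z = [(a + b) / 2 for a, b in zip(z1, z2)]  # assumed combination rule: simple mean
    return sq_err(matvec(dec1, z), x1) + sq_err(matvec(dec2, z), x2)


def coordinated_loss(x1, x2, enc1, enc2, dec1, dec2, weight=1.0):
    """Coordinated model: each view keeps its own latent; an extra loss term ties them together."""
    z1, z2 = matvec(enc1, x1), matvec(enc2, x2)
    recon = sq_err(matvec(dec1, z1), x1) + sq_err(matvec(dec2, z2), x2)
    return recon + weight * sq_err(z1, z2)  # assumed association term: squared distance


# Toy 2-D views with a 1-D latent; weights chosen by hand, not learned.
x1, x2 = [1.0, 2.0], [2.0, 4.0]
enc1, enc2 = [[1.0, 0.0]], [[1.0, 0.0]]      # 1x2 encoders
dec1, dec2 = [[1.0], [2.0]], [[2.0], [4.0]]  # 2x1 decoders

print(joint_loss(x1, x2, enc1, enc2, dec1, dec2))
print(coordinated_loss(x1, x2, enc1, enc2, dec1, dec2))
```

In the joint sketch the two views influence each other through the shared latent used for reconstruction, whereas in the coordinated sketch they interact only through the weighted association term, which is why setting `weight=0.0` reduces it to two independent autoencoders.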