Multimodal hierarchical Variational AutoEncoders with Factor Analysis latent space

Alejandro Guerrero-López, Carlos Sevilla-Salcedo, Vanessa Gómez-Verdejo, Pablo M. Olmos

TL;DR

By overcoming the limitations of existing methods, the FA-VAE provides a more interpretable, flexible, and modular solution for managing heterogeneous data types and offers a pathway to more efficient and scalable data-handling strategies.

Abstract

Purpose: Handling heterogeneous and mixed data types has become increasingly critical with the exponential growth in real-world databases. While deep generative models attempt to merge diverse data views into a common latent space, they often sacrifice interpretability, flexibility, and modularity. This study proposes a novel method to address these limitations by combining Variational AutoEncoders (VAEs) with a Factor Analysis latent space (FA-VAE). Methods: The proposed FA-VAE method employs multiple VAEs to learn a private representation for each heterogeneous data view in a continuous latent space. Information is shared between views using a low-dimensional latent space, generated via a linear projection matrix. This modular design creates a hierarchical dependency between private and shared latent spaces, allowing for the flexible addition of new views and conditioning of pre-trained models. Results: The FA-VAE approach facilitates cross-generation of data from different domains and enables transfer learning between generative models. This allows for effective integration of information across diverse data views while preserving their distinct characteristics. Conclusions: By overcoming the limitations of existing methods, the FA-VAE provides a more interpretable, flexible, and modular solution for managing heterogeneous data types. It offers a pathway to more efficient and scalable data-handling strategies, enhancing the potential for cross-domain data synthesis and model transferability.
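The hierarchy described under Methods can be illustrated with a minimal sketch. The abstract only states that the shared space is tied to the private ones through a linear projection matrix, so the code assumes a classical factor-analysis link: a standard normal shared latent, one projection matrix per view, and isotropic Gaussian noise. The function name `sample_fa_hierarchy`, its parameters, and the randomly drawn projection matrices are hypothetical and used for illustration only. In the actual model the projections would be learned and each private latent would feed its own VAE decoder.

```python
import random


def sample_fa_hierarchy(n_samples, shared_dim, private_dims, noise_std=0.1, seed=0):
    """Sample shared and private latents under an ASSUMED factor-analysis link.

    Assumption (not taken from the paper): z_view = W_view @ g + Gaussian noise.
    """
    rng = random.Random(seed)
    # One hypothetical projection matrix per view, shape (private_dim, shared_dim).
    # Drawn at random here; a trained model would learn these.
    projections = [
        [[rng.gauss(0.0, 1.0) for _ in range(shared_dim)] for _ in range(dim)]
        for dim in private_dims
    ]
    shared = []
    private = [[] for _ in private_dims]
    for _ in range(n_samples):
        # Shared low-dimensional latent for this sample (assumed standard normal).
        g = [rng.gauss(0.0, 1.0) for _ in range(shared_dim)]
        shared.append(g)
        for view, W in enumerate(projections):
            # Private latent: linear projection of the shared latent plus noise.
            z = [
                sum(w * g_k for w, g_k in zip(row, g)) + rng.gauss(0.0, noise_std)
                for row in W
            ]
            private[view].append(z)
    return shared, projections, private


# Two views with private latent sizes 8 and 16 sharing a 3-dimensional latent.
shared, projections, private = sample_fa_hierarchy(5, 3, [8, 16])
print(len(shared), len(shared[0]), [len(view[0]) for view in private])
```

Under this assumption, adding a new view only requires a new projection matrix and a new private latent, which is one way to read the modularity claimed in the abstract.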

Paper Structure

This paper contains 12 sections, 16 equations, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: Basic VAE structure, where $q_\eta(\mathbf{z}| \mathbf{x})$ is the encoder network and $p_{\theta}(\mathbf{x}| \mathbf{z})$ is the decoder network. Grey circles denote observations, and white circles represent random variables.
  • Figure 2: FA-VAE graphical model example with two VAEs. Grey circles denote observations, and white circles represent random variables.
  • Figure 3: Conditioning a single VAE on a multi-label attribute vector using the FA-VAE architecture, where $A$ denotes the attribute view and $O$ the observation view. Grey circles denote observations, and white circles represent random variables.
  • Figure 4: VAE convergence. Fig. \ref{fig:escenario1_global} shows the ELBO of the unsupervised VAE trained on CelebA from scratch. In Fig. \ref{fig:escenario1_recloss}, the unsupervised VAE from Fig. \ref{fig:escenario1_global} is plugged into the FA-VAE architecture to condition it.
  • Figure 5: Faces generated by FA-VAE when their attributes are modified. The left column of each subfigure shows the raw image. The centre and right columns show the images obtained by changing the attributes indicated in the title, given in the order [smile, lipstick, gender].
  • ...and 11 more figures