A Markov Random Field Multi-Modal Variational AutoEncoder
Fouad Oubari, Mohamed El Baha, Raphael Meunier, Rodrigue Décatoire, Mathilde Mougeot
TL;DR
This work addresses the modeling of complex intermodal dependencies in multimodal data by incorporating Markov Random Fields (MRFs) into both the prior and the posterior of a multimodal variational autoencoder (MVAE). It introduces a family of MRF-based MVAEs: a Gaussian MRF MVAE, an ALMRF MVAE for heavy-tailed data, and an NN-MRF MVAE with neural-network potentials, together with unified ELBO formulations and differentiable inference schemes. Empirically, the models perform competitively on PolyMNIST and achieve superior intermodal coherence on a synthetic copula dataset, indicating higher fidelity in joint generation across modalities and in dependency modeling. The framework advances multimodal generative modeling by capturing complex cross-modal relationships more faithfully while remaining tractable, with potential benefits for explainability and downstream applications.
Abstract
Recent advances in multimodal Variational AutoEncoders (VAEs) have highlighted their potential for modeling complex data from multiple modalities. However, many existing approaches rely on relatively simple aggregation schemes that may not fully capture the complex dynamics between modalities. This work introduces a novel multimodal VAE that incorporates a Markov Random Field (MRF) into both the prior and posterior distributions in order to capture complex intermodal interactions more effectively. Unlike previous models, our approach is specifically designed to model and leverage these relationships, enabling a more faithful representation of multimodal data. Our experiments show that the model performs competitively on the standard PolyMNIST dataset and shows superior performance in capturing complex intermodal dependencies on a synthetic dataset specially designed to test intricate relationships.
