Improving Variational Autoencoder Estimation from Incomplete Data with Mixture Variational Families

Vaidotas Simkus, Michael U. Gutmann

TL;DR

This work analyzes training VAEs with incomplete data and shows that missingness inflates the latent posterior complexity, potentially biasing estimates if the variational family is insufficiently flexible. It introduces two families of variational mixtures: finite mixtures (MissVAE/MissSVAE/MissIWAE/MissSIWAE) and a decomposed imputation-based approach (DeMissVAE), the latter separating data imputation from model learning via an imputation distribution f^t(x_mis|x_obs). The paper derives objective bounds for both approaches, including CVI-based and marginalised bounds, and provides practical guidance for optimization with mixture components, including implicit reparameterisation and stratified sampling. Empirical results on synthetic MoG data, UCI datasets, and MNIST/Omniglot demonstrate that variational mixtures can improve VAE estimation under missing data, with performance depending on dataset and budget, and show that the decomposed method can yield well-structured latent spaces similar to fully observed data. Overall, the work advances robust VAE estimation under incomplete data by leveraging flexible variational mixtures and data-imputation strategies with clear theoretical and empirical support.

Abstract

We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model's posterior distribution over the latent variables compared to the fully-observed case. The increased complexity may adversely affect the fit of the model due to a mismatch between the variational and model posterior distributions. We introduce two strategies based on (i) finite variational-mixture and (ii) imputation-based variational-mixture distributions to address the increased posterior complexity. Through a comprehensive evaluation of the proposed approaches, we show that variational mixtures are effective at improving the accuracy of VAE estimation from incomplete data.
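
The increase in posterior complexity can be read off a standard marginalisation identity. The following is a sketch in the notation of the figure captions, not a quotation of the paper's own equations:

$$
{p_{\bm{\theta}}}({\bm{z}} \mid {\bm{x}}_\mathrm{obs})
= \int {p_{\bm{\theta}}}({\bm{z}} \mid {\bm{x}}_\mathrm{obs}, {\bm{x}}_\mathrm{mis}) \, {p_{\bm{\theta}}}({\bm{x}}_\mathrm{mis} \mid {\bm{x}}_\mathrm{obs}) \, \mathrm{d}{\bm{x}}_\mathrm{mis}
= \mathbb{E}_{{p_{\bm{\theta}}}({\bm{x}}_\mathrm{mis} \mid {\bm{x}}_\mathrm{obs})} \left[ {p_{\bm{\theta}}}({\bm{z}} \mid {\bm{x}}_\mathrm{obs}, {\bm{x}}_\mathrm{mis}) \right]
$$

Even if every complete-data posterior is close to a single Gaussian, the incomplete-data posterior is a continuous mixture over imputations of ${\bm{x}}_\mathrm{mis}$ and can therefore be multimodal, which is what a mixture-shaped variational family is meant to capture.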

Paper Structure (36 sections, 15 equations, 13 figures, 3 tables, 1 algorithm)

This paper contains 36 sections, 15 equations, 13 figures, 3 tables, 1 algorithm.

Figures (13)

  • Figure 1: Illustration of the posterior complexity due to missing data. Each colour represents a different data-point ${\bm{x}}^i$. First: the model posterior ${p_{\bm{\theta}}}({\bm{z}} \mid {\bm{x}})$ under complete data ${\bm{x}}$. Second: the model posterior ${p_{\bm{\theta}}}({\bm{z}} \mid {\bm{x}}_\mathrm{obs})$ under incomplete data ${\bm{x}}_\mathrm{obs}$. Third: variational approximation ${q_{\bm{\phi}}}({\bm{z}} \mid {\bm{x}})$ of the complete-data posterior ${p_{\bm{\theta}}}({\bm{z}} \mid {\bm{x}})$. Fourth: an imputation-mixture variational approximation $\mathbb{E}_{{p_{\bm{\theta}}}({\bm{x}}_\mathrm{mis} \mid {\bm{x}}_\mathrm{obs})} [ {q_{\bm{\phi}}}({\bm{z}} \mid {\bm{x}}_\mathrm{obs}, {\bm{x}}_\mathrm{mis}) ]$ of the incomplete posterior ${p_{\bm{\theta}}}({\bm{z}} \mid {\bm{x}}_\mathrm{obs})$. In these figures, we use a VAE with Gaussian variational, prior, and decoder distributions fitted on complete data; the incomplete data-points ${\bm{x}}_\mathrm{obs}$ are obtained by randomly masking 50% of the values from the complete data-points ${\bm{x}}$.
  • Figure 2: Log-likelihood on held-out data evaluated by numerically integrating the 2D latent variables. VAEs were fitted on mixture-of-Gaussians data with 50% missingness. Each model is fitted with a computational budget of 5/15/25 samples from the variational distribution. The box plots show 1st and 3rd quartiles, the black lines are the medians, the dashed lines are the means, and the whiskers show the data range over 5 independent runs. MVAE and MIWAE ($\dagger$) are baseline methods by Mattei and Frellsen (2019). The other five methods are proposed in this paper.
  • Figure 3: Estimate of the test log-likelihood using the IWELBO with $I=50000$, on four UCI data sets. Each data set was rendered incomplete by applying uniform missingness of 20/50/80%. The curves show average performance over 5 independent runs of the algorithms and the intervals show the 90% centered interval.
  • Figure 4: Estimate of the test log-likelihood using the IWELBO with $I=1000$ on the MNIST and Omniglot data sets. Each image in the training data set was missing 2 out of 4 random quadrants. The box plots show 1st and 3rd quartiles, the black lines are the medians, the dashed lines are the means, and the whiskers show the data range over 5 independent runs.
  • Figure 5: A control study on a VAE model with 2D latent space (see additional details in the appendix), examining the sensitivity of the proposed method (DeMissVAE, green) and two control methods (blue and yellow) to the accuracy of the imputation distribution ${f^{t}({\bm{x}}_\mathrm{mis} \mid {\bm{x}}_\mathrm{obs})}$. Left: ${f^{t}({\bm{x}}_\mathrm{mis} \mid {\bm{x}}_\mathrm{obs})} = {p_{\bm{\theta}}}({\bm{x}}_\mathrm{mis} \mid {\bm{x}}_\mathrm{obs})$ represented using rejection sampling. Center: an oracle imputation distribution that gets progressively "wider" from left to right of the panel. Right: an oracle imputation distribution that oversamples low-probability posterior modes more heavily towards the right of the panel. The log-likelihood is computed on a held-out test data set by numerically integrating the 2D latent space of the VAE. The horizontal axis on the two right-most panels shows the Jensen--Shannon divergence between the imputation distribution and the ground-truth conditional ${p^*}({\bm{x}}_\mathrm{mis} \mid {\bm{x}}_\mathrm{obs})$.
  • ...and 8 more figures
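
The IWELBO estimates of the test log-likelihood mentioned in the Figure 3 and Figure 4 captions can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy model ($z \sim \mathcal{N}(0, 1)$, $x_d \mid z \sim \mathcal{N}(z, 1)$), the helper names `log_normal`, `logsumexp` and `iwelbo`, and the choice of proposals. The toy model is chosen because $\log p(x_\mathrm{obs})$ is available in closed form, so the estimate can be checked.

```python
import numpy as np


def log_normal(x, mean, var):
    """Elementwise log density of N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def logsumexp(a):
    """Numerically stable log(sum(exp(a))) for a 1-D array."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))


def iwelbo(x_obs, q_mean, q_var, num_samples, rng):
    """IWELBO estimate of log p(x_obs) for the hypothetical toy model
    z ~ N(0, 1), x_d | z ~ N(z, 1), using a Gaussian proposal q(z)."""
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(num_samples)
    log_prior = log_normal(z, 0.0, 1.0)
    # Only observed dimensions enter the likelihood; missing dimensions
    # integrate out because this toy decoder factorises over dimensions.
    log_lik = log_normal(x_obs[:, None], z[None, :], 1.0).sum(axis=0)
    log_q = log_normal(z, q_mean, q_var)
    log_w = log_prior + log_lik - log_q  # log importance weights
    return logsumexp(log_w) - np.log(num_samples)


rng = np.random.default_rng(0)
x_obs = np.array([0.7])            # one observed dimension, the rest missing
exact = log_normal(0.7, 0.0, 2.0)  # x_1 = z + noise, hence x_1 ~ N(0, 2)

# Proposal equal to the true posterior N(x_1 / 2, 1 / 2): constant weights.
est_exact_q = iwelbo(x_obs, 0.35, 0.5, 10, rng)
# Proposal equal to the prior: mismatched, so many samples are needed.
est_prior_q = iwelbo(x_obs, 0.0, 1.0, 50000, rng)
print(exact, est_exact_q, est_prior_q)
```

When the proposal matches the posterior, every importance weight equals $p(x_\mathrm{obs})$ and a handful of samples suffices; with a mismatched proposal the estimate only approaches the true value as the number of samples grows, which is one way to see why the flexibility of the variational family matters at small sampling budgets.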