Disentangled Interleaving Variational Encoding

Noelle Y. L. Wong, Eng Yeow Cheu, Zhonglin Chiam, Dipti Srinivasan

TL;DR

DeepDIVE addresses gradient conflict in multi-objective learning by deriving a unified loss from the data log-likelihood, enabling non-conflicting optimization for reconstruction, forecasting, and disentanglement. It extends variational autoencoders to semi-supervised forecasting by introducing a disentangled latent space with marginal ($n_2$) and conditional ($n_1$) components, and it uses interleaving training with cross-attention fusion to integrate these factors. The approach uses a Naïve Bayes independence assumption to derive a disentanglement loss, replaces the marginal KL term with a cross-entropy bound under a mixture-RBF prior, and proves that the overall loss decomposes without mutual gradient conflict. Empirical results on gait and electricity time series show that DeepDIVE disentangles inputs effectively and achieves forecast accuracy better than the original VAE and comparable to state-of-the-art baselines, while providing interpretable latent representations. Overall, the work offers a principled framework for multi-task latent learning, with practical impact on forecasting under uncertainty and potential extensions to anomaly detection and LLM augmentation.

Abstract

Conflicting objectives present a considerable challenge in interleaving multi-task learning, requiring meticulous design and balance to ensure effective learning of a representative latent data space across all tasks without mutual negative impact. Drawing inspiration from marginal and conditional probability distributions in probability theory, we design a principled approach to disentangle the original input into marginal and conditional probability distributions in the latent space of a variational autoencoder. Our proposed model, Deep Disentangled Interleaving Variational Encoding (DeepDIVE), learns disentangled features from the original input to form clusters in the embedding space and unifies these features via the cross-attention mechanism in the fusion stage. We theoretically prove that combining the objectives for reconstruction and forecasting fully captures the lower bound, and we mathematically derive a loss function for disentanglement using Naïve Bayes. Under the assumption that the prior is a mixture of log-concave distributions, we also establish that the Kullback-Leibler divergence between the prior and the posterior is upper bounded by a function minimized by the minimizer of the cross-entropy loss. This informs our adoption of radial basis functions (RBF) and cross-entropy with interleaving training for DeepDIVE, providing a justified basis for convergence. Experiments on two public datasets show that DeepDIVE disentangles the original input and yields forecast accuracies better than the original VAE and comparable to existing state-of-the-art baselines.
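As a rough illustration of the RBF and cross-entropy pairing mentioned in the abstract, the sketch below scores a latent point against a set of class centers with a Gaussian RBF kernel and computes the cross-entropy of the true class under a softmax over those scores. This is a minimal sketch under stated assumptions, not the paper's formulation: the function names, the `gamma` bandwidth, the fixed 2-D centers, and the use of a plain softmax are all hypothetical choices made for illustration.

```python
import math

def rbf_log_kernels(z, centers, gamma=1.0):
    """Log of a Gaussian RBF kernel between latent point z and each class center."""
    return [-gamma * sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in centers]

def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def rbf_cross_entropy(z, label, centers, gamma=1.0):
    """Cross-entropy of the true label under a softmax over RBF log-kernels."""
    return -log_softmax(rbf_log_kernels(z, centers, gamma))[label]

# Hypothetical class centers in a 2-D marginal latent space (assumption).
centers = [(0.0, 0.0), (3.0, 0.0)]

# A point near its own class center incurs a small loss; a point sitting
# near the other class center incurs a large one.
near = rbf_cross_entropy((0.1, 0.0), 0, centers)
far = rbf_cross_entropy((2.9, 0.0), 0, centers)
print(near, far)
```

Minimizing this loss pulls each latent point toward the center of its own class, which is one simple way clusters can form in an embedding space.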

Paper Structure

This paper contains 25 sections, 9 theorems, 44 equations, 5 figures, 3 tables.

Key Result

Proposition 1

Given jointly continuous random variables $x$ and $y$ with joint probability density function $p(x,y)=p(y|x)p(x)$, the log-likelihood of the joint distribution can be written as in equation loglikelihood_deepdive, where the Evidence Lower Bound $\mathcal{L}(\theta, \phi; x,y)$ is given by equation elbo_deepdive. As with equation elbo_vae, $\mathcal{L}(\theta, \phi; x,y)$ in equation elbo_deepdive is a lower bound on the log-likelihood in equation loglikelihood_deepdive.
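The lower-bound claim in Proposition 1 can be checked numerically on a toy model. The sketch below uses the standard variational identity, $\log p(x,y) = \mathcal{L} + \mathrm{KL}(q(z|x)\,\|\,p(z|x,y))$, with a discrete latent variable so that every quantity is computable exactly. It is an illustration under stated assumptions rather than the paper's exact decomposition: the equations of the proposition are not reproduced in this summary, and the three-state latent, the prior, the likelihood values, and the choice of $q$ are all made up for the example.

```python
import math

# Toy setup (all values hypothetical): a latent z with three states,
# evaluated at one fixed observed pair (x, y).
prior = [0.5, 0.3, 0.2]     # p(z)
lik = [0.10, 0.40, 0.05]    # p(x, y | z) at the observed (x, y)
q = [0.2, 0.7, 0.1]         # an arbitrary approximate posterior q(z | x)

# Exact evidence p(x, y) = sum_z p(z) p(x, y | z) and its log.
evidence = sum(p * l for p, l in zip(prior, lik))
log_evidence = math.log(evidence)

# Exact posterior p(z | x, y) by Bayes' rule.
posterior = [p * l / evidence for p, l in zip(prior, lik)]

# ELBO = E_q[log p(x, y, z) - log q(z | x)].
elbo = sum(qi * (math.log(p * l) - math.log(qi))
           for qi, p, l in zip(q, prior, lik))

# KL(q || posterior), the gap between the log-evidence and the ELBO.
kl = sum(qi * math.log(qi / po) for qi, po in zip(q, posterior))

print(log_evidence, elbo, kl)
```

Because the KL term is non-negative, the ELBO never exceeds the log-likelihood, and the two coincide only when $q$ equals the exact posterior.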

Figures (5)

  • Figure 1: Model architecture for DeepDIVE.
  • Figure 2: Correlation between Gait Type and Stride Length in gait.
  • Figure 3: Disentangled representation space for electricity.
  • Figure 4: Density of the latent embeddings along each marginal dimension of representation space for electricity. Compared to Fig. 3, in which points may overlap, Fig. 4 more clearly shows the distribution and concentration of data points along the marginal dimensions, making class distributions along each dimension easier to identify.
  • Figure 5: Graphical overview of loss function derivation for DeepDIVE, with corresponding assumptions made.

Theorems & Definitions (10)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Corollary
  • Definition 1
  • Proposition 6
  • Corollary
  • Theorem 1