Disentangled Interleaving Variational Encoding
Noelle Y. L. Wong, Eng Yeow Cheu, Zhonglin Chiam, Dipti Srinivasan
TL;DR
DeepDIVE addresses gradient conflict in multi-objective learning by deriving a unified loss from the data log-likelihood, enabling non-conflicting optimization for reconstruction, forecasting, and disentanglement. It extends variational autoencoders to semi-supervised forecasting by introducing a disentangled latent space with marginal ($n_2$) and conditional ($n_1$) components and employing interleaving training with cross-attention fusion to integrate these factors. The approach leverages a Naïve Bayes independence assumption to derive a disentanglement loss, replaces the marginal KL term with a cross-entropy bound under a mixture-RBF prior, and proves that the overall loss decomposes without mutual gradient conflict. Empirical results on gait and electricity time-series show that DeepDIVE disentangles inputs effectively and achieves forecast accuracy better than the original VAE and comparable to state-of-the-art baselines, while providing interpretable latent representations. Overall, the work offers a principled framework for multi-task latent learning with practical impact on forecasting under uncertainty and potential extensions to anomaly detection and LLM augmentation.
Abstract
Conflicting objectives pose a considerable challenge in interleaving multi-task learning: the objectives must be designed and balanced carefully so that a representative latent data space is learned across all tasks without mutual negative impact. Drawing inspiration from marginal and conditional probability distributions in probability theory, we design a principled approach that disentangles the original input into marginal and conditional probability distributions in the latent space of a variational autoencoder. Our proposed model, Deep Disentangled Interleaving Variational Encoding (DeepDIVE), learns disentangled features from the original input to form clusters in the embedding space and unifies these features via a cross-attention mechanism in the fusion stage. We theoretically prove that combining the objectives for reconstruction and forecasting fully captures the lower bound, and we mathematically derive a loss function for disentanglement using Naïve Bayes. Under the assumption that the prior is a mixture of log-concave distributions, we also establish that the Kullback-Leibler divergence between the prior and the posterior is upper bounded by a function minimized by the minimizer of the cross-entropy loss. This result informs our adoption of radial basis functions (RBF) and cross entropy with interleaving training in DeepDIVE and provides a justified basis for convergence. Experiments on two public datasets show that DeepDIVE disentangles the original input and yields forecast accuracies better than the original VAE and comparable to existing state-of-the-art baselines.
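To make the pairing of RBF and cross entropy concrete, the following is a minimal sketch and not the DeepDIVE implementation. It assumes a prior made of equally weighted isotropic Gaussian components with fixed centers and a shared width `sigma`. Under that assumption, a softmax over negative scaled squared distances gives the component responsibilities, and the cross-entropy loss is the negative log-responsibility of the labelled component. The names `rbf_log_probs` and `cross_entropy`, the centers, and the sample points are all hypothetical and chosen only for illustration.

```python
import math

def rbf_log_probs(z, centers, sigma=1.0):
    """Log-softmax over RBF logits -||z - c_k||^2 / (2 sigma^2).

    Assumption: equal-weight isotropic Gaussian components, so these values
    are the log-responsibilities of each component for the latent point z.
    """
    logits = [
        -sum((zi - ci) ** 2 for zi, ci in zip(z, c)) / (2.0 * sigma ** 2)
        for c in centers
    ]
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_norm for l in logits]

def cross_entropy(batch, labels, centers, sigma=1.0):
    """Mean negative log-responsibility of the labelled component."""
    losses = [-rbf_log_probs(z, centers, sigma)[y] for z, y in zip(batch, labels)]
    return sum(losses) / len(losses)

# Hypothetical 2-D latent space with two cluster centers.
centers = [[0.0, 0.0], [4.0, 4.0]]
labels = [0, 1]

near = [[0.1, -0.2], [3.9, 4.1]]   # each point sits by its labelled center
far = [[3.9, 4.1], [0.1, -0.2]]    # each point sits by the other center

loss_near = cross_entropy(near, labels, centers)
loss_far = cross_entropy(far, labels, centers)
print(loss_near < loss_far)
```

In this toy setting the loss is small when latent points lie near the center of their labelled cluster and large when they lie near the wrong one, which is the behaviour a clustering objective of this kind is meant to reward. A learned model would also update the encoder, and possibly the centers, by gradient descent; that part is omitted here.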
