VINA: Variational Invertible Neural Architectures

Shubhanshu Shekhar, Mohammad Javad Khojasteh, Ananya Acharya, Tony Tohme, Kamal Youcef-Toumi

TL;DR

A unified framework for INNs and NFs is introduced based on variational unsupervised loss functions, inspired by analogous formulations in related areas such as generative adversarial networks (GANs) and the Precision-Recall divergence for training normalizing flows.

Abstract

The distinctive architectural features of normalizing flows (NFs), notably bijectivity and tractable Jacobians, make them well-suited for generative modeling. Invertible neural networks (INNs) build on these principles to address supervised inverse problems, enabling direct modeling of both forward and inverse mappings. In this paper, we revisit these architectures from both theoretical and practical perspectives and address a key gap in the literature: the lack of theoretical guarantees on approximation quality under realistic assumptions, whether for posterior inference in INNs or for generative modeling with NFs. We introduce a unified framework for INNs and NFs based on variational unsupervised loss functions, inspired by analogous formulations in related areas such as generative adversarial networks (GANs) and the Precision-Recall divergence for training normalizing flows. Within this framework, we derive theoretical performance guarantees, quantifying posterior accuracy for INNs and distributional accuracy for NFs, under assumptions that are weaker and more practically realistic than those used in prior work. Building on these theoretical results, we conduct extensive case studies to distill general design principles and practical guidelines. We conclude by demonstrating the effectiveness of our approach on a realistic ocean-acoustic inversion problem.

Paper Structure

This paper contains 74 sections, 15 theorems, 215 equations, 11 figures, 5 tables.

Key Result

Theorem 3.1

Suppose the assumption labeled assump:INN-theorem holds and $D_f$ satisfies the Pinsker-type inequality $c_f \sqrt{D_f(P \parallel Q)} \geq TV(P, Q)$ for all distributions $P, Q$ and some constant $c_f>0$. Then, for every measurable $A \subset \mathcal{Y}$ with $P_Y(A)>0$, on the $(1-\delta)$-probability event of (I… where $W_1$ denotes the $1$-Wasserstein metric, $P^{(A)}_X = P_{X|Y\in A}$, $P^{(A)}_{\hat{T}_n^{-1}}$ …
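
The Pinsker-type condition in the theorem can be made concrete for one familiar case. The classical Pinsker inequality gives $TV(P, Q) \leq \sqrt{\tfrac{1}{2} KL(P \parallel Q)}$, so for the KL divergence the condition holds with $c_f = 1/\sqrt{2}$. The sketch below checks this numerically on random discrete distributions. It assumes $TV$ is normalized as half the $\ell_1$ distance and that KL is measured in nats. The helper names are illustrative and do not come from the paper.

```python
import math
import random

def total_variation(p, q):
    """TV(P, Q) = 0.5 * sum_i |p_i - q_i| for discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """KL(P || Q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def random_distribution(k, rng):
    """Draw a strictly positive probability vector of length k."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

# Classical Pinsker constant for the KL divergence (assumed TV normalization).
C_KL = 1.0 / math.sqrt(2.0)

def pinsker_holds(p, q, c_f=C_KL, tol=1e-12):
    """Check c_f * sqrt(D_f(P || Q)) >= TV(P, Q) with D_f = KL."""
    kl = max(kl_divergence(p, q), 0.0)  # guard against tiny negative round-off
    return c_f * math.sqrt(kl) >= total_variation(p, q) - tol

rng = random.Random(0)
all_hold = all(
    pinsker_holds(random_distribution(5, rng), random_distribution(5, rng))
    for _ in range(1000)
)
print("Pinsker inequality held on all random pairs:", all_hold)
```

Other $f$-divergences need their own constant $c_f$; this check covers only the KL case.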

Figures (11)

  • Figure 1: An INN can be represented by an invertible map $T$ that approximates the relation from the input $X \in \mathbb{R}^{d_{\mathbf{x}}}$ to the output $Y \in \mathbb{R}^{d_{\mathbf{y}}}$ and a latent variable $Z \in \mathbb{R}^{d_{\mathbf{x}} - d_{\mathbf{y}}}$. The invertibility of $T$ means that for any $\mathbf{y} \in \mathbb{R}^{d_{\mathbf{y}}}$, we also have an approximate posterior sampling distribution via $T^{-1}(\mathbf{y}, Z)$.
  • Figure 2: The four plots illustrate the results of the inverse kinematics case study with different prior regularizations. (a) shows samples from a model trained without any prior loss (i.e., with prior weight $\lambda' = 0$). (b) and (c) display samples from models trained with a prior weight of $\lambda' = 1$. In (b), the model assumes a Gaussian prior over the joint configuration, resulting in a scaled $L_2$ loss between the ground truth and the generated samples. In contrast, (c) uses a uniform prior $\mathcal{U}(0,1)$, which is inconsistent with the true distribution. (d) illustrates results from a model trained with a Gaussian prior, using $\lambda' = 100$. The cross indicates the true end-effector position. All models were trained using the NLL loss formulation for the INN.
  • Figure 3: Schematic illustration of how different training paradigms are used to solve inverse problems in the Gaussian setting. (a) A Bayesian neural network for solving the inverse problem, with a supervised cost (SC) between the predicted and true $\mathbf{x}$. (b) An INN trained without $\prescript{p}{}{L_x}$, with an unsupervised cost (USC) between the predicted and true $\mathbf{x}$. (c) An INN trained with $\prescript{p}{}{L_x}$, which serves as a reconstruction loss.
  • Figure 4: Comparison of $f$-divergence performance for coupling-based and iResNet architectures in the inverse kinematics case study. The x-axis shows training time and the y-axis shows the $\ell_2$-distance between true and generated end-effector positions. Ellipses are centered at the mean values, with horizontal and vertical diameters indicating the standard deviations in training time and $\ell_2$-distance. The right and left columns correspond to the forward loss \ref{eq:vdm-objective-forward} and the backward loss \ref{eq:vdm-objective-backward-empirical}, respectively. The top row corresponds to the coupling-based architecture, whereas the bottom row corresponds to iResNet layers.
  • Figure 5: This diagram illustrates the impact of varying latent dimensions on the posterior samples. The circular dots represent the variational formulation of the forward KL divergence and the hollow squares correspond to the Sinkhorn divergence. The x-axis shows latent dimension and the y-axis shows $\ell_2$-distance between true and generated end-effector positions.
  • ...and 6 more figures
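
The setup in Figure 1, an invertible map $T$ sending $\mathbf{x}$ to $(\mathbf{y}, \mathbf{z})$ with posterior samples drawn as $T^{-1}(\mathbf{y}, Z)$, can be sketched with a toy affine coupling construction. This is a minimal illustration under stated assumptions, not the paper's architecture. The conditioner `scale_shift` is a fixed, untrained, hypothetical function. The dimensions $d_{\mathbf{x}} = 4$ and $d_{\mathbf{y}} = 2$ and the standard Gaussian latent are choices made for the example.

```python
import math
import random

def scale_shift(u):
    """Hypothetical fixed conditioner: returns (log-scale, shift) per coordinate."""
    s = [math.tanh(0.5 * v) for v in u]  # bounded log-scale
    t = [math.sin(v) for v in u]         # shift
    return s, t

def coupling_forward(x):
    """Affine coupling: (x1, x2) -> (x1, x2 * exp(s(x1)) + t(x1))."""
    h = len(x) // 2
    x1, x2 = x[:h], x[h:]
    s, t = scale_shift(x1)
    return x1 + [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]

def coupling_inverse(y):
    """Exact inverse of coupling_forward."""
    h = len(y) // 2
    y1, y2 = y[:h], y[h:]
    s, t = scale_shift(y1)
    return y1 + [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]

def swap_halves(v):
    """Self-inverse permutation for even-length vectors."""
    h = len(v) // 2
    return v[h:] + v[:h]

D_X, D_Y = 4, 2  # assumed toy dimensions; the latent has dimension D_X - D_Y

def T(x):
    """Forward map x -> (y, z): two coupling layers with a swap in between."""
    out = coupling_forward(swap_halves(coupling_forward(x)))
    return out[:D_Y], out[D_Y:]

def T_inverse(y, z):
    """Inverse map (y, z) -> x, undoing the layers in reverse order."""
    return coupling_inverse(swap_halves(coupling_inverse(y + z)))

def sample_posterior(y, n, rng):
    """Draw n approximate posterior samples T^{-1}(y, Z) with Z ~ N(0, I)."""
    return [
        T_inverse(y, [rng.gauss(0.0, 1.0) for _ in range(D_X - D_Y)])
        for _ in range(n)
    ]

rng = random.Random(0)
x = [0.3, -1.2, 0.8, 2.0]
y, z = T(x)
x_rec = T_inverse(y, z)
round_trip_error = max(abs(a - b) for a, b in zip(x, x_rec))
samples = sample_posterior(y, 5, rng)
print("round-trip error:", round_trip_error)
```

Every sample `x_s` returned by `sample_posterior` maps back to the same `y` under `T`, which is what makes the inverse pass usable as a posterior sampler once the map has been trained.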

Theorems & Definitions (29)

  • Theorem 3.1
  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Proposition 3.1
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • Definition 3.1
  • ...and 19 more