Nonparametric Automatic Differentiation Variational Inference with Spline Approximation
Yuda Shao, Shan Yu, Tianshu Feng
TL;DR
This paper introduces S-ADVI, a spline-based nonparametric variational inference framework that replaces parametric posteriors with spline mixtures in order to capture complex posterior shapes, including skewness, multimodality, and bounded support. It derives a spline representation of the posterior, establishes a lower bound of the importance weighted autoencoder (IWAE) objective together with bounds on the KL divergence between the spline approximation and the true posterior, and discusses adaptive boundary handling and regularization via a roughness penalty. For training, the authors sample from spline mixtures using the Concrete distribution with annealing and apply the reparameterization trick for backpropagation. Empirical results on simulated cases and real data (e.g., FMNIST, MNIST, CIFAR-10) show that S-ADVI can outperform Gaussian-ADVI and GM-ADVI in posterior recovery and, in many settings, achieves competitive or superior reconstruction and classification performance, while its spline coefficients remain interpretable and offer insight into the shapes of the latent variables. Overall, S-ADVI combines flexibility, interpretability, and theoretical guarantees for Bayesian modeling of complex posteriors and for generative tasks with incomplete data.
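The sampling step mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes uniform-knot B-spline components (whose normalized density equals that of a shifted, scaled sum of `degree + 1` independent uniforms), a relaxed sample formed as the Concrete-weighted combination of the component draws, and an exponential annealing schedule. All function and parameter names (`sample_concrete`, `sample_bspline_components`, `sample_relaxed_mixture`, `annealed_temperature`, `left_knots`, `spacing`) are hypothetical.

```python
import numpy as np


def sample_concrete(logits, temperature, rng):
    """Draw one relaxed one-hot vector from the Concrete (Gumbel-softmax) distribution."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))              # standard Gumbel noise
    scores = (logits + gumbel) / temperature
    scores = scores - scores.max()            # stabilise the softmax
    weights = np.exp(scores)
    return weights / weights.sum()


def sample_bspline_components(left_knots, spacing, degree, rng):
    """Reparameterized draw from each uniform-knot B-spline density.

    Assumption: component k is supported on
    [left_knots[k], left_knots[k] + (degree + 1) * spacing].
    """
    u = rng.uniform(size=(degree + 1, left_knots.size))
    return left_knots + spacing * u.sum(axis=0)


def sample_relaxed_mixture(logits, left_knots, spacing, degree, temperature, rng):
    """Relaxed mixture sample: Concrete weights times component draws (illustrative)."""
    y = sample_concrete(logits, temperature, rng)
    z = sample_bspline_components(left_knots, spacing, degree, rng)
    return float(y @ z)


def annealed_temperature(step, start=1.0, end=0.1, rate=1e-3):
    """Exponential decay of the temperature, floored at `end` (hypothetical schedule)."""
    return max(end, start * np.exp(-rate * step))


# Usage: five cubic components on a uniform knot grid starting at -2.
rng = np.random.default_rng(0)
left_knots = np.linspace(-2.0, 0.0, 5)
logits = np.zeros(5)                          # equal unnormalized log-weights
tau = annealed_temperature(step=500)
sample = sample_relaxed_mixture(logits, left_knots, 0.5, 3, tau, rng)
```

Because every step is a deterministic function of the uniform noise and the parameters (`logits`, knots), gradients can flow through the sample once the same logic is written in an automatic differentiation framework. As the temperature decreases, the Concrete weights approach a one-hot vector and the relaxed sample approaches an exact draw from a single component.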
Abstract
Automatic Differentiation Variational Inference (ADVI) is an efficient approach to learning probabilistic models. Classic ADVI relies on a parametric approach to approximate the posterior. In this paper, we develop a spline-based nonparametric approximation approach that enables flexible posterior approximation for distributions with complicated structures, such as skewness, multimodality, and bounded support. Compared with widely used nonparametric variational inference methods, the proposed method is easy to implement and adapts to various data structures. By adopting the spline approximation, we derive a lower bound of the importance weighted autoencoder and establish its asymptotic consistency. Experiments demonstrate the efficiency of the proposed method in approximating complex posterior distributions and in improving the performance of generative models with incomplete data.
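For reference, the standard importance weighted autoencoder objective that the lower bound refers to can be written as follows. The notation is an assumption, since this section does not fix symbols: $x$ denotes the observed data, $z$ the latent variable, $q(z \mid x)$ the variational posterior (a spline-based density in the proposed method), and $K$ the number of importance samples. The paper's own bound is not reproduced here.

```latex
\mathcal{L}_K(x)
  = \mathbb{E}_{z_1,\dots,z_K \,\sim\, q(z \mid x)}
    \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k \mid x)} \right]
  \;\le\; \log p(x),
\qquad
\mathcal{L}_1(x) \le \mathcal{L}_K(x) \le \mathcal{L}_{K+1}(x).
```

With $K = 1$ the objective reduces to the usual evidence lower bound, and it tightens monotonically toward $\log p(x)$ as $K$ grows.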
