Stabilizing the Kumaraswamy Distribution

Max Wasserman; Gonzalo Mateos

TL;DR

This work identifies and resolves numerical instabilities in the inverse CDF and log-pdf of the Kumaraswamy distribution, and introduces simple, scalable latent variable models based on it. These models improve exploration-exploitation trade-offs in contextual multi-armed bandits and enhance uncertainty quantification for link prediction with graph neural networks.

Abstract

Large-scale latent variable models require expressive continuous distributions that support efficient sampling and low-variance differentiation, achievable through the reparameterization trick. The Kumaraswamy (KS) distribution is both expressive and supports the reparameterization trick with a simple closed-form inverse CDF. Yet, its adoption remains limited. We identify and resolve numerical instabilities in the inverse CDF and log-pdf, exposing issues in libraries like PyTorch and TensorFlow. We then introduce simple and scalable latent variable models based on the KS, improving exploration-exploitation trade-offs in contextual multi-armed bandits and enhancing uncertainty quantification for link prediction with graph neural networks. Our results support the stabilized KS distribution as a core component in scalable variational models for bounded latent variables.
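The instability described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes the standard KS parameterization with pdf $a b x^{a-1}(1 - x^a)^{b-1}$ and inverse CDF $(1 - (1-u)^{1/b})^{1/a}$, together with the common two-branch evaluation of $\log(1 - \exp(t))$. It runs in double precision with plain Python floats, so a very large $b$ is used to trigger the cancellation that appears at much smaller $b$ in single precision.

```python
import math


def log1mexp(t):
    """Evaluate log(1 - exp(t)) for t < 0 with a two-branch scheme."""
    if t >= 0.0:
        raise ValueError("log1mexp requires t < 0")
    if t > -math.log(2.0):
        # exp(t) is close to 1: expm1 avoids cancellation in 1 - exp(t).
        return math.log(-math.expm1(t))
    # exp(t) is small: log1p avoids rounding 1 - exp(t) to 1.
    return math.log1p(-math.exp(t))


def ks_icdf_naive(u, a, b):
    """Direct evaluation of the closed-form inverse CDF (can cancel to 0)."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def ks_log_icdf_stable(u, a, b):
    """log of the inverse CDF, computed in log space (assumed formulation)."""
    # log x = (1/a) * log(1 - exp(log(1 - u) / b))
    return log1mexp(math.log1p(-u) / b) / a


def ks_log_pdf_stable(log_x, a, b):
    """log-pdf as a function of log x, with log(1 - x^a) = log1mexp(a * log x)."""
    return (math.log(a) + math.log(b)
            + (a - 1.0) * log_x
            + (b - 1.0) * log1mexp(a * log_x))


if __name__ == "__main__":
    a, b, u = 1.0, 2.0 ** 60, 0.5  # illustrative values, not from the paper
    print("naive sample: ", ks_icdf_naive(u, a, b))
    print("stable sample:", math.exp(ks_log_icdf_stable(u, a, b)))
```

With these illustrative values the naive formula rounds $(1-u)^{1/b}$ to exactly 1 and returns a sample of 0, while the log-space version returns a small positive sample. Keeping the sample as $\log x$ also lets the log-pdf reuse the same helper for $\log(1 - x^a)$.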

Paper Structure (18 sections, 12 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Background
Stabilizing the Kumaraswamy
Experiments
Image Variational Auto-Encoders
Contextual Bernoulli multi-armed bandits
Variational link prediction with Graph Neural Networks
Related Work
Conclusion, Limitations, and Future Work
Appendix
Bounded Interval-Supported Distributions
Precision Enhancing Functions
Counter Intuitive Stability Properties of the Unstable KS
VBE Modified ELBO Derivation
VAE architectural and training choices
...and 3 more sections

Figures (6)

Figure 1: Comparison of relevant bounded interval-supported distributions. Left: Time for sampling and differentiating through samples. The Beta lacks explicit reparameterization, and has slower sampling and gradients. Right: Expressiveness in terms of attainable prototypical shapes.
Figure 2: log1mexp(x) maintains accuracy throughout single precision.
Figure 3: Stabilizing $\log(1 - \exp(x))$ terms eliminates numerical instabilities in the KS log-pdf and inverse CDF. We compare the unstable PyTorch KS implementation (top row) and our stable KS (bottom row) for realistic KS distributions ($\log_2 b = 24$, varying $a$). Catastrophic cancellation in the $\log(1 - \exp(x))$ terms in the PyTorch KS causes jagged curves and inverse CDF underflow beyond $u \approx 1 - 39.3\%$, resulting in a point mass of $\approx 39.3\%$ at $x=0$ in the sampling distribution. Our stable KS removes the instability by using log1mexp.
Figure 4: Variational Bandit Encoder
Figure 5: VEE-KS produces informative and calibrated edge posterior predictives.
...and 1 more figures
