Distributional Principal Autoencoders

Xinwei Shen; Nicolai Meinshausen

TL;DR

The paper addresses the loss of distributional information in traditional dimensionality reduction by introducing the Distributional Principal Autoencoder (DPA), which enforces distributional reconstruction through a stochastic decoder conditioned on a low-dimensional latent variable. By optimizing a joint energy-score-based objective for the encoder and decoder, DPA yields reconstructions that are identically distributed to the data, $d(e(X),\varepsilon)\overset{d}{=}X$, for any latent dimension $k$ (at the population optimum), and supports an adaptive latent dimension via an ordered latent representation. Empirical results on climate fields, single-cell RNA data, and image benchmarks show that DPA preserves the original data distribution, learns meaningful latent embeddings, and outperforms PCA and standard autoencoders (AE) on distributional metrics while remaining competitive in mean reconstruction. The approach enables distributional transport in latent space, with practical applications to bias correction, distribution-shift detection, and high-dimensional prediction; proofs and detailed methodology are given in the appendices.
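The energy score behind the DPA objective can be estimated directly from decoder samples. Below is a minimal, framework-free sketch, assuming the standard sample form $\frac{1}{m}\sum_i\|x-y_i\|^\beta-\frac{1}{2m(m-1)}\sum_{i\neq j}\|y_i-y_j\|^\beta$, where the samples $y_i$ play the role of $d(e(x),\varepsilon_i)$ with fresh noise $\varepsilon_i$. The function name `energy_score` is hypothetical; this is not the authors' implementation.

```python
import math


def energy_score(x, samples, beta=1.0):
    """Sample-based energy score of a predictive distribution at observation x.

    ES = (1/m) sum_i ||x - y_i||^beta - 1/(2 m (m-1)) sum_{i != j} ||y_i - y_j||^beta
    where y_1..y_m are draws from the predictive distribution. Lower is better.
    """
    if not 0.0 < beta <= 2.0:
        raise ValueError("beta must lie in (0, 2]")
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    # Accuracy term: average distance from the samples to the observation.
    fit = sum(math.dist(x, y) ** beta for y in samples) / m
    # Spread term: average distance between distinct samples (unbiased).
    spread = sum(
        math.dist(samples[i], samples[j]) ** beta
        for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    return fit - 0.5 * spread
```

Averaging this quantity over training points and minimising it jointly in the encoder and decoder parameters gives an objective of the kind described above. The subtracted spread term rewards sample diversity, which is what keeps a stochastic decoder from collapsing onto a single mean reconstruction.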

Abstract

Dimension reduction techniques usually lose information in the sense that reconstructed data are not identical to the original data. However, we argue that it is possible to have reconstructed data that are identically distributed to the original data, irrespective of the retained dimension or the specific mapping. This can be achieved by learning a distributional model that matches the conditional distribution of the data given its low-dimensional latent variables. Motivated by this, we propose the Distributional Principal Autoencoder (DPA), which consists of an encoder that maps high-dimensional data to low-dimensional latent variables and a decoder that maps the latent variables back to the data space. For reducing the dimension, the DPA encoder aims to minimise the unexplained variability of the data, with an adaptive choice of the latent dimension. For reconstructing data, the DPA decoder aims to match the conditional distribution of all data that are mapped to a certain latent value, thus ensuring that the reconstructed data retain the original data distribution. Our numerical results on climate data, single-cell data, and image benchmarks demonstrate the practical feasibility and success of the approach in reconstructing the original distribution of the data. DPA embeddings are shown to preserve meaningful structures of the data, such as the seasonal cycle for precipitation and cell types for gene expression.
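The decoder target described above, the conditional distribution of all data mapped to the same latent value, can be checked exactly on a toy discrete distribution. The sketch below is an illustration under stated assumptions (the data, weights, encoders, and the helper `oracle_reconstruction` are all hypothetical): it encodes $X$, draws $Y$ from the data points that share the same code, and computes the exact law of $Y$ with rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction


def oracle_reconstruction(p_x, encoder):
    """Exact law of Y when X ~ p_x is encoded and Y is drawn from the data given e(Y) = e(X).

    p_x     : dict mapping each data point to its probability (Fraction).
    encoder : any function from a data point to a hashable latent value.
    """
    # Mass of each latent value z = e(x).
    p_z = defaultdict(Fraction)
    for x, p in p_x.items():
        p_z[encoder(x)] += p
    # For each x, spread its mass over all y sharing the same code,
    # proportionally to P(X = y): this is P(X = y | e(X) = e(x)).
    p_y = defaultdict(Fraction)
    for x, p in p_x.items():
        if p == 0:
            continue  # zero-mass points are never encoded
        z = encoder(x)
        for y, q in p_x.items():
            if encoder(y) == z:
                p_y[y] += p * q / p_z[z]
    return dict(p_y)


# Hypothetical toy data: four points in the plane with unequal weights.
P_X = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4),
       (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 8)}
```

Calling `oracle_reconstruction(P_X, lambda x: x[0])` returns `P_X` exactly, and so does a constant encoder (latent dimension zero) or the identity. This is the sense in which distributional reconstruction holds irrespective of the retained dimension; what the encoder controls is how much of each individual point survives.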

Paper Structure (16 sections, 8 theorems, 37 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Related work
Software
Distributional Principal Autoencoders
DPA encoder
DPA decoder
Joint formulation for encoder and decoder
DPA for an adaptive latent dimension
Empirical results
Reconstructions
Embeddings
Discussion
Proofs
Experimental details
Data sets and preprocessing

Key Result

Proposition 1

For any $\beta\in(0,2]$, we have
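The range $\beta\in(0,2]$ matches the range in which the energy score $\mathrm{ES}_\beta(P,x)=\mathbb{E}\|Y-x\|^\beta-\tfrac12\mathbb{E}\|Y-Y'\|^\beta$, with $Y,Y'$ independent draws from $P$, is a proper scoring rule; it is strictly proper for $\beta\in(0,2)$. The following standard calculation (background, not a restatement of Proposition 1) shows why the endpoint $\beta=2$ is special, writing $\mu=\mathbb{E}Y$ and $\Sigma=\operatorname{Cov}(Y)$:

$$
\mathbb{E}\|Y-x\|^2=\|\mu-x\|^2+\operatorname{tr}\Sigma,\qquad
\mathbb{E}\|Y-Y'\|^2=2\operatorname{tr}\Sigma,
$$
$$
\mathrm{ES}_2(P,x)=\|\mu-x\|^2+\operatorname{tr}\Sigma-\tfrac12\cdot 2\operatorname{tr}\Sigma=\|\mu-x\|^2 .
$$

At $\beta=2$ the score therefore depends on $P$ only through its mean, i.e. it reduces to the squared error of the mean reconstruction, the criterion behind PCA and standard autoencoders; values $\beta<2$ also reward matching the full distribution.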

Figures (9)

Figure 1: Global monthly precipitation fields (square-root transformed; original unit $\mathrm{kg}\cdot\mathrm{m}^{-2}\,\mathrm{s}^{-1}$). Top row: a test sample; second row: PCA reconstructions; third row: AE reconstructions; fourth row: mean reconstructions from DPA; remaining three rows: reconstructed samples from DPA. Columns: different latent dimensions $k$.
Figure 2: Q-Q plots of precipitation at a randomly chosen location for test data versus fitted distributions.
Figure 3: mnist
Figure 4: disk
Figure 5: r-temp
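Figure 2 compares quantiles of test data at one location with quantiles of the fitted distributions. One way such Q-Q points could be computed from two samples (for example, held-out observations and decoder samples) is sketched below; the linear-interpolation quantile rule, the probability levels $i/(n+1)$, and the function names are assumptions, not necessarily what was used for the figure.

```python
def empirical_quantile(sorted_vals, p):
    """Quantile at level p in [0, 1] by linear interpolation between order statistics."""
    n = len(sorted_vals)
    h = (n - 1) * p
    lo = int(h)                  # floor, since h >= 0
    hi = min(lo + 1, n - 1)      # stay inside the sample at p = 1
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def qq_points(observed, simulated, n_levels=99):
    """(observed quantile, simulated quantile) pairs at levels i / (n_levels + 1)."""
    obs, sim = sorted(observed), sorted(simulated)
    levels = [i / (n_levels + 1) for i in range(1, n_levels + 1)]
    return [(empirical_quantile(obs, p), empirical_quantile(sim, p)) for p in levels]
```

Points close to the diagonal indicate matching marginal distributions; systematic departures in the tails indicate under- or over-dispersion of the fitted model.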

Theorems & Definitions (17)

Definition 1: Oracle reconstructed distribution
Proposition 1
Example 1: Gaussian data and linear encoders
Proposition 2
Proposition 3
Proposition 4
Theorem 1
Proposition 5
Proposition 6
Proof of Proposition (prop:two_terms_es_equal)
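Definition 1 and Example 1 concern the oracle reconstructed distribution for Gaussian data and linear encoders, and one listed proof refers to a proposition labelled `prop:two_terms_es_equal`. As a hedged illustration (the covariance `SIGMA`, the encoder direction `V`, and the use of Matheron's rule for conditional sampling are choices made for this sketch, not necessarily the paper's Example 1), the code below encodes 2-D Gaussian data with a linear map, reconstructs by sampling $X$ given $e(X)$, and estimates both energy-score terms. Assuming the proposition states that these two terms coincide for the oracle reconstruction, they should agree up to Monte Carlo error, because given $e(X)$ the data point and its reconstruction are independent draws from the same conditional law.

```python
import math
import random

# Hypothetical setup: a 2-D Gaussian N(0, SIGMA) and a linear encoder
# e(x) = V . x, i.e. latent dimension k = 1.
SIGMA = ((2.0, 0.8), (0.8, 1.0))
V = (1.0, -0.5)


def cholesky_2x2(sigma):
    """Lower-triangular factor L with L L^T = sigma for a 2x2 covariance."""
    (s11, s12), (_, s22) = sigma
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    return l11, l21, l22


def draw_gaussian(rng, chol):
    """One draw from N(0, sigma) given its Cholesky factor."""
    l11, l21, l22 = chol
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (l11 * g1, l21 * g1 + l22 * g2)


def oracle_decode(rng, z, sigma, chol, v=V):
    """Draw Y ~ N(0, sigma) conditioned on v . Y = z (Matheron's update rule)."""
    (s11, s12), (_, s22) = sigma
    sv = (s11 * v[0] + s12 * v[1], s12 * v[0] + s22 * v[1])  # Sigma v
    c = v[0] * sv[0] + v[1] * sv[1]                           # v^T Sigma v
    prior = draw_gaussian(rng, chol)                          # unconditional draw
    shift = (z - (v[0] * prior[0] + v[1] * prior[1])) / c
    return (prior[0] + sv[0] * shift, prior[1] + sv[1] * shift)


def energy_terms(n=50_000, beta=1.0, seed=0):
    """Monte Carlo estimates of E||X - Y||^beta and E||Y - Y'||^beta, plus the Y draws."""
    rng = random.Random(seed)
    chol = cholesky_2x2(SIGMA)
    cross = within = 0.0
    recon = []
    for _ in range(n):
        x = draw_gaussian(rng, chol)
        z = V[0] * x[0] + V[1] * x[1]             # encode
        y1 = oracle_decode(rng, z, SIGMA, chol)   # two independent reconstructions
        y2 = oracle_decode(rng, z, SIGMA, chol)
        cross += math.dist(x, y1) ** beta
        within += math.dist(y1, y2) ** beta
        recon.append(y1)
    return cross / n, within / n, recon
```

Running `energy_terms()` should return two nearly equal averages, and the sample covariance of the reconstructions should be close to `SIGMA`, even though the encoder keeps only a single linear combination of the two coordinates.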