Piecewise Normalizing Flows

Harry Bevins, Will Handley, Thomas Gessey-Jones

TL;DR

The paper tackles the difficulty of modeling multi-modal distributions with normalizing flows by introducing piecewise normalizing flows (PNFs), which cluster the target distribution and train separate Gaussian-base NFs on each cluster. This segmentation aligns each piece's topology with the base distribution, enabling parallel training and often improving emulation accuracy versus single-flow and resampled-base baselines. Benchmark results on toy multimodal distributions show PNFs achieve lower KL divergences than competing methods, though gains on some real-world datasets are data-dependent. The approach offers a practical, scalable path to better capture complex densities while highlighting considerations around clustering choices and limitations of the method.

Abstract

Normalizing flows are an established approach for modelling complex probability densities through invertible transformations from a base distribution. However, the accuracy with which the target distribution can be captured by the normalizing flow is strongly influenced by the topology of the base distribution. A mismatch between the topology of the target and the base can result in poor performance, as is typically the case for multi-modal problems. A number of different works have attempted to modify the topology of the base distribution to better match the target, either through the use of Gaussian Mixture Models (Izmailov et al., 2020; Ardizzone et al., 2020; Hagemann & Neumayer, 2021) or learned accept/reject sampling (Stimper et al., 2022). We introduce piecewise normalizing flows, which divide the target distribution into clusters with topologies that better match the standard normal base distribution, and train a series of flows to model complex multi-modal targets. We demonstrate the performance of the piecewise flows using some standard benchmarks and compare the accuracy of the flows to the approach taken in Stimper et al. (2022) for modelling multi-modal distributions. We find that our approach consistently outperforms the approach in Stimper et al. (2022), with a higher emulation accuracy on the standard benchmarks.
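The cluster-then-model idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: each "flow" is replaced by an axis-aligned Gaussian fit as a stand-in, the cluster labels are assumed to be given by an earlier clustering step, and each cluster's weight is assumed to be its share of equally weighted samples. The function names `fit_gaussian`, `fit_piecewise` and `sample_piecewise` are hypothetical.

```python
import math
import random


def fit_gaussian(points):
    """Stand-in for training one flow: fit an axis-aligned Gaussian to a cluster."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    std = [
        max(math.sqrt(sum((p[d] - mean[d]) ** 2 for p in points) / n), 1e-6)
        for d in range(dim)
    ]
    return mean, std


def fit_piecewise(points, labels):
    """Fit one stand-in model per cluster; weight each by its share of the samples."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return [(len(c) / len(points), fit_gaussian(c)) for c in clusters.values()]


def sample_piecewise(model, n, rng):
    """Pick a cluster with probability equal to its weight, then sample its model."""
    weights = [w for w, _ in model]
    samples = []
    for _ in range(n):
        _, (mean, std) = rng.choices(model, weights=weights, k=1)[0]
        samples.append([rng.gauss(m, s) for m, s in zip(mean, std)])
    return samples


# Hypothetical bimodal data: 300 points near x = -5 and 100 points near x = +5.
rng = random.Random(0)
data = [[rng.gauss(-5, 0.3), rng.gauss(0, 0.3)] for _ in range(300)]
data += [[rng.gauss(5, 0.3), rng.gauss(0, 0.3)] for _ in range(100)]
labels = [0] * 300 + [1] * 100  # assumed to come from a clustering step

model = fit_piecewise(data, labels)
draws = sample_piecewise(model, 2000, rng)
```

Because every draw comes from exactly one per-cluster model, no samples land in the empty region between the two modes, which is the behaviour the piecewise construction is meant to provide. In the paper's setting each Gaussian stand-in would be a trained flow with a standard normal base.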

Paper Structure (11 sections, 19 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: A simple demonstration of the piecewise normalizing flow (NF) described in this paper and a single NF trained on the same multimodal target distribution. We use early stopping to train the different normalizing flows. For the single NF we see bridging between the different clusters that is not present in the piecewise approach. The architectures of the MAF and the piecewise MAF components are chosen such that the two have approximately the same number of hyperparameters.
  • Figure 2: The figure shows how the silhouette score, $s$, can be used to determine the number of clusters needed to describe the data. In the example, $s$ peaks at eight clusters, one for each Gaussian in the circle.
  • Figure 3: The figure shows the training and sampling process for our piecewise NFs. From left to right, we start with a set of samples $x$ drawn from $p_X(x)$ and classify the samples into clusters (e.g. using $k$-means), determining the number of clusters to use based on the silhouette score. We then train a Masked Autoregressive Flow on each cluster $k$ using a standard normal distribution as our base. When drawing samples from $p_\theta(x)$ we select which MAF to draw from using weights defined according to the total cluster weight.
  • Figure 4: The figure shows a series of multi-modal target distributions as 2D histograms of samples in the first column, followed by representations of each distribution produced with three different types of normalizing flow. We use early stopping, as described in the main text, to train all the normalizing flows. The second column shows a simple Masked Autoregressive Flow (MAF) with a Gaussian base distribution, based on the work of Papamakarios et al. (2017), and the latter two columns show two approaches to improving the accuracy of the model. The first is from Stimper et al. (2022) and uses a RealNVP flow to model the distributions while resampling the base distribution to better match the topology of the target distribution. In the last column, we show the results from our PNFs, where we have effectively modified the topology of the target distribution to better match the Gaussian base distribution by performing clustering.
  • Figure 5: The figure shows samples drawn from PNFs built using different clustering algorithms, along with samples from the true distribution and samples from a single masked autoregressive flow. The figure illustrates that a PNF can be built effectively using different clustering algorithms and, together with the table comparing clustering algorithms, that some clustering algorithms perform better than others. We choose network architectures for our PNFs and MAF such that they all have approximately the same number of hyperparameters.
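
The silhouette-based choice of cluster count described in the captions for Figures 2 and 3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the $k$-means routine uses a simple farthest-point initialisation chosen here for determinism, the ring of eight Gaussians (radius 5, standard deviation 0.15, 30 points per mode) is a hypothetical stand-in for the example in Figure 2, and the helper names `kmeans`, `silhouette_score`, `choose_k` and `ring_of_gaussians` are invented for this example.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-point initialisation (an assumption here)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points; needs at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, candidates):
    """Return the candidate k with the highest silhouette score, plus all scores."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


def ring_of_gaussians(n_modes=8, per_mode=30, radius=5.0, std=0.15, seed=0):
    """Hypothetical toy data: tight Gaussian blobs evenly spaced on a circle."""
    rng = random.Random(seed)
    pts = []
    for m in range(n_modes):
        angle = 2 * math.pi * m / n_modes
        cx, cy = radius * math.cos(angle), radius * math.sin(angle)
        pts += [(rng.gauss(cx, std), rng.gauss(cy, std)) for _ in range(per_mode)]
    return pts


points = ring_of_gaussians()
best_k, scores = choose_k(points, range(2, 11))
```

Too few clusters merge neighbouring modes and too many split a single mode, and both lower the mean score, so the maximum falls at the number of modes when they are well separated. The labels from the selected clustering would then define the pieces on which the individual flows are trained.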