Rare event modeling with self-regularized normalizing flows: what can we learn from a single failure?

Charles Dawson, Van Tran, Max Z. Li, Chuchu Fan

TL;DR

CalNF, a calibrated normalizing flow framework, addresses rare-event modeling under data scarcity by amortizing posterior learning across nominal and target datasets. By learning a low-dimensional embedding of a family of posteriors $q_\phi(z; c)$ and calibrating with a continuous label $c^*$, CalNF achieves state-of-the-art performance on data-constrained benchmarks and enables post-mortem analysis of the 2022 Southwest scheduling crisis. Theoretical results bound the propagation of error via an $L$-Lipschitz flow in the calibration input, providing implicit regularization in the Wasserstein space. Empirically, CalNF outperforms prior regularization and ensemble baselines on air-traffic, UAV control, seismic imaging, and image few-shot tasks, and offers utility for anomaly detection and generative modeling to stress-test safety-critical systems. This approach opens avenues for data-efficient failure analysis, risk-informed design, and robust network management in cyber-physical contexts.

Abstract

Increased deployment of autonomous systems in fields like transportation and robotics has seen a corresponding increase in safety-critical failures. These failures can be difficult to model and debug due to the relative lack of data: compared to tens of thousands of examples from normal operations, we may have only seconds of data leading up to the failure. This scarcity makes it challenging to train generative models of rare failure events, as existing methods risk either overfitting to noise in the limited failure dataset or underfitting due to an overly strong prior. We address this challenge with CalNF, or calibrated normalizing flows, a self-regularized framework for posterior learning from limited data. CalNF achieves state-of-the-art performance on data-limited failure modeling and inverse problems and enables a first-of-a-kind case study into the root causes of the 2022 Southwest Airlines scheduling crisis.

Paper Structure

This paper contains 35 sections, 3 theorems, 17 equations, 14 figures, 10 tables, 1 algorithm.

Key Result

Lemma 1

Let $\mathcal{D} = \{z^{(i)}\}_{i=1}^N$ be a sparse dataset with distance $O\left( (LN)^{-1/(d+1)} \right)$ between points (the precise limit is given in the appendix), and let $q_\phi(z)$ be a model capable of representing any $L$-Lipschitz probability density. If $\phi(\mathcal{D})$ are the paramete

Figures (14)

  • Figure 1: Inference in data-constrained environments. (a) The ground truth distribution. (b) An imbalanced dataset. (c) When the regularization strength $\beta$ is too small, deep models overfit to noise in the target dataset. (d) When $\beta$ is too large, the learned distribution underfits and struggles to distinguish between nominal and target distributions. (e) Our method learns a more accurate reconstruction of the target distribution using hyperparameter-insensitive self-regularization.
  • Figure 2: (Left) CalNF architecture: A normalizing flow is trained on random subsamples of the target data and the full nominal dataset, using one-hot labels to identify different subsamples ($\bullet$) and the zero vector to identify the nominal data ($\circ$). The model is calibrated by freezing the model parameters and optimizing the label on the entire target dataset ($\bigstar$). (Right) Target candidates: The nominal posterior $q_\phi(z; \mathbf{0})$ (blue) and the family of candidate distributions for the target posterior $q_\phi(z; \lambda \mathbf{1}_i)$, shown for varying values of the calibration label.
  • Figure 3: Seismic waveform inversion. (a) The ground truth nominal and target density profiles. (b-d) The posteriors fit using KL and $W_2$ regularization and CalNF (ours is the only method able to correctly infer the target density profile).
  • Figure 3: Log-likelihood (bits/dim) on held-out images, reporting mean and standard deviation across four seeds. Higher is better.
  • Figure 4: Single-shot UAV failure dynamics predicted by CalNF.
  • ...and 9 more figures
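
The two-stage procedure described in the Figure 2 caption (train on labeled subsamples, then freeze the parameters and optimize the label) can be illustrated with a deliberately tiny stand-in. The sketch below is not the authors' implementation: it assumes a one-dimensional toy "flow" $z = \epsilon + b + w \cdot c$ with $\epsilon \sim \mathcal{N}(0, 1)$, synthetic Gaussian data, and plain gradient descent. The names `shift`, `dnll_dshift`, `b`, `w`, and `c_star` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not from the paper).
rng = np.random.default_rng(0)
nominal = rng.normal(0.0, 1.0, size=2000)  # plentiful nominal data
target = rng.normal(3.0, 0.5, size=8)      # scarce target (failure) data

K = 4  # number of random target subsamples
subsamples = [rng.choice(target, size=4, replace=False) for _ in range(K)]
labels = np.eye(K)       # one-hot label identifies each subsample
zero_label = np.zeros(K)  # zero vector identifies the nominal data


def shift(b, w, c):
    """Mean of the toy conditional flow z = eps + b + w.c, eps ~ N(0, 1)."""
    return b + w @ c


def dnll_dshift(z, b, w, c):
    """Derivative of the mean negative log-likelihood w.r.t. the shift."""
    return -(np.mean(z) - shift(b, w, c))


# Stage 1: train the parameters (b, w) on the nominal data and on every
# labeled target subsample at once.
b, w = 0.0, np.zeros(K)
for _ in range(2000):
    g_nom = dnll_dshift(nominal, b, w, zero_label)
    g_sub = np.array(
        [dnll_dshift(s, b, w, labels[i]) for i, s in enumerate(subsamples)]
    )
    b -= 0.05 * (g_nom + g_sub.sum())  # b is shared by all datasets
    w -= 0.05 * g_sub                  # w[i] only sees subsample i

# Stage 2: calibrate. Freeze (b, w) and optimize only the label c on the
# entire target dataset; by the chain rule, d(shift)/dc = w.
c_star = np.full(K, 1.0 / K)
for _ in range(500):
    c_star -= 0.01 * dnll_dshift(target, b, w, c_star) * w

print("calibrated label:", c_star)
print("calibrated mean:", shift(b, w, c_star), "target mean:", target.mean())
```

In this toy, calibration reduces to matching the target sample mean, so it shows only how the labels are used. The regularization behavior reported for CalNF depends on an expressive flow and is not reproduced here.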

Theorems & Definitions (11)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4: from Verine et al. (2023), on the expressivity of bi-Lipschitz normalizing flows
  • Remark 5: from Verine et al. (2023), on the expressivity of bi-Lipschitz normalizing flows
  • proof
  • proof
  • ...and 1 more