Table of Contents
Fetching ...

SlotFlow: Amortized Trans-Dimensional Inference with Slot-Based Normalizing Flows

Niklas Houba, Giovanni Giarda, Lorenzo Speri

TL;DR

SlotFlow tackles trans-dimensional Bayesian inference where the number of components $K$ is unknown. It uses a dual-stream encoder to fuse time and frequency information, a dynamic slot allocator that instantiates exactly $\hat{K}$ slots, and shared conditional normalizing flows with Hungarian matching to produce per-slot posteriors, achieving $O(K)$ computation and millisecond latency. The approach yields 99.85% cardinality accuracy and well-calibrated parameter posteriors on crowded sinusoidal mixtures, while matching amplitude and phase closely to RJMCMC and exposing two–threefold broader frequency posteriors due to encoder bottlenecks. Compared with RJMCMC, SlotFlow offers ~$10^6\times$ speedups, enabling real-time or interactive analysis in gravitational-wave astronomy, neural spike sorting, and object-centric vision. Limitations include the factorized posterior approximation and encoder-induced frequency precision loss; future work targets multi-scale encoders, time–frequency representations, and explicit inter-slot dependencies to further improve frequency resolution without sacrificing efficiency.

Abstract

Inferring the number of distinct components contributing to an observation, while simultaneously estimating their parameters, remains a long-standing challenge across signal processing, astrophysics, and neuroscience. Classical trans-dimensional Bayesian methods such as Reversible Jump Markov Chain Monte Carlo (RJMCMC) provide asymptotically exact inference but can be computationally expensive. Instead, modern deep learning provides a faster alternative to inference but typically assume fixed component counts, sidestepping the core challenge of trans-dimensionality. To address this, we introduce SlotFlow, a deep learning architecture for trans-dimensional amortized inference. The architecture processes time-series observations, which we represent jointly in the frequency and time domains through parallel encoders. A classifier produces a distribution over component counts K, and its MAP estimate specifies the number of slots instantiated. Each slot is parameterized by a shared conditional normalizing flow trained via permutation-invariant Hungarian matching. On sinusoidal decomposition with up to 10 overlapping components and Gaussian noise, SlotFlow achieves 99.85% cardinality accuracy and well-calibrated parameter posteriors, with systematic biases well below one posterior standard deviation. Direct comparison with RJMCMC shows close agreement in amplitude and phase, with Wasserstein distances $W_2 < 0.01$ and $< 0.03$, indicating that shared global context captures inter-component structure despite a factorized posterior. Frequency posteriors remain centered but exhibit 2-3x broader intervals, consistent with an encoder bottleneck in retaining long-baseline phase coherence. The method delivers a $\sim 10^6\times$ speedup over RJMCMC, suggesting applicability to time-critical workflows in gravitational-wave astronomy, neural spike sorting, and object-centric vision.

SlotFlow: Amortized Trans-Dimensional Inference with Slot-Based Normalizing Flows

TL;DR

SlotFlow tackles trans-dimensional Bayesian inference where the number of components is unknown. It uses a dual-stream encoder to fuse time and frequency information, a dynamic slot allocator that instantiates exactly slots, and shared conditional normalizing flows with Hungarian matching to produce per-slot posteriors, achieving computation and millisecond latency. The approach yields 99.85% cardinality accuracy and well-calibrated parameter posteriors on crowded sinusoidal mixtures, while matching amplitude and phase closely to RJMCMC and exposing two–threefold broader frequency posteriors due to encoder bottlenecks. Compared with RJMCMC, SlotFlow offers ~ speedups, enabling real-time or interactive analysis in gravitational-wave astronomy, neural spike sorting, and object-centric vision. Limitations include the factorized posterior approximation and encoder-induced frequency precision loss; future work targets multi-scale encoders, time–frequency representations, and explicit inter-slot dependencies to further improve frequency resolution without sacrificing efficiency.

Abstract

Inferring the number of distinct components contributing to an observation, while simultaneously estimating their parameters, remains a long-standing challenge across signal processing, astrophysics, and neuroscience. Classical trans-dimensional Bayesian methods such as Reversible Jump Markov Chain Monte Carlo (RJMCMC) provide asymptotically exact inference but can be computationally expensive. Instead, modern deep learning provides a faster alternative to inference but typically assume fixed component counts, sidestepping the core challenge of trans-dimensionality. To address this, we introduce SlotFlow, a deep learning architecture for trans-dimensional amortized inference. The architecture processes time-series observations, which we represent jointly in the frequency and time domains through parallel encoders. A classifier produces a distribution over component counts K, and its MAP estimate specifies the number of slots instantiated. Each slot is parameterized by a shared conditional normalizing flow trained via permutation-invariant Hungarian matching. On sinusoidal decomposition with up to 10 overlapping components and Gaussian noise, SlotFlow achieves 99.85% cardinality accuracy and well-calibrated parameter posteriors, with systematic biases well below one posterior standard deviation. Direct comparison with RJMCMC shows close agreement in amplitude and phase, with Wasserstein distances and , indicating that shared global context captures inter-component structure despite a factorized posterior. Frequency posteriors remain centered but exhibit 2-3x broader intervals, consistent with an encoder bottleneck in retaining long-baseline phase coherence. The method delivers a speedup over RJMCMC, suggesting applicability to time-critical workflows in gravitational-wave astronomy, neural spike sorting, and object-centric vision.

Paper Structure

This paper contains 108 sections, 10 theorems, 77 equations, 20 figures, 4 tables.

Key Result

Theorem 4.1

For any $K \in \{1,\dots,K_{\max}\}$ and permutation $\sigma \in S_K$, the symmetrized posterior satisfies $q_\phi(\{\theta_{\sigma(k)}\}_{k=1}^K \mid x, K) = q_\phi(\{\theta_k\}_{k=1}^K \mid x, K)$.

Figures (20)

  • Figure 1: SlotFlow Architecture. The model processes input signals through four stages: (I) Dual-stream encoding (FFT/time) via convolutional encoders and multi-head attention; (II) Cardinality estimation via pooled global features; (III) Slot context generation fusing global and slot-specific embeddings; and (IV) Conditional flow inference producing per-component posteriors. The red control arrow denotes how the predicted $\hat{K}$ dynamically sets the number of slot contexts.
  • Figure 2: Intra-Sample Parameter Separations Characterizing Dataset Complexity. (a) Amplitude differences $\Delta A$ between ordered components follow a log-normal distribution with geometric mean 0.11 (48% of pairs differ by under 20%). (b) Frequency separations $\Delta f$ exhibit a hard threshold at $\Delta f_{\min} = 0.01$ Hz, with 31% of pairs within $1.5\Delta f_{\min}$ and 73% within $3\Delta f_{\min}$. (c) Circular phase separations $\Delta\phi \in [0,\pi]$ follow the expected triangular distribution for uniformly random phases ($p=0.82$, Kolmogorov--Smirnov test), confirming no spurious correlations.
  • Figure 3: Signal-to-Noise Ratio Distribution. SNR is computed as $\mathrm{SNR}=10\log_{10}(\sum_k a_k^2/2\sigma^2)$, where $\sum_k a_k^2/2$ is the time-averaged signal power and $\sigma^2$ is the noise variance. The histogram shows the empirical distribution over 1000 random samples with a KDE overlaid. The primary peak near $8$ dB corresponds to typical $K\!\approx\!5$ mixtures with moderate noise, while the tail to $\sim 40$--$60$ dB arises for $K$ large and $\sigma$ small; SNRs down to $-6$ dB cover noise-dominated cases. This range spans confusion-limited and high-SNR regimes, providing a comprehensive stress test for the inference model.
  • Figure 4: Emergent Frequency-Based Slot Specialization. Across 1000 independent 10-component mixtures, each slot spontaneously develops consistent frequency preferences (boxplots show matched frequency distribution per slot) despite receiving no explicit supervision to do so. The smooth gradient from low to high frequencies demonstrates soft specialization: slots preferentially attend to distinct spectral regions while maintaining overlap for robustness. Long whiskers indicate slots flexibly adapt to nearby frequencies when components cluster, enabling resolution of overlapping signals. Slot indices reordered by mean matched frequency for visualization; physical slot order is arbitrary due to permutation invariance. This frequency-based organization emerges purely from training dynamics -- the architecture provides no preference for frequency over amplitude or phase, yet frequency naturally becomes the organizing principle because it offers the strongest separability in the dual-stream encoder's spectral representation.
  • Figure 5: Confusion Matrix for Model-Order Inference. The model attains 99.85% accuracy over 10,000 test samples. All $K\!\in\!\{1,\ldots,8\}$ are recovered perfectly. Only 15 errors occur in total -- 14 cases of $K{=}10$ predicted as $K{=}9$ and one case of $K{=}9$ predicted as $K{=}8$ -- arising in high-density configurations near the minimum separation. The strong diagonal structure indicates no systematic over- or under-estimation bias.
  • ...and 15 more figures

Theorems & Definitions (12)

  • Theorem 4.1: Permutation Invariance
  • Proposition 4.2: ELBO Decomposition
  • Lemma 4.3: Gradient Amplification via Target Weighting
  • Proposition 4.4: Optimal Phase Weight
  • Lemma 4.5: Flow Approximation Error
  • Proposition 4.6: Depth-Accuracy Trade-off
  • Theorem 4.7: Sample Complexity with Explicit Scaling
  • Definition 4.8: Component Distinguishability
  • Lemma 1.1: Group Closure of $S_K$
  • Lemma 1.2: Marginal Posterior Structure
  • ...and 2 more