Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models

Arkaprabha Ganguli, Anirban Samaddar, Florian Kéruzoré, Nesar Ramachandra, Julie Bessac, Sandeep Madireddy, Emil Constantinescu

TL;DR

This work demonstrates that auxiliary guidance preserves generative flexibility while yielding physically meaningful, disentangled embeddings, providing a generalizable pathway for uncovering independent factors in complex astronomical datasets.

Abstract

Deep generative models (DGMs) compress high-dimensional data but often entangle distinct physical factors in their latent spaces. We present an auxiliary-variable-guided framework for disentangling representations of thermal Sunyaev-Zel'dovich (tSZ) maps of dark matter halos. We introduce halo mass and concentration as auxiliary variables and apply a lightweight alignment penalty to encourage latent dimensions to reflect these physical quantities. To generate sharp and realistic samples, we extend latent conditional flow matching (LCFM), a state-of-the-art generative model, to enforce disentanglement in the latent space. Our Disentangled Latent-CFM (DL-CFM) model recovers the established mass-concentration scaling relation and identifies latent space outliers that may correspond to unusual halo formation histories. By linking latent coordinates to interpretable astrophysical properties, our method transforms the latent space into a diagnostic tool for cosmological structure. This work demonstrates that auxiliary guidance preserves generative flexibility while yielding physically meaningful, disentangled embeddings, providing a generalizable pathway for uncovering independent factors in complex astronomical datasets.
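
The alignment penalty mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the penalty, so the version below is an assumption: a mean squared gap between the first two latent coordinates and standardized auxiliary variables (log halo mass and concentration). The function names `standardize` and `alignment_penalty`, the `weight` parameter, the latent dimension of 8, and the synthetic toy values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def standardize(a):
    """Scale each column to zero mean and unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def alignment_penalty(z, aux, weight=1.0):
    """Assumed penalty: mean squared gap between the first k latent
    coordinates and the k standardized auxiliary variables.

    z   : (n, d) array of latent codes
    aux : (n, k) array of auxiliary variables, with k <= d
    """
    k = aux.shape[1]
    z_aux = z[:, :k]  # the guided coordinates
    return weight * np.mean((z_aux - standardize(aux)) ** 2)

# Synthetic toy auxiliary variables (illustrative only, not paper data).
rng = np.random.default_rng(0)
n = 256
log_mass = rng.normal(14.0, 0.3, size=n)
concentration = rng.normal(5.0, 1.0, size=n)
aux = np.column_stack([log_mass, concentration])

# Latents whose first two coordinates match the auxiliary variables exactly,
# followed by six unconstrained reconstruction coordinates.
z_aligned = np.column_stack([standardize(aux), rng.normal(size=(n, 6))])
# Latents with no relation to the auxiliary variables.
z_random = rng.normal(size=(n, 8))

penalty_aligned = alignment_penalty(z_aligned, aux)
penalty_random = alignment_penalty(z_random, aux)
```

In a training loop, a term of this kind would be added to the generative objective, so that the guided coordinates track mass and concentration while the remaining coordinates stay free to capture everything else needed for reconstruction.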

Paper Structure

This paper contains 25 sections, 12 equations, 9 figures, 2 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Alignment of guided latents with mass and concentration.
  • Figure 2: Traversals along guided latents ($z_{\mathrm{aux}}$) with $z_{\mathrm{rec}}$ fixed. Rows: mass, concentration.
  • Figure 3: Generating samples from the center (top) and tail (bottom) of the reconstruction-focused latents $z_{\mathrm{rec}}$, with the first two auxiliary-guided coordinates fixed at $(z_1,z_2)=(0.001,0.001)$.
  • Figure A.1: Schematic of DL-CFM inference. Given a training sample, we fix the latent sampled from the disentangled latent space. This latent variable conditions the vector field network, which evolves the source samples toward the data distribution. For demonstration, we show two snapshots of the iterative reverse process at $t=0.5$ (left) and $t=1$ (right) using the vector field U-Net.
  • Figure A.2: Generating samples from the center of the reconstruction-focused latents $z_{\mathrm{rec}}$, with the first two auxiliary-guided coordinates fixed at $(z_1,z_2)=(0.001,0.9)$, i.e., the low-mass, high-concentration setting.
  • ...and 4 more figures
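
The latent traversals described in the captions (varying one guided coordinate while holding the rest fixed) can be sketched as follows. This is an illustrative setup only: the helper `traverse_latent`, the latent dimension of 8, and the sweep range are assumptions, chosen to match the coordinate values quoted in the captions. Following the Figure A.2 caption, index 0 is treated as the mass-guided coordinate and index 1 as the concentration-guided one.

```python
import numpy as np

def traverse_latent(z_base, dim, values):
    """Return a batch of latent codes that copy z_base everywhere
    except coordinate `dim`, which sweeps over `values`."""
    z_base = np.asarray(z_base, dtype=float)
    batch = np.tile(z_base, (len(values), 1))  # one row per sweep value
    batch[:, dim] = values
    return batch

# Hypothetical 8-dimensional latent: two guided coordinates (z_aux)
# followed by six reconstruction-focused coordinates (z_rec) held at zero.
z_base = np.zeros(8)
z_base[:2] = 0.001

# Sweep the mass-guided coordinate; concentration and z_rec stay fixed.
mass_sweep = traverse_latent(z_base, dim=0, values=np.linspace(0.001, 0.9, 5))
```

Each row of `mass_sweep` would then be passed to the generative model as the conditioning latent, so that any change across the generated maps can be attributed to the swept coordinate alone.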