Sweeping the Dust Away -- Correcting the Phase Space Density of the Milky Way with Unsupervised Machine Learning

Eric Putney, David Shih, Sung Hak Lim, Matthew R. Buckley

TL;DR

This work addresses how dust extinction biases measurements of the Milky Way's gravitational potential made with the Boltzmann equation, and jointly infers the dust-corrected phase-space density (PSD) and the potential. It introduces a data-driven framework in which neural networks learn a dust efficiency factor $\epsilon(\vec{x})$ and a gravitational potential $\Phi(\vec{x})$ from Gaia DR3 Red Clump and Red Giant Branch (RC/RGB) kinematics, with the collisionless Boltzmann equation (CBE) enforced as the training objective. Masked Autoregressive Flows (MAFs) model the observed PSD, and the networks for $\epsilon$ and $\Phi$ are regularized to remain physically plausible. The learned $\epsilon$ maps align with a state-of-the-art 3D dust map and yield a dust-corrected PSD $f_{\rm corr}$ that shows a more coherent disk structure and supports dynamical inference in the disk volume; uncertainties are quantified with ensembles of trainings. Overall, the approach offers a data-driven, physically constrained way to disentangle dust effects from stellar dynamics, toward robust measurements of the Galactic potential and dark matter distribution. A companion paper presents the corresponding acceleration and mass-density inferences.
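
As a rough sketch of how a position-dependent efficiency could enter the CBE objective, the relations below may help. They rest on assumptions not stated in this summary: that the efficiency acts multiplicatively on the density, that it depends on position only, and that the corrected population is in equilibrium. The symbol $f_{\rm obs}$ (the observed PSD) is introduced here for illustration.

$$
\begin{aligned}
f_{\rm obs}(\vec{x},\vec{v}) &= \epsilon(\vec{x})\, f_{\rm corr}(\vec{x},\vec{v}), \\
0 &= \vec{v}\cdot\nabla_{\vec{x}} f_{\rm corr} - \nabla_{\vec{x}}\Phi\cdot\nabla_{\vec{v}} f_{\rm corr}, \\
\Rightarrow\quad 0 &= \vec{v}\cdot\nabla_{\vec{x}} \ln f_{\rm obs} - \vec{v}\cdot\nabla_{\vec{x}} \ln\epsilon - \nabla_{\vec{x}}\Phi\cdot\nabla_{\vec{v}} \ln f_{\rm obs}.
\end{aligned}
$$

The last line follows from dividing the equilibrium CBE by $f_{\rm corr}$ and substituting $\ln f_{\rm corr} = \ln f_{\rm obs} - \ln\epsilon$. Because $\epsilon$ is taken to be independent of velocity, it contributes only through its spatial gradient. Under these assumptions, a density model for $f_{\rm obs}$ together with networks for $\epsilon$ and $\Phi$ would suffice to evaluate the residual.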

Abstract

The Boltzmann equation relates the equilibrium phase space distribution of stars in the Milky Way to the Galaxy's gravitational potential. However, observations of stellar populations are biased by extinction from foreground dust, which complicates measurements of the potential in the disk and towards the Galactic center. Using the kinematics of Red Clump and Red Giant Branch stars in Gaia DR3, we use machine learning to simultaneously estimate both the unbiased stellar phase space density and the gravitational potential. The unbiased phase space density is obtained through a learned "dust efficiency factor" -- an observational selection function that accounts for dust extinction. The potential and the dust efficiency are parameterized by fully connected neural networks and are completely data driven. We validate the dust efficiency using a recent three-dimensional dust map in this work, and examine the potential in a companion paper.
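
The idea that a selection function can be separated from the dynamics can be illustrated with a one-dimensional toy model. The sketch below is not the authors' implementation: analytic functions stand in for the normalizing flow and the neural networks, the efficiency is assumed to multiply the density and to depend on position only, and every name and numerical value is hypothetical. It checks that an equilibrium population in a harmonic potential violates the Boltzmann equation once a dust-like efficiency dip is applied, and satisfies it again when the gradient of the log-efficiency is subtracted.

```python
import math

OMEGA = 1.3  # toy oscillator frequency (hypothetical value)
SIGMA = 0.7  # toy velocity dispersion (hypothetical value)

def potential(x):
    """Harmonic toy potential Phi(x)."""
    return 0.5 * OMEGA**2 * x**2

def log_epsilon(x):
    """Log of a toy efficiency: a smooth dip near x = 0.5 standing in for a dust cloud."""
    return math.log(1.0 - 0.6 * math.exp(-((x - 0.5) / 0.2) ** 2))

def log_f_corr(x, v):
    """Isothermal equilibrium density, f ~ exp(-E / sigma^2) with E = v^2/2 + Phi(x)."""
    return -(0.5 * v**2 + potential(x)) / SIGMA**2

def log_f_obs(x, v):
    """Observed density, assuming the efficiency multiplies the true density."""
    return log_epsilon(x) + log_f_corr(x, v)

def ddx(fn, x, h=1e-5):
    """Central finite-difference derivative of a scalar function."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

def cbe_residual(log_f, x, v, log_eps=None):
    """Equilibrium Boltzmann residual in 1D, divided by the density.

    If log_eps is given, its spatial gradient is subtracted before the
    residual is formed, which corresponds to evaluating the equation on f / epsilon.
    """
    dlnf_dx = ddx(lambda s: log_f(s, v), x)
    dlnf_dv = ddx(lambda s: log_f(x, s), v)
    dphi_dx = ddx(potential, x)
    dlneps_dx = ddx(log_eps, x) if log_eps is not None else 0.0
    return v * (dlnf_dx - dlneps_dx) - dphi_dx * dlnf_dv

x0, v0 = 0.4, 0.9
naive = cbe_residual(log_f_obs, x0, v0)
corrected = cbe_residual(log_f_obs, x0, v0, log_eps=log_epsilon)
print(f"uncorrected residual: {naive:.4f}")
print(f"corrected residual:   {corrected:.2e}")
```

In this toy setting the uncorrected residual equals the velocity times the spatial gradient of the log-efficiency, so it is largest where the efficiency changes fastest. A learned version would presumably replace the analytic functions with trainable models and minimize the squared residual over many phase-space points.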

Paper Structure

This paper contains 15 sections, 16 equations, 12 figures.

Figures (12)

  • Figure 1: Projection of Gaia DR3 RC/RGB stars within 4 kpc used for training the normalizing flows used in this analysis. The names and locations of prominent nearby dust clouds (dark nebulae (DNe) and molecular clouds (MoC)) are overlaid to contextualize the irregular and dark low-number density patches of the sky. The dark patch at Baade's window near the Parrot's Head Nebula $(\ell,b)\approx(1^\circ,-3.66^\circ)$ is an extremely crowded star field near the Galactic center. This crowding saturates Gaia's instruments, decreasing observational completeness in this region (Ref. 2021AA...649A...5F).
  • Figure 2: Absolute magnitude $M_G$ versus color $BP-RP$ of all Gaia DR3 stars within 4 kpc of the Sun with full kinematic information, photometric magnitudes, and relative parallax errors $\delta \varpi/\varpi < 0.33$. The 4 kpc completeness cut is shown as a white dashed line: stars above this line outside the hatched region are used in our MAF training sample. The Red Clump (RC) -- the largest population used in this analysis -- is outlined in a white box. Figure reproduced from the authors' previous work, Ref. lim2023mapping.
  • Figure 3: Left: CBE-derived $\epsilon$ projected in Galactic coordinates at distances $d=1$, $2$, and $3$ kpc. At larger distances, familiar features in local dust clouds become more resolved in regions where $\epsilon<1$. However, not every feature seen here should be attributed to dust extinction. For example, the dark patch that first appears at $(\ell,b,d)\approx(5^{\circ},-4^{\circ},2\:\textrm{kpc})$ is an incomplete patch in the Gaia dataset due to crowding and saturation effects. Right: Same as left, using $\epsilon_{\rm L22}$ derived from the L22 dust map (Ref. Gaia2mass2022). At $3$ kpc, $\epsilon_{\rm L22}$ approaches $1$ and is no longer reliable, as it cannot restore stars dimmed beyond the limits of the Gaia spectrometer.
  • Figure 4: Left: Dust efficiency function $\epsilon$ estimated from the CBE along slices in the $x-y$ plane at $z=-0.25$ (top), $0.0$ (center), and $0.25$ kpc (bottom). The center row is a slice through the midplane of the Milky Way's disk, where the most efficiency loss is expected due to the dense network of dust clouds. Right: $\epsilon_\textrm{L22}$ estimated from the extinctions provided by the L22 dust map, limited to within $d=2.5$ kpc (black circle). Beyond $d=2.5$ kpc, $\epsilon_{\rm L22}$ cannot restore missing stars and returns to $1$.
  • Figure 5: Comparison of $\epsilon_{\rm L22}$ derived from Ref. Gaia2mass2022 (blue) and the $\epsilon$ estimated in this work (gold) along lines of sight centered on known significant dust clouds shown in Figure 1. The center line represents an average over a cone of opening angle $\theta=10$ arcmin. The error band for the $\epsilon$ model (blue) is dominated by measurement and statistical uncertainties. Likewise, the error band of the $\epsilon_{\rm L22}$ model (gold) includes the L22 dust map errors as well as an estimate of measurement and statistical errors. Both error bands include the subdominant variation across each circular section of the cone. The light gray band in some panels indicates lines of sight where $\epsilon_{\rm L22}$ degrades closer than $d=2.5$ kpc due to heavy dust extinction.
  • ...and 7 more figures