Constraining dark matter halo profiles with symbolic regression

Alicia Martín, Tariq Yasin, Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira

TL;DR

This work applies Exhaustive Symbolic Regression (ESR) to constrain dark matter halo density profiles directly from observations, using the minimum description length (MDL) principle to trade off fit quality against complexity. By applying ESR to mock weak-lensing excess surface density (ESD) data, the authors show that the standard NFW profile can be recovered in low-noise regimes, while higher noise levels favor simpler functions with fewer parameters; a generalized NFW with a global outer slope also closely matches NFW predictions. The method is designed to be simulation-independent and extends to combining data from multiple halos by treating each parameter as either local (fitted separately for each halo) or global (shared across halos), offering a transparent, interpretable alternative to black-box ML approaches. Beyond lensing, ESR can be applied to galactic rotation curves, enabling data-driven constraints on three-dimensional halo profiles while accounting for baryonic contributions. The framework thus provides a flexible, principled way to test halo-model assumptions and to guide the interpretation of upcoming high-precision surveys.
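
To make the local/global idea concrete, here is a minimal sketch (not the authors' code) of a joint Gaussian likelihood over several clusters. It borrows $|\theta_0|^{|\theta_1|^r}/r$, one of the ESR functions shown in Figure 1, and assumes, purely for illustration, that $\theta_0$ is local (one value per cluster) while $\theta_1$ is global. The names `model`, `joint_nll` and `n_params`, and the toy parameter values, are hypothetical.

```python
import numpy as np

def model(r, theta0, theta1):
    """ESR-discovered form from Figure 1: |theta0|**(|theta1|**r) / r."""
    return np.abs(theta0) ** (np.abs(theta1) ** r) / r

def joint_nll(params, clusters):
    """Gaussian negative log-likelihood summed over clusters (constants dropped).

    Hypothetical parameter layout: params[0] is the global theta1 shared by
    all clusters; params[1:] holds one local theta0 per cluster.
    Each cluster is a tuple (r, esd, err).
    """
    theta1, local = params[0], params[1:]
    if len(local) != len(clusters):
        raise ValueError("need one local parameter per cluster")
    total = 0.0
    for theta0, (r, esd, err) in zip(local, clusters):
        resid = (esd - model(r, theta0, theta1)) / err
        total += 0.5 * np.sum(resid**2)
    return total

def n_params(n_global, n_local, n_clusters):
    """Free-parameter count: globals counted once, locals once per cluster."""
    return n_global + n_local * n_clusters

# Noiseless toy clusters (illustrative values, not results from the paper).
r = np.geomspace(0.1, 3.0, 8)
true_theta1, true_theta0 = 0.8, [2.0, 3.0, 4.0]
clusters = [(r, model(r, t0, true_theta1), 0.05 * model(r, t0, true_theta1))
            for t0 in true_theta0]
nll_true = joint_nll([true_theta1, *true_theta0], clusters)  # zero at the truth
```

With 20 clusters, treating $\theta_1$ as global gives `n_params(1, 1, 20)`, i.e. 21 free parameters, instead of the 40 a fully local fit would need; MDL then decides whether the extra freedom of local parameters buys enough likelihood to pay for its longer parameter code.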

Abstract

Dark matter haloes are typically characterised by radial density profiles with fixed forms motivated by simulations (e.g. NFW). However, simulation predictions depend on uncertain dark matter physics and baryonic modelling. Here, we present a method to constrain halo density profiles directly from observations using Exhaustive Symbolic Regression (ESR), a technique that searches the space of analytic expressions for the function that best balances accuracy and simplicity for a given dataset. We test the approach on mock weak lensing excess surface density (ESD) data of synthetic clusters with NFW profiles. Motivated by real data, we assign each ESD data point a constant fractional uncertainty and vary this uncertainty and the number of clusters to probe how data precision and sample size affect model selection. For fractional errors around 5%, ESR recovers the NFW profile even from samples as small as 20 clusters. At higher uncertainties representative of current surveys, simpler functions are favoured over NFW, though it remains competitive. This preference arises because the absolute weak lensing errors are smallest in the outskirts, causing the fits to be dominated by the outer profile. ESR therefore provides a robust, simulation-independent framework both for testing mass models and for determining which features of a halo's density profile are genuinely constrained by the data.
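
The mock-data setup described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it works in dimensionless units ($\rho_s = r_s = 1$), computes the NFW excess surface density $\Delta\Sigma(R) = \bar{\Sigma}(<R) - \Sigma(R)$ by numerical projection with SciPy's `quad`, and gives each point the constant fractional error $\sigma_i = f\,\Delta\Sigma_i$. The helper names, radial range and random seed are hypothetical choices.

```python
import numpy as np
from scipy.integrate import quad

def rho_nfw(r, rho_s=1.0, r_s=1.0):
    """NFW 3D density profile."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)

def mass_3d(r, rho_s=1.0, r_s=1.0):
    """Analytic NFW mass enclosed within a sphere of radius r."""
    x = r / r_s
    return 4.0 * np.pi * rho_s * r_s**3 * (np.log1p(x) - x / (1.0 + x))

def sigma(R, **nfw):
    """Projected surface density: Sigma(R) = 2 * int_0^inf rho(sqrt(R^2 + z^2)) dz."""
    val, _ = quad(lambda z: rho_nfw(np.hypot(R, z), **nfw), 0.0, np.inf)
    return 2.0 * val

def mean_sigma(R, **nfw):
    """Mean surface density inside R: (mass in an infinite cylinder of radius R) / (pi R^2).

    Cylinder mass = mass of the sphere of radius R plus, for every shell at r > R,
    the fraction 1 - sqrt(1 - (R/r)^2) of the shell lying inside the cylinder.
    """
    def shell(r):
        u = (R / r) ** 2
        cap_fraction = u / (1.0 + np.sqrt(1.0 - u))  # = 1 - sqrt(1 - u), numerically stable
        return 4.0 * np.pi * r**2 * rho_nfw(r, **nfw) * cap_fraction
    outer, _ = quad(shell, R, np.inf)
    return (mass_3d(R, **nfw) + outer) / (np.pi * R**2)

def mock_esd(radii, f, rng, **nfw):
    """Noiseless ESD, constant-fractional errors f * ESD, and Gaussian-scattered mock data."""
    esd = np.array([mean_sigma(R, **nfw) - sigma(R, **nfw) for R in radii])
    err = f * esd
    data = esd + rng.normal(0.0, err)
    return esd, err, data

rng = np.random.default_rng(0)
radii = np.geomspace(0.1, 10.0, 10)  # in units of r_s (hypothetical range)
esd, err, data = mock_esd(radii, f=0.05, rng=rng)
```

Because $\sigma_i$ scales with $\Delta\Sigma_i$, which falls steeply with radius, the absolute errors in the outermost bins come out much smaller than in the inner ones, which is the effect the abstract invokes to explain why the outer profile dominates the fits.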

Paper Structure

This paper contains 13 sections, 17 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: An example result of the ESR fitting procedure for one mock galaxy cluster, generated with a fractional uncertainty of $f = 0.05$. Left panel: The mock weak lensing ESD data (black points) are compared to the best-fit NFW profile (red) and the next two best-fitting functions discovered by ESR, $|\theta_0|^{|\theta_1|^r}/r$ (purple) and $\left(|\theta_0|/r\right)^{|\theta_1|^{-r}}$ (blue). Right panel: The corresponding 3D density profiles ($\rho$) for the three models as a function of the intrinsic 3D radius. The vertical dotted lines mark the radial range of the ESD data from the left panel.
  • Figure 2: Optimal values of the objectives at each complexity for a fractional uncertainty of $f=0.05$. The red curve (left axis) shows the per-complexity minimum description length relative to the global best description length (NFW in this case). The blue curve (right axis) shows the per-complexity best likelihood relative to the global best likelihood across all models. Stars mark NFW in both metrics. For both axes, lower values on the plot indicate better-performing models.
  • Figure 3: Breakdown of the total description length ($L(D)$) for NFW as a function of the fractional uncertainty ($f$) in the mock data. The total $L(D)$ (black) is the sum of the residual term ($-\log \hat{\mathcal{L}}$, red), the parameter length (blue) and the function complexity. The parameter cost dominates the total at low noise and drops significantly as the data become less constraining, while the residual term, which measures the goodness of fit, remains relatively flat at all noise levels.
  • Figure 4: The performance of the NFW profile relative to the other ESR-discovered functions, shown as a function of the fractional uncertainty ($f$) in the mock data. Top panel: The difference in total description length, $\Delta L(D) = L(D)_{\mathrm{Alt}} - L(D)_{\mathrm{NFW}}$, where $L(D)_{\mathrm{Alt}}$ is the description length of the best function other than NFW (black). The equivalent differences for the likelihood (red) and parameter length (blue) are also shown. Positive values indicate that NFW is favoured over the best alternative. Bottom panel: The absolute rank of the NFW profile in the full list of functions, where a rank of $1$ is the best. NFW is the $L(D)$-preferred model at low uncertainty, but it is overtaken by other functions at intermediate $f$. In the high-noise regime ($f \gtrsim 0.6$), it once again becomes one of the top-ranked models.
  • Figure 5: Change in description length ($\Delta L(D)$) between NFW and the best alternative dark matter halo model as a function of the number of clusters, for two levels of Gaussian noise in the ESD data, $f \in \{0.01, 0.05\}$. Any point above $0$ indicates a preference for NFW.
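
The quantities plotted in Figures 3 and 4 can be sketched in code. The sketch assumes the description-length definition used in earlier ESR work, $L(D) = -\log\hat{\mathcal{L}} + k\log n - \frac{p}{2}\log 3 + \sum_j \left[\frac{1}{2}\log I_{jj} + \log|\hat{\theta}_j|\right]$, where $k$ is the number of nodes in the expression tree, $n$ the number of distinct operators, $p$ the number of parameters and $I_{jj}$ the diagonal of the Fisher matrix. Here the Fisher diagonal is approximated by the Gauss-Newton form for a Gaussian likelihood with finite-difference derivatives. All function names, the power-law test model and the node/operator counts are hypothetical.

```python
import numpy as np

def fisher_diag_gaussian(model, r, theta, err, h=1e-6):
    """Gauss-Newton Fisher diagonal for a Gaussian likelihood:
    I_jj = sum_i (d model_i / d theta_j)^2 / err_i^2, via central differences."""
    theta = np.asarray(theta, dtype=float)
    diag = np.empty(theta.size)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h * max(1.0, abs(theta[j]))
        dm = (model(r, *(theta + step)) - model(r, *(theta - step))) / (2.0 * step[j])
        diag[j] = np.sum((dm / err) ** 2)
    return diag

def description_length(neg_log_like, theta_hat, fisher_diag, n_nodes, n_operators):
    """Return (residual, parameter, function, total) terms of the assumed ESR codelength."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    fisher_diag = np.asarray(fisher_diag, dtype=float)
    p = theta_hat.size
    residual = float(neg_log_like)                # -log L_hat
    function = n_nodes * np.log(n_operators)      # k log n
    parameter = (-0.5 * p * np.log(3.0)
                 + np.sum(0.5 * np.log(fisher_diag) + np.log(np.abs(theta_hat))))
    return residual, parameter, function, residual + parameter + function

def delta_and_rank(totals, name="NFW"):
    """Delta L(D) = L(D) of best alternative - L(D) of `name` (positive favours `name`),
    plus the 1-based rank of `name` when all models are sorted by total L(D)."""
    others = [v for k, v in totals.items() if k != name]
    delta = min(others) - totals[name]
    rank = 1 + sum(v < totals[name] for v in others)
    return delta, rank

def power_law(r, a, s):
    """Hypothetical two-parameter stand-in for a halo model."""
    return a / r**s

r = np.geomspace(0.1, 10.0, 10)
theta = [2.0, 1.5]
terms = {}
for f in (0.05, 0.10):
    err = f * power_law(r, *theta)  # constant fractional errors
    fisher = fisher_diag_gaussian(power_law, r, theta, err)
    terms[f] = description_length(0.0, theta, fisher, n_nodes=5, n_operators=4)
```

In this idealised setting with $\hat{\theta}$ held fixed, the Fisher diagonal scales as $1/f^2$ under constant fractional errors, so doubling $f$ lowers the parameter term by exactly $p\log 2$, in line with the falling parameter cost described for Figure 3.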