How to evaluate the sufficiency and complementarity of summary statistics for cosmic fields: an information-theoretic perspective

Ce Sui, Yi Mao, Xiaosheng Zhao, Tao Jing, Benjamin D. Wandelt

TL;DR

The paper addresses how to quantify information content and complementarity of summary statistics for cosmic fields using mutual information $I(\theta;x)$ and conditional MI. It introduces a principled framework for information sufficiency and decomposition into shared and complementary information, with variational MI estimation using a flexible model $q(\theta|s)$ (e.g., masked autoregressive flow). The authors validate the approach on a Gaussian CMB-like field where the power spectrum $P(k)$ captures essentially all information, and on non-Gaussian 21 cm maps where the wavelet scattering transform (ST) provides the most information and substantial complementarity to $P(k)$ and the bispectrum. They demonstrate that MI can guide the design and evaluation of summaries, and discuss practical estimation challenges in high-dimensional field data, pointing toward learning summaries that maximize MI.
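The variational estimate mentioned above rests on the standard lower bound $I(\theta;s) \geq H(\theta) + \mathbb{E}[\log q(\theta|s)]$, which becomes tight when $q$ matches the true posterior. The following is a minimal sketch of that bound on a toy problem, not the authors' pipeline: it assumes a one-dimensional parameter with a standard normal prior, a hypothetical summary `s = theta + noise`, and it swaps the masked autoregressive flow for a simple linear-Gaussian $q(\theta|s)$ so that the result can be checked against a closed-form MI. The function names are illustrative.

```python
import numpy as np


def variational_mi_lower_bound(theta, s, prior_entropy):
    """Lower bound I(theta; s) >= H(theta) + E[log q(theta | s)], in nats.

    q(theta | s) is a linear-Gaussian model N(a*s + b, v) fitted by maximum
    likelihood. This is an assumed stand-in for a flexible density estimator
    such as a normalizing flow.
    """
    # Least-squares fit of the conditional mean a*s + b.
    design = np.column_stack([s, np.ones_like(s)])
    coef, *_ = np.linalg.lstsq(design, theta, rcond=None)
    resid = theta - design @ coef
    v = np.mean(resid**2)  # maximum-likelihood estimate of the variance

    # Monte Carlo average of log q(theta | s) over the joint samples.
    # For brevity the same samples are used for fitting and evaluation;
    # a held-out set would be the safer choice with a flexible model.
    mean_log_q = np.mean(-0.5 * np.log(2 * np.pi * v) - 0.5 * resid**2 / v)
    return prior_entropy + mean_log_q


def toy_experiment(noise_std=0.5, n=50_000, seed=0):
    """Compare the variational estimate with the exact MI of a Gaussian toy."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n)                  # theta ~ N(0, 1)
    s = theta + noise_std * rng.standard_normal(n)  # hypothetical noisy summary
    prior_entropy = 0.5 * np.log(2 * np.pi * np.e)  # entropy of N(0, 1)
    estimate = variational_mi_lower_bound(theta, s, prior_entropy)
    exact = 0.5 * np.log(1.0 + 1.0 / noise_std**2)  # closed form for this toy
    return estimate, exact


estimate, exact = toy_experiment()
print(f"variational estimate: {estimate:.3f} nats, exact: {exact:.3f} nats")
```

In this toy the true posterior is itself linear-Gaussian, so the bound is essentially tight. For non-Gaussian summaries a more expressive $q$ would be needed, and the estimate would remain a lower bound on the true MI.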

Abstract

The advent of increasingly advanced surveys and cosmic tracers has motivated the development of new inference techniques and novel approaches to extracting information from cosmic fields. A central challenge in this endeavor is to quantify the information that summary statistics extract from cosmic fields. In particular, how should we assess which statistics are more informative than others, and how complementary is the information from each statistic? Here, we introduce mutual information (MI), which provides, from an information-theoretic perspective, a natural framework for assessing the sufficiency and complementarity of summary statistics in cosmological data. We demonstrate how MI can be applied to typical inference tasks to make information-theoretic evaluations, using two representative examples: the cosmic microwave background map, from which the power spectrum extracts almost all information, as is expected for a Gaussian random field, and the 21 cm brightness temperature map, from which the scattering transform extracts the most non-Gaussian information while remaining complementary to the power spectrum and bispectrum. Our results suggest that MI offers a robust theoretical foundation for evaluating and improving summaries, thereby enabling a deeper understanding of cosmic fields from an information-theoretic perspective.

Paper Structure

This paper contains 16 sections, 28 equations, 5 figures.

Figures (5)

  • Figure 1: Decomposition of the information content in the raw data $x$. $I(\theta;x)$ denotes the full field-level information. The left column (blue), $I(\theta;s)$, represents the information extracted by the summary $s$, and the right column (orange), $I(\theta;x \mid s)$, captures the residual information beyond $s$. Similarly, the upper row (dark), $I(\theta;t)$, shows the information extracted by another summary $t$, and the lower row (light), $I(\theta;x \mid t)$, captures the residual information beyond $t$. The upper right (dark orange) panel, $I(\theta;t \mid s)$, quantifies the complementary information that $t$ contributes beyond $s$, and the lower left (light blue) panel, $I(\theta;s \mid t)$, quantifies the complementary information that $s$ contributes beyond $t$. The upper left (dark blue) panel quantifies the common information shared by both $s$ and $t$, and the lower right (light orange) panel, $I(\theta;x \mid s, t)$, quantifies the residual information in the raw data $x$ beyond both $s$ and $t$.
  • Figure 2: Left: a CMB-like map generated from a given power spectrum (PS). Right: a fixed square patch extracted from the map, used for computing summary statistics.
  • Figure 3: The MI of the power spectrum (PS, blue), scattering transform (ST, green) and bispectrum (BS, orange) for CMB-like Gaussian random fields (GRFs). The inset shows the conditional MI of the ST and of the BS given the PS, which measures the complementary information that each contributes beyond the PS.
  • Figure 4: Illustration of the 21 cm maps. Left: cosmic 21 cm signal from the Epoch of Reionization (EoR). Right: mock observation (cosmic 21 cm signal with telescope noise) for a 1,000-hour SKA observation.
  • Figure 5: Venn diagram for the MI of summary statistics for the 21 cm maps from the EoR. We consider three summaries, PS (blue), ST (green) and BS (orange), and show their respective MI values (as marked) through the scaled circle sizes in the legend. We consider two cases: (left) the 21 cm map with only the cosmic EoR signal, and (right) mock observations with the 21 cm EoR signal and telescope noise for a 1,000-hour SKA observation. For each case, the conditional MI (marked with arrows) is shown by the non-overlapping regions of each pair of circles in the Venn diagram.
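
The panels of Figure 1 and the non-overlapping regions of Figure 5 can be related through the chain rule of mutual information. The identities below are a sketch under the assumption that both summaries are deterministic functions of the data, $s = s(x)$ and $t = t(x)$; they are standard information-theoretic results written in the notation of the Figure 1 caption, not equations quoted from the paper.

$$
\begin{aligned}
I(\theta;x) &= I(\theta;s) + I(\theta;x \mid s) = I(\theta;t) + I(\theta;x \mid t), \\
I(\theta;s,t) &= I(\theta;s) + I(\theta;t \mid s) = I(\theta;t) + I(\theta;s \mid t), \\
I(\theta;x \mid s) &= I(\theta;t \mid s) + I(\theta;x \mid s,t), \\
I(\theta;x) &= I(\theta;s,t) + I(\theta;x \mid s,t).
\end{aligned}
$$

Read this way, a summary $s$ is sufficient when $I(\theta;x \mid s) = 0$, and $t$ adds nothing beyond $s$ when $I(\theta;t \mid s) = 0$. The layout of Figure 1 also suggests that the common (dark blue) term corresponds to $I(\theta;s) - I(\theta;s \mid t) = I(\theta;t) - I(\theta;t \mid s)$, where the equality of the two forms follows from the second line above.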