Interpreting deep learning-based stellar mass estimation via causal analysis and mutual information decomposition

Wei Zhang, Qiufan Lin, Yuan-Sen Ting, Shupei Chen, Hengxin Ruan, Song Li, Yifan Wang

TL;DR

This study develops a dual-interpretability framework to dissect deep learning–based stellar mass estimation from multi-band galaxy data. By separating causal paths (via a latent space $S$ learned through supervised contrastive learning and KNN local tests) from multivariate information distribution (via mutual information decomposition into redundant, unique, and synergistic components), the authors quantify how morphology, photometry, images, and spec-$z$ contribute to predicting $M_\ast$ beyond traditional integrated photometry. They demonstrate that image-derived morphology provides meaningful information that is partly explainable by spec-$z$, and that optical images can diminish the incremental value of infrared photometry in the SDSS+WISE context. They also reveal strong across-band and intra-band synergies, especially those involving the $g$-band. The findings offer concrete interpretability for image-based astrophysical inferences and guidance for optimizing data usage in large-scale surveys. Overall, the work advances the integration of deep learning with principled causal and information-theoretic tools to illuminate complex multivariate physical processes in galaxy evolution.

Abstract

End-to-end deep learning models fed with multi-band galaxy images are powerful data-driven tools used to estimate galaxy physical properties in the absence of spectroscopy. However, due to a lack of interpretability and the associational nature of such models, it is difficult to understand how the information that is included in addition to integrated photometry (e.g., morphology) contributes to the estimation task. Improving our understanding in this field would enable further advances into unraveling the physical connections among galaxy properties and optimizing data exploitation. Therefore, our work is aimed at interpreting the deep learning-based estimation of stellar mass via two interpretability techniques: causal analysis and mutual information decomposition. The former reveals the causal paths between multiple variables beyond nondirectional statistical associations, while the latter quantifies the multicomponent contributions (i.e., redundant, unique, and synergistic) of different input data to the stellar mass estimation. Using data from the Sloan Digital Sky Survey (SDSS) and the Wide-field Infrared Survey Explorer (WISE), we obtained meaningful results that provide physical interpretations for image-based models. Our work demonstrates the gains from combining deep learning with interpretability techniques, and holds promise in promoting more data-driven astrophysical research (e.g., astrophysical parameter estimations and investigations on complex multivariate physical processes).
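
For orientation, the three-way split named in the abstract can be written out for two sets of input data. The block below is a sketch under the standard partial information decomposition convention, not necessarily the exact estimator the authors use; it borrows the notation of the Figure 2 caption ($\mathbf{X}_1$, $\mathbf{X}_2$ for the input sets, $Y$ for stellar mass), and $I(\cdot\,;\cdot)$ denotes mutual information.

$$
\begin{aligned}
I(Y; \mathbf{X}_1, \mathbf{X}_2) &= Redundant(Y; \mathbf{X}_1, \mathbf{X}_2) + Unique(Y; \mathbf{X}_1) + Unique(Y; \mathbf{X}_2) + Synergistic(Y; \mathbf{X}_1, \mathbf{X}_2), \\
I(Y; \mathbf{X}_1) &= Redundant(Y; \mathbf{X}_1, \mathbf{X}_2) + Unique(Y; \mathbf{X}_1), \\
I(Y; \mathbf{X}_2) &= Redundant(Y; \mathbf{X}_1, \mathbf{X}_2) + Unique(Y; \mathbf{X}_2).
\end{aligned}
$$

These are three equations in four unknowns, so an additional definition (typically of the redundant term) is needed to fix the decomposition. Once that term is fixed, the synergistic term follows as $I(Y; \mathbf{X}_1, \mathbf{X}_2) - I(Y; \mathbf{X}_1) - I(Y; \mathbf{X}_2) + Redundant(Y; \mathbf{X}_1, \mathbf{X}_2)$.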

Paper Structure

This paper contains 29 sections, 10 equations, 24 figures, 4 tables.

Figures (24)

  • Figure 1: Distributions of stellar mass, $r$-band magnitude and spec-$z$ for the SDSS data used in our work, shown for star-forming, passive, and other galaxies.
  • Figure 2: Causal analysis and mutual information decomposition methods adopted in this work. Upper panel: Causal graph that represents the stellar mass-predicting process of an end-to-end deep learning model. Each node refers to a variable or a set of variables. Each arrow represents a causal link. $\mathbf{X}^{in}$ refers to the set of input data to the model. $Y$ refers to the target variable (i.e., stellar mass in this work). $\mathbf{X}^{ex}$ refers to the set of external variables that carry information on stellar mass but are missing from the input data. $S$ refers to the low-dimensional latent vector that encodes the information on stellar mass extracted from the input data; it is the intermediary variable between $\mathbf{X}^{in}$ and $Y$. The undirected line between $\mathbf{X}^{ex}$ and $\mathbf{X}^{in}$ denotes their possible dependence, which is not necessarily a direct causal link. There may be inner structures between the individual variables in the set $\mathbf{X}^{ex}$ and $Y$, shown by the exemplar variables $X^{ex}_1$, $X^{ex}_2$, and $X^{ex}_3$. The undirected lines between $X^{ex}_1$ and $X^{ex}_2$ and between $X^{ex}_1$ and $X^{ex}_3$ denote their possible undirected dependences. Lower panel: Diagram of the decomposition of mutual information between the target $Y$ and two sets of input data $\mathbf{X}_1$, $\mathbf{X}_2$. $Redundant(Y; \mathbf{X}_1, \mathbf{X}_2)$ refers to the redundant information on $Y$ that both $\mathbf{X}_1$ and $\mathbf{X}_2$ can provide. $Unique(Y; \mathbf{X}_1)$ and $Unique(Y; \mathbf{X}_2)$ refer to the unique information that only $\mathbf{X}_1$ or $\mathbf{X}_2$ can provide. $Synergistic(Y; \mathbf{X}_1, \mathbf{X}_2)$ refers to the synergistic information that exists only when both $\mathbf{X}_1$ and $\mathbf{X}_2$ are available.
  • Figure 3: Distributions of local correlations between stellar mass and representative parameters for the five photometry-only or image-based models defined in Table \ref{tab:models_causal}. Each column corresponds to a parameter, and each row corresponds to a model. The original distributions are shown separately for star-forming, passive, and other galaxies from the test sample, illustrated as the colored curves. The distributions shown in grey serve as a reference, produced by randomly permuting the stellar mass values within the nearest neighbors of each test galaxy. Deviations between the original and reference correlation distributions indicate external parameters for a given model. Primarily, optical photometry cannot entirely account for morphological information, infrared information, spec-$z$, and physical information related to stellar mass, whereas multi-band images can encompass intra- and cross-band morphological features that are both important for the stellar mass estimation.
  • Figure 4: Predictive efficiency of representative parameters for the five photometry-only or image-based models defined in Table \ref{tab:models_causal}. Each data point corresponds to the 50th percentile of the predictive efficiency distribution over a galaxy population from the test sample (i.e., star-forming, passive, and other galaxies), and each error bar indicates the 16th and 84th percentiles. The black dotted lines indicate the reference value of the median predictive efficiency ($\sim 0.013$) estimated by randomly permuting the stellar mass values within the nearest neighbors of each test galaxy. The predictive efficiency reveals the same trends as in Fig. \ref{fig:correlation_selected}, and is more indicative of the impact of each variable on the stellar mass estimation.
  • Figure 5: Conditional predictive efficiency of representative parameters for the photometry-only model $\mathbf{M}_{ugriz}$ defined in Table \ref{tab:models_causal}. Each data point corresponds to the 50th percentile of the (conditional) predictive efficiency distribution over a galaxy population from the test sample (i.e., star-forming, passive, and other galaxies), and each error bar indicates the 16th and 84th percentiles. In each row (corresponding to a parameter), the first triplet of data points shows the unconditional predictive efficiency for comparison, and each of the remaining triplets shows the conditional predictive efficiency with the conditional variable labeled at the bottom. All the parameters, including stellar mass, are first conditioned on the $g-r$ color of the nearest neighbors of each test galaxy before computing the (conditional) predictive efficiency. The black dotted lines indicate the reference value of the median predictive efficiency ($\sim 0.013$) estimated by randomly permuting the stellar mass values within the nearest neighbors of each test galaxy. For better comparison, the blue, red, and green dotted lines indicate the three 50th percentiles (corresponding to the three galaxy populations) for the unconditional predictive efficiency. Based on the contrast between the conditional and unconditional predictive efficiency values, we mainly see that the contributions of $W1-W2$, $[b/a]_g$, $n_g/n_r$, and $R_{90,g}/R_{50,g}$ to the stellar mass estimation can be largely explained by spec-$z$, whereas the contributions of the other parameters, such as $R_{50,g}/R_{50,r}$ for star-forming galaxies, are essentially unexplained by spec-$z$. Furthermore, the contributions of all the morphological parameters cannot be fully explained by $W1-W2$, and vice versa.
  • ...and 19 more figures
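
The local test described in the Figure 3 caption (correlating stellar mass with a candidate parameter among the nearest neighbors of each test galaxy, then comparing against a reference built by permuting stellar mass within those neighbors) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function name `local_correlations`, the use of Euclidean distance in the latent space, the Pearson coefficient, and the neighbor count `k` are all assumptions made here for the example.

```python
import numpy as np


def local_correlations(latent, y, param, k=30, seed=0):
    """Hypothetical helper: per-galaxy local correlation and permuted reference.

    latent : (n, d) array, stand-in for the latent vectors S
    y      : (n,) array, stand-in for stellar mass
    param  : (n,) array, candidate external parameter
    Returns two (n,) arrays: observed and permuted-reference correlations.
    """
    rng = np.random.default_rng(seed)
    # Brute-force pairwise squared Euclidean distances (fine for small n).
    d2 = ((latent[:, None, :] - latent[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)          # exclude each galaxy from its own neighbors
    neighbors = np.argsort(d2, axis=1)[:, :k]

    observed = np.empty(len(y))
    reference = np.empty(len(y))
    for i, idx in enumerate(neighbors):
        # Pearson correlation among the k neighbors (an assumption of this sketch).
        observed[i] = np.corrcoef(y[idx], param[idx])[0, 1]
        # Reference: shuffle y within the same neighborhood to break any local link.
        reference[i] = np.corrcoef(rng.permutation(y[idx]), param[idx])[0, 1]
    return observed, reference


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    latent = rng.normal(size=(300, 2))    # synthetic latent space
    param = rng.normal(size=300)          # synthetic external parameter
    y = latent[:, 0] + param              # target depends on param beyond the latent
    obs, ref = local_correlations(latent, y, param)
    print("median observed :", np.median(obs))
    print("median reference:", np.median(ref))
```

In this toy setup the target depends on `param` in a way the latent vectors do not capture, so the observed distribution should sit well away from the permuted one. That shift is the kind of deviation the caption interprets as marking an external parameter.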