Age Predictors Through the Lens of Generalization, Bias Mitigation, and Interpretability: Reflections on Causal Implications

Debdas Paul, Elisa Ferrari, Irene Gravili, Alessandro Cellerino

Abstract

Chronological age predictors often fail to achieve out-of-distribution (OOD) generalization due to exogenous attributes such as race, gender, or tissue. Learning an invariant representation with respect to those attributes is therefore essential to improve OOD generalization and prevent overly optimistic results. In predictive settings, these attributes motivate bias mitigation; in causal analyses, they appear as confounders; and when protected, their suppression leads to fairness. We coherently explore these concepts with theoretical rigor and discuss the scope of an interpretable neural network model based on adversarial representation learning. Using publicly available mouse transcriptomic datasets, we illustrate the behavior of this model relative to conventional machine learning models. We observe that the outcome of this model is consistent with the predictive results of a published study demonstrating the effects of Elamipretide on mouse skeletal and cardiac muscle. We conclude by discussing the limitations of deriving causal interpretation from such purely predictive models.

Paper Structure (14 sections, 9 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: Different forms of association between $X$ and $Y$, adapted from Figure 12 of buhlmann2020invariance. Marginal correlation is the weakest form of association and ignores dependencies among covariates. A stronger form is given by the regression-relevant variables, which capture partial correlation (non-zero coefficients). Under the faithfulness assumption, causal variables are a subset of the regression-relevant variables (all parents, i.e., direct causes, of $Y$ must have non-zero coefficients). The invariance set is identifiable even when the parents of $Y$ are not, and so forms a diluted notion of causality. As stated in buhlmann2020invariance, this set is much more useful in practical applications. The causal form of association is desirable but often hard to achieve.
  • Figure 2: For detailed caption, see Section \ref{fig:fig_captions}
  • Figure 3: For detailed caption, see Section \ref{fig:fig_captions}
  • Figure 4: For detailed caption, see Section \ref{fig:fig_captions}
  • Figure 5: For detailed caption, see Section \ref{fig:fig_captions}
  • ...and 4 more figures