Causal Feature Selection for Responsible Machine Learning

Raha Moraffah, Paras Sheth, Saketh Vishnubhatla, Huan Liu

TL;DR

The paper argues that causal feature selection, which separates causation from correlation, supports responsible machine learning in high-stakes decisions. It formalizes the idea through the Markov blanket $MB(Y)$ of the outcome $Y$: given $MB(Y)$, the outcome is independent of all remaining features, $Y \perp (X \setminus MB(Y)) \mid MB(Y)$, so $MB(Y)$ contains the most informative causal features. A taxonomy links causal feature selection to four responsible ML tasks (interpretability, fairness, adversarial robustness, and domain generalization) and is accompanied by a survey of algorithms. The paper also highlights future directions: scalable causal feature selection, methods that work under partial causal knowledge, and integration of multimodal data to improve robustness and fairness.
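To make the Markov blanket statement concrete, here is a minimal sketch that reads $MB(Y)$ off a known causal DAG as the parents, children, and co-parents ("spouses") of $Y$. This graphical characterization holds under the usual faithfulness assumption. The toy graph, the variable names `X1` to `X6`, and the helper `markov_blanket` are hypothetical illustrations and are not taken from the paper, which surveys algorithms that estimate the blanket from data rather than from a given graph.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the parents, children, and spouses of `target` in a DAG.

    `parents` maps each node to the set of its direct causes.
    Nodes that have no listed parents may be omitted from the mapping.
    """
    direct_causes = set(parents.get(target, set()))
    # Children are the nodes that list `target` among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    return (direct_causes | children | spouses) - {target}


# Hypothetical toy graph (an assumption for illustration only):
#   X1 -> Y <- X2,  Y -> X3 <- X4,  X1 -> X5,  X6 isolated
toy_dag = {
    "Y": {"X1", "X2"},
    "X3": {"Y", "X4"},
    "X5": {"X1"},
    "X6": set(),
}

mb = markov_blanket(toy_dag, "Y")
print(sorted(mb))  # ['X1', 'X2', 'X3', 'X4']
```

In this toy graph, `X5` is correlated with `Y` through their common cause `X1`, yet it lies outside the blanket: once `X1` is known, `X5` adds no further information about `Y`. This is the sense in which selecting $MB(Y)$ discards features that are only spuriously related to the outcome.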

Abstract

Machine Learning (ML) has become an integral aspect of many real-world applications. As a result, the need for responsible machine learning has emerged, focusing on aligning ML models with ethical and social values while enhancing their reliability and trustworthiness. Responsible ML involves many issues. This survey addresses four main ones: interpretability, fairness, adversarial robustness, and domain generalization. Feature selection plays a pivotal role in these responsible ML tasks. However, building on statistical correlations between variables can lead to spurious patterns, biases, and compromised performance. This survey focuses on the current study of causal feature selection: what it is and how it can reinforce the four aspects of responsible ML. By identifying features with causal impacts on outcomes and distinguishing causality from correlation, causal feature selection is posited as a unique approach to ensuring that ML models are ethically and socially responsible in high-stakes applications.

Paper Structure

This paper contains 8 sections, 15 equations, 2 figures.

Figures (2)

  • Figure 1: Relevant, irrelevant, redundant/spurious features are annotated in a causal graph.
  • Figure 2: Taxonomy of responsible ML tasks according to how they use causal feature selection.