Partial Information Decomposition for Data Interpretability and Feature Selection

Charles Westphal, Stephen Hailes, Mirco Musolesi

TL;DR

PIDF introduces Partial Information Decomposition of Features, a per-feature decomposition that yields $I(Y;F_i)$, $FWS(Y;F_i;F_{-i})$, and $FWR(Y;F_i;F_{-i})$ to enable simultaneous data interpretability and feature selection. By replacing intractable full PID terms with a tractable, per-feature view and a Theta-based measure, PIDF can distinguish redundant versus synergistic contributions and identify minimal informative subsets. The framework includes a practical algorithm for computing per-feature metrics and a feature-selection procedure, with extensive validation on synthetic benchmarks and real data from genetics and neuroscience. The results demonstrate improved interpretability and reliable feature selection in settings with complex higher-order interactions, with potential impact across domains requiring nuanced understanding of feature interactions.

Abstract

In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. Contrary to traditional methods that assign a single importance value, our approach is based on three metrics per feature: the mutual information shared with the target variable, the feature's contribution to synergistic information, and the amount of this information that is redundant. In particular, we develop a novel procedure based on these three metrics, which reveals not only how features are correlated with the target but also the additional and overlapping information provided by considering them in combination with other features. We extensively evaluate PIDF using both synthetic and real-world data, demonstrating its potential applications and effectiveness, by considering case studies from genetics and neuroscience.
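
To make the distinction between synergistic and redundant information concrete, the sketch below contrasts two toy datasets using plug-in estimates of discrete mutual information. It does not implement the paper's FWS or FWR quantities, whose definitions are not given in this summary. Instead it compares $I(Y;F_1)$ with the conditional term $I(Y;F_1 \mid F_2)$, a standard information-theoretic comparison used here only as an illustrative assumption. All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def cond_mutual_info(xs, ys, zs):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(xs, zs))) + entropy(list(zip(ys, zs)))
            - entropy(list(zip(xs, ys, zs))) - entropy(zs))


# Synergistic toy case: Y = F1 XOR F2, all four input pairs equally likely.
rows = list(product([0, 1], repeat=2))
f1 = [a for a, _ in rows]
f2 = [b for _, b in rows]
y_xor = [a ^ b for a, b in rows]

mi_xor = mutual_info(y_xor, f1)             # F1 alone says nothing about Y
cmi_xor = cond_mutual_info(y_xor, f1, f2)   # F1 is informative once F2 is known

# Redundant toy case: G2 is a copy of G1 and Y = G1.
g1 = [0, 1]
g2 = [0, 1]
y_copy = [0, 1]

mi_copy = mutual_info(y_copy, g1)             # G1 alone determines Y
cmi_copy = cond_mutual_info(y_copy, g1, g2)   # nothing new once G2 is known

print(f"XOR : I(Y;F1)={mi_xor:.3f}  I(Y;F1|F2)={cmi_xor:.3f}")
print(f"copy: I(Y;G1)={mi_copy:.3f}  I(Y;G1|G2)={cmi_copy:.3f}")
```

In the XOR case the conditional term exceeds the marginal one, which signals synergy; in the copy case the conditional term drops to zero, which signals redundancy. A single importance score per feature cannot separate these two situations, which is the motivation for reporting several per-feature quantities.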

Paper Structure

This paper contains 46 sections, 19 equations, 13 figures, 1 table, 2 algorithms.

Figures (13)

  • Figure 1: PIDF at a glance. The diagram on the left shows how feature $F_i$ interacts with the remaining features $(F_j, F_k)$ to describe the target. The bar graph on the right shows how PIDF can be used to represent these quantities in an interpretable manner.
  • Figure 2: Comparison of feature importance indicators using synthetic datasets.
  • Figure 3: Comparison of feature importance indicators applied to the California housing dataset.
  • Figure 4: Gene importance in the BRCA dataset.
  • Figure 5: Average FWR, FWS, and MI per neuron as synaptic connections are re-established.
  • ...and 8 more figures