Ca-MCF: Category-level Multi-label Causal Feature selection

Wanfu Gao, Yanan Wang, Yonghao Li

TL;DR

Ca-MCF uses label category flattening to decompose label variables into specific category nodes, enabling precise modeling of causal structures within the label space. It also introduces an explanatory competition-based, category-aware recovery mechanism that uses the proposed Specific Category-Specific Mutual Information (SCSMI) and Distinct Category-Specific Mutual Information (DCSMI) to recover causal features obscured by label correlations.

Abstract

Multi-label causal feature selection has attracted extensive attention in recent years. However, current methods primarily operate at the label level, treating each label variable as a monolithic entity and overlooking the fine-grained causal mechanisms unique to individual categories. To address this, we propose a Category-level Multi-label Causal Feature selection method named Ca-MCF. Ca-MCF utilizes label category flattening to decompose label variables into specific category nodes, enabling precise modeling of causal structures within the label space. Furthermore, we introduce an explanatory competition-based category-aware recovery mechanism that leverages the proposed Specific Category-Specific Mutual Information (SCSMI) and Distinct Category-Specific Mutual Information (DCSMI) to salvage causal features obscured by label correlations. The method also incorporates structural symmetry checks and cross-dimensional redundancy removal to ensure the robustness and compactness of the identified Markov Blankets. Extensive experiments across seven real-world datasets demonstrate that Ca-MCF significantly outperforms state-of-the-art benchmarks, achieving superior predictive accuracy with reduced feature dimensionality.
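
To make the idea of label category flattening concrete, here is a minimal Python sketch under stated assumptions. It treats each (label, category) pair as its own binary indicator node, which is one plausible reading of "decomposing label variables into specific category nodes". The function name `flatten_label_categories`, the toy labels, and the indicator encoding are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def flatten_label_categories(
    labels: Dict[str, List[str]],
) -> Dict[Tuple[str, str], List[int]]:
    """Map every (label, category) pair to a binary indicator column.

    Assumption: a "category node" is modeled as the 0/1 indicator of one
    category of one label variable. This is an illustrative encoding only.
    """
    nodes: Dict[Tuple[str, str], List[int]] = {}
    for name, values in labels.items():
        # One node per distinct category observed for this label.
        for category in sorted(set(values)):
            nodes[(name, category)] = [1 if v == category else 0 for v in values]
    return nodes


# Toy data (hypothetical): two multi-category label variables, four samples.
labels = {
    "weather": ["sun", "rain", "sun", "snow"],
    "traffic": ["low", "high", "low", "high"],
}
nodes = flatten_label_categories(labels)
for key, column in nodes.items():
    print(key, column)
```

After flattening, a feature can be tested against a single node such as `("weather", "sun")` rather than against the whole `weather` variable, which is the granularity the abstract describes.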

Paper Structure (34 sections, 1 theorem, 20 equations, 4 figures, 10 tables, 4 algorithms)

This paper contains 34 sections, 1 theorem, 20 equations, 4 figures, 10 tables, 4 algorithms.

Key Result

Theorem 1

In multi-label scenarios, a highly correlated label category $C_{jd}$ can statistically "block" the signal of a true causal feature $X$ relative to the target $C_{ic}$, leading to the condition $SCSMI(X; C_{ic} \mid C_{jd}) < \delta_1$. $X$ is determined to have higher causal explanatory power and must
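
The blocking condition in the theorem can be illustrated with a small Python sketch. The definition of SCSMI is not given in this summary, so the code below uses ordinary plug-in conditional mutual information between discrete variables as a stand-in. The function name, the toy data, and the threshold value `delta_1 = 0.05` are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def conditional_mutual_information(
    x: Sequence[Hashable], a: Sequence[Hashable], b: Sequence[Hashable]
) -> float:
    """Plug-in estimate of I(X; A | B) in nats from discrete samples.

    Stand-in for SCSMI (assumption): the paper's category-specific
    measure may differ from plain conditional mutual information.
    """
    n = len(x)
    n_xab = Counter(zip(x, a, b))
    n_xb = Counter(zip(x, b))
    n_ab = Counter(zip(a, b))
    n_b = Counter(b)
    total = 0.0
    for (xv, av, bv), count in n_xab.items():
        # p(x,a,b) * log[ p(x,a,b) p(b) / (p(x,b) p(a,b)) ], written in counts.
        ratio = count * n_b[bv] / (n_xb[(xv, bv)] * n_ab[(av, bv)])
        total += (count / n) * math.log(ratio)
    return total


# Toy data (hypothetical): feature X determines the target category C_ic,
# and a second category C_jd is perfectly correlated with both.
x = [0, 0, 1, 1, 0, 0, 1, 1]
c_ic = list(x)
c_jd = list(x)
constant = [0] * len(x)  # conditioning on a constant gives plain I(X; C_ic)

delta_1 = 0.05  # hypothetical threshold
unconditional = conditional_mutual_information(x, c_ic, constant)
conditional = conditional_mutual_information(x, c_ic, c_jd)
blocked = conditional < delta_1
print(unconditional, conditional, blocked)
```

In this toy case the feature is fully informative about the target on its own, yet its conditional score given the correlated category falls below the threshold. This is the situation the theorem describes, where a recovery step is needed to avoid discarding the feature.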

Figures (4)

  • Figure 1: Ablation study of Ca-MCF across four phases. The subfigures (a)–(g) illustrate the performance evolution from Phase 1 to Phase 4 across seven datasets. Note that (a), (c), and (d) are metrics where lower values indicate better performance ($\downarrow$), while (b), (e), (f), and (g) are metrics where higher values indicate better performance ($\uparrow$).
  • Figure 2: Parameter sensitivity analysis on VirusGO and Image datasets.
  • Figure 3: Radar charts of eight multi-label feature selection methods across seven metrics on seven datasets. The bold blue line represents the proposed Ca-MCF method.
  • Figure 4: Comparison of Ca-MCF against its rivals with the Bonferroni-Dunn test at a significance level $\alpha = 0.05$. (a) Hamming Loss. (b) Subset Accuracy. (c) Average Precision. (d) Coverage. (e) Ranking Loss. (f) Macro-F1. (g) Micro-F1.

Theorems & Definitions (8)

  • Definition 1: Label-Category Flattening
  • Definition 2: Class-Specific Causal Metrics
  • Definition 3: Category-Specific V-Structure
  • Definition 4: Class-Specific Markov Blanket Components
  • Theorem 1: Causal Blocking and Feature Recovery
  • Definition 5: Conditional Explanatory Dominance
  • Definition 6: Causal Symmetry
  • Definition 7: Cross-Dimensional Redundancy