
CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem

Qian Chen, Taolin Zhang, Dongyang Li, Xiaofeng He

TL;DR

CIDR addresses the minimal feature removal problem in NLP by introducing Cooperative Integrated Gradients (CIG) to detect feature interactions and by reframing the task as a knapsack problem. A dynamic programming solver generates multiple candidate pair-based minimal feature sets, and the Minimal Feature Refinement step filters them with a frequency-based threshold to yield a robust MFS. The approach is evaluated on SST2, IMDB, and Rotten Tomatoes with BERT, DistilBERT, and RoBERTa, and shows improved interpretability (Comp, LO) and a higher Feature Minimality Score (FMS) than the baselines. The results suggest that CIDR can produce semantically representative and robust explanations while also exposing possible biases in NLP datasets.
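The knapsack reformulation is solved with dynamic programming. This summary does not give the paper's item values, weights, or capacity, so the sketch below shows only a generic 0/1 knapsack DP with backtracking, applied to word pairs with hypothetical scores (stand-ins for CIG values) and hypothetical integer weights. The function `knapsack_01` and all numbers are illustrative assumptions, not the authors' formulation.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def knapsack_01(values: Sequence[float], weights: Sequence[int],
                capacity: int) -> Tuple[float, List[int]]:
    """Generic 0/1 knapsack via dynamic programming (hypothetical helper).

    Returns the best total value and the sorted indices of the chosen items.
    """
    n = len(values)
    # dp[i][c]: best value using only the first i items with total weight <= c
    dp = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]  # option 1: skip item i-1
            if w <= c and dp[i - 1][c - w] + v > dp[i][c]:
                dp[i][c] = dp[i - 1][c - w] + v  # option 2: take item i-1
    # Backtrack: a cell that differs from the row above means item i-1 was taken
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return dp[n][capacity], sorted(chosen)


if __name__ == "__main__":
    words = ["not", "good", "movie"]
    pairs = list(combinations(range(len(words)), 2))  # [(0, 1), (0, 2), (1, 2)]
    scores = [6.0, 10.0, 12.0]  # hypothetical per-pair scores (CIG stand-ins)
    costs = [1, 2, 3]           # hypothetical integer weights
    best, idx = knapsack_01(scores, costs, capacity=5)
    print(best, [pairs[i] for i in idx])  # 22.0 [(0, 2), (1, 2)]
```

The table costs O(n · capacity) time and returns a single optimum; the paper's procedure yields multiple candidate sets, which this sketch does not attempt to reproduce.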

Abstract

The minimal feature removal problem in post-hoc explanation aims to identify the minimal feature set (MFS). Prior studies compute the minimal feature set with a greedy algorithm under a monotonicity assumption that does not hold in general, and they do not explore interactions between features. To address these limitations, we propose a Cooperative Integrated Dynamic Refining method (CIDR) to efficiently discover minimal feature sets. Specifically, we design Cooperative Integrated Gradients (CIG) to detect interactions between features. By combining CIG with the characteristics of the minimal feature set, we transform the minimal feature removal problem into a knapsack problem. Additionally, we devise an auxiliary Minimal Feature Refinement algorithm to determine the minimal feature set from numerous candidate sets. To the best of our knowledge, our work is the first to address the minimal feature removal problem in natural language processing. Extensive experiments demonstrate that CIDR traces representative minimal feature sets with improved interpretability across various models and datasets.
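As its name suggests, CIG is related to Integrated Gradients (IG), but its exact definition is not reproduced in this summary. As background only, the sketch below approximates standard IG with a midpoint Riemann sum on a toy function whose term `3 * x0 * x1` couples two features. The names `integrated_gradients`, `toy_f`, and `toy_grad` are hypothetical, and this is not CIG itself.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def integrated_gradients(grad_f: Callable[[Vector], Vector],
                         x: Sequence[float], baseline: Sequence[float],
                         steps: int = 50) -> Vector:
    """Standard Integrated Gradients, midpoint Riemann approximation.

    IG_i = (x_i - x'_i) * integral_0^1 df/dx_i(x' + a (x - x')) da
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy_f(p: Sequence[float]) -> float:
    # Toy model: x0 and x1 interact through the product term 3 * x0 * x1
    return 2 * p[0] + p[1] + 3 * p[0] * p[1] + p[2]


def toy_grad(p: Sequence[float]) -> Vector:
    # Analytic gradient of toy_f
    return [2 + 3 * p[1], 1 + 3 * p[0], 1.0]


if __name__ == "__main__":
    x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    attr = integrated_gradients(toy_grad, x, base)
    print([round(a, 6) for a in attr])  # [3.5, 2.5, 1.0]
    # Completeness: attributions sum to f(x) - f(baseline) = 7.0
    print(round(sum(attr), 6), toy_f(x) - toy_f(base))
```

The interaction's credit (3.0) is split evenly between `x0` and `x1`, so per-feature IG alone does not reveal that the two features act jointly; detecting such interactions is what the abstract says CIG is designed for.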

Paper Structure

This paper contains 20 sections, 21 equations, 2 figures, 6 tables, and 3 algorithms.

Figures (2)

  • Figure 1: An illustration of the minimal feature removal problem. The bottom part shows that removing the features in the MFS causes a drastic shift in the model's output probability.
  • Figure 2: The CIDR method (components inside the blue box). The workflow is shown at the bottom of the diagram. First, we generate word pairs by combining every two words in the input sentence. Next, we calculate the cooperative integrated gradients (e.g., $CIG_1$, $CIG_2$) for each pair (e.g., $p_1$, $p_2$). Then, we estimate the upper bound of the minimal feature set by applying perturbation variables (e.g., $v_1$, $v_2$) and solve the transformed knapsack problem with a dynamic programming algorithm, which yields multiple candidate sets. Finally, we filter out "false positive" minimal features by comparing their frequencies (e.g., $f_1$, $f_2$) with the threshold $\varepsilon$.
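The first and last steps of the Figure 2 workflow, pair generation and frequency-based filtering, can be sketched as follows. Assumptions: a word's frequency is the fraction of candidate sets that contain it, and a word is kept when that fraction is at least the threshold ε; the summary does not state the exact rule. `word_pairs` and `filter_by_frequency` are hypothetical helpers, and the candidate sets are toy data rather than real knapsack outputs.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def word_pairs(tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """All unordered pairs of token positions (every two words of the input)."""
    return list(combinations(range(len(tokens)), 2))


def filter_by_frequency(candidates: Sequence[Set[int]], epsilon: float) -> Set[int]:
    """Keep positions whose relative frequency across candidate sets is >= epsilon.

    Assumed rule: frequency = (#candidate sets containing the position) / (#candidate sets).
    """
    if not candidates:
        return set()
    counts = Counter(pos for cand in candidates for pos in cand)
    return {pos for pos, c in counts.items() if c / len(candidates) >= epsilon}


if __name__ == "__main__":
    tokens = ["the", "plot", "was", "not", "good"]
    print(len(word_pairs(tokens)))  # 10 pairs for 5 tokens
    # Toy candidate sets (in CIDR these would come from the knapsack step)
    candidates = [{3, 4}, {1, 3, 4}, {3, 4}, {0, 3}]
    kept = filter_by_frequency(candidates, epsilon=0.5)
    print([tokens[i] for i in sorted(kept)])  # ['not', 'good']
```

Positions 0 and 1 each appear in only one of the four candidates (frequency 0.25), so at ε = 0.5 they are dropped as likely false positives.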

Theorems & Definitions (3)

  • proof
  • proof
  • proof