Algorithmic Recourse with Missing Values

Kentaro Kanamori; Takuya Takagi; Ken Kobayashi; Yuichi Ike

TL;DR

This work addresses algorithmic recourse (AR) when some input values are missing. It introduces ARMIN, a framework that integrates multiple imputation into recourse optimization. ARMIN defines action validity over an imputation space, enforces a probabilistic validity constraint, and combines imputation sampling with mixed-integer linear optimization to produce valid, low-cost recourse actions for incomplete instances without requiring the missing values to be revealed. The approach is theoretically justified and empirically compared against baselines across several datasets and three missing-data mechanisms (MCAR, MAR, MNAR). In the MCAR comparison, ARMIN attains higher valid ratios than ImputationAR and lower costs than RobustAR for almost all datasets and classifiers. The findings advance practical AR in privacy-conscious, real-world settings where feature disclosure is limited, with implications for robustness and fairness in decision-support systems.
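
The probabilistic validity constraint can be illustrated with a small sketch: fill the missing feature with several candidate imputations, apply one candidate action to every completed copy, and measure the fraction of copies for which the classifier returns the desired label. Everything below is an illustrative assumption rather than the paper's implementation: the linear classifier (`w`, `b`), the fixed list of imputed values, the threshold name `rho`, and the helper `empirical_validity` are all hypothetical, and the paper searches for the action with mixed-integer linear optimization instead of checking a single hand-picked one.

```python
import numpy as np

def empirical_validity(x, missing_idx, action, predict, imputations):
    """Fraction of imputed copies of x for which x + action is labeled +1."""
    completed = np.tile(x, (len(imputations), 1))
    completed[:, missing_idx] = imputations  # one sampled value per copy
    return float(np.mean(predict(completed + action) == 1))

# Hypothetical linear classifier h(x) = sign(w.x + b); +1 is the desired label.
w, b = np.array([1.0, 2.0]), -10.0
predict = lambda X: np.where(X @ w + b >= 0, 1, -1)

x = np.array([np.nan, 2.0])                   # feature 0 is missing
imputations = np.array([1.0, 4.0, 5.0, 7.0])  # assumed samples, fixed for clarity
action = np.array([0.0, 1.5])                 # candidate action on the observed feature

rho = 0.7                                     # assumed confidence threshold
validity = empirical_validity(x, 0, action, predict, imputations)
accepted = validity >= rho
```

Here the action is valid for three of the four imputations, so it passes a threshold of 0.7 but would fail a stricter one such as 0.9, which is the validity-versus-cost tension the confidence parameter controls.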

Abstract

This paper proposes a new framework of algorithmic recourse (AR) that works even in the presence of missing values. AR aims to provide a recourse action for altering the undesired prediction result given by a classifier. Existing AR methods assume that we can access complete information on the features of an input instance. However, we often encounter missing values in a given instance (e.g., due to privacy concerns), and previous studies have not discussed such a practical situation. In this paper, we first empirically and theoretically show the risk that a naive approach with a single imputation technique fails to obtain good actions regarding their validity, cost, and features to be changed. To alleviate this risk, we formulate the task of obtaining a valid and low-cost action for a given incomplete instance by incorporating the idea of multiple imputation. Then, we provide some theoretical analyses of our task and propose a practical solution based on mixed-integer linear optimization. Experimental results demonstrated the efficacy of our method in the presence of missing values compared to the baselines.

Paper Structure (34 sections, 7 theorems, 39 equations, 16 figures, 5 tables, 1 algorithm)

This paper contains 34 sections, 7 theorems, 39 equations, 16 figures, 5 tables, 1 algorithm.

Introduction
Related work
Preliminaries
Algorithmic recourse
Missing values
Problem formulation
Naive formulation with single imputation and its drawback
Our formulation with multiple imputation
Optimization framework
Imputation sampling
Mixed-integer linear optimization approach
Experiments
Comparison under MCAR situation
Comparison under MAR and MNAR situations
Analysis of trade-off between validity and cost
...and 19 more sections

Key Result

Proposition 1

For an instance $\bm{x} \in \mathcal{X}$ and a feature $d^\circ \in [D]$, let $\hat{\bm{x}} \in \mathcal{X}$ be its imputed instance with $\hat{x}_{d^\circ} = \mu_{d^\circ}$ and $\hat{x}_d = x_d$ for $d \in [D] \setminus \{ d^\circ \}$. For $\bm{x}$ and $\hat{\bm{x}}$, let $\bm{a}^\ast$ and $\hat{\bm{a}}$ [...] where $\sigma_{d^\circ}^2 = \mathbb{E}[ (x_{d^\circ} - \mu_{d^\circ})^2 ]$, $\gamma = \mathbb{E}[ \cdots ]$ [...]

Figures (16)

Figure 1: Examples of an original instance $\bm{x}$, its imputed instance $\hat{\bm{x}}$, and the decision boundary of a classifier $h$. Here, we drop the feature "Income" in $\bm{x}$ as a missing feature and obtain $\hat{\bm{x}}$ by imputing its value with the empirical mean \$66K. For the imputed instance $\hat{\bm{x}}$, we obtain an optimal action $\bm{a}$ using the existing AR method. While the action $\bm{a}$ successfully alters the prediction result of $h$ for the imputed instance $\hat{\bm{x}}$, it fails to do so for the original instance $\bm{x}$.
Figure 2: Experimental results of our baseline comparison under the MCAR situation, where $D_{\ast} = 2$. The x-axis (resp. y-axis) represents the valid ratio (resp. average cost). Compared to the baselines, our ARMIN attained a good balance between the valid ratio and cost for almost all the datasets and classifiers; that is, it achieved higher valid ratios than ImputationAR and lower costs than RobustAR.
Figure 3: Experimental results of our comparison under the MAR and MNAR situations.
Figure 4: Experimental results of our trade-off analyses on the GiveMeCredit dataset. (a) Experimental results of the sensitivity analyses of the confidence parameter $\rho$ with $95\%$ confidence intervals. (b) Experimental results of the path analyses under the MAR situation. Here, we selected two examples, and additional examples are shown in the appendix.
Figure 5: Experimental results of the XGBoost classifier with missing values $D_{\ast} \in \{ 0, 1, 2, 3, 4 \}$ under the MCAR situation.
...and 11 more figures
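
The failure mode described in Figure 1 can be reproduced in a toy setting. The sketch below is only an illustration under stated assumptions: the two-feature linear classifier (`w`, `b`), the true income of 40 (in thousands), and the helper `min_l2_action` are hypothetical, and only the imputed mean of 66 is taken from the caption. It uses the standard minimum-L2 projection onto a linear decision boundary, not the paper's optimization method.

```python
import numpy as np

# Hypothetical linear classifier over (income in $K, one other feature).
w = np.array([0.05, 1.0])
b = -6.0
score = lambda z: float(w @ z + b)  # score >= 0 means the desired outcome

def min_l2_action(z, margin=1e-6):
    """Smallest L2 change that moves z just across the linear boundary."""
    s = score(z)
    if s >= 0:
        return np.zeros_like(z)
    return (margin - s) / (w @ w) * w

x_true = np.array([40.0, 2.0])     # assumed true income, hidden from the method
x_imputed = np.array([66.0, 2.0])  # income replaced by the empirical mean

a = min_l2_action(x_imputed)       # action computed from the imputed instance only
valid_on_imputed = score(x_imputed + a) >= 0
valid_on_true = score(x_true + a) >= 0
```

Because the imputed income overstates the true one, the action is just large enough for the imputed instance and falls short for the original instance, which is the risk that motivates considering multiple imputations.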

Theorems & Definitions (15)

Remark 1
Remark 2
Proposition 1
Proposition 2
Proposition 3
Lemma 1: Closed-form Optimal Action [Ustun:FAT*2019; Pawelczyk:AISTATS2022]
Proof of the proposition labeled prop:upper
Proposition 4
Proof
Proof of the proposition labeled prop:sample
...and 5 more
