Algorithmic Recourse with Missing Values
Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike
TL;DR
This work addresses algorithmic recourse (AR) in the presence of missing input values by introducing ARMIN, a framework that integrates multiple imputation into recourse optimization. By defining action validity over an imputation space and enforcing a probabilistic validity constraint, ARMIN combines sampling with mixed-integer linear optimization to produce valid, low-cost recourse actions for incomplete instances without revealing the missing values. The approach is theoretically justified and empirically validated against baselines on several datasets under three missing-data mechanisms (MCAR, MAR, and MNAR), where it achieves higher action validity at lower cost. The findings advance practical AR in privacy-conscious, real-world settings where feature disclosure is limited, with implications for robustness and fairness in decision-support systems.
Abstract
This paper proposes a new framework of algorithmic recourse (AR) that works even in the presence of missing values. AR aims to provide a recourse action that alters an undesired prediction made by a classifier. Existing AR methods assume access to complete information on the features of an input instance. However, a given instance often contains missing values (e.g., due to privacy concerns), and previous studies have not addressed this practical situation. In this paper, we first show, both empirically and theoretically, that a naive approach based on a single imputation technique risks producing poor actions in terms of their validity, cost, and the features to be changed. To alleviate this risk, we formulate the task of obtaining a valid and low-cost action for a given incomplete instance by incorporating the idea of multiple imputation. We then provide theoretical analyses of this task and propose a practical solution based on mixed-integer linear optimization. Experimental results demonstrate the efficacy of our method in the presence of missing values compared to the baselines.
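The contrast between single and multiple imputation described above can be illustrated with a toy sketch. It is not the authors' ARMIN implementation: the linear classifier, the hot-deck imputer that yields one completion per donor row (`impute_from_donors`), the L1 cost, the fixed action grid, the choice of which features are mutable, and the brute-force search (`cheapest_action`) standing in for mixed-integer linear optimization are all hypothetical choices made for illustration. Validity is measured as the fraction of imputed completions that the action moves to the desired class, and the multiple-imputation variant requires this fraction to be at least 1 - epsilon.

```python
import itertools

import numpy as np

# Hypothetical linear classifier: class 1 (desired) iff w @ x + b >= 0.
w = np.array([1.0, 1.0, 1.0])
b = -5.0


def predict(X):
    return (X @ w + b >= 0).astype(int)


def impute_from_donors(x, donors):
    """Hypothetical hot-deck multiple imputation: one completion of x per donor row.

    Missing entries (NaN) of x are copied from each donor; observed entries are kept.
    """
    missing = np.isnan(x)
    X = np.tile(x, (len(donors), 1))
    X[:, missing] = donors[:, missing]
    return X


def validity(action, X_completions):
    """Fraction of completions that the action moves to the desired class."""
    return predict(X_completions + action).mean()


def cheapest_action(X_completions, epsilon, mutable, grid):
    """Brute-force stand-in for the MILP (an assumption): the lowest-L1-cost action
    on the grid whose validity over the completions is at least 1 - epsilon."""
    best, best_cost = None, np.inf
    for steps in itertools.product(grid, repeat=len(mutable)):
        action = np.zeros(X_completions.shape[1])
        action[mutable] = steps
        cost = np.abs(action).sum()
        if cost < best_cost and validity(action, X_completions) >= 1 - epsilon:
            best, best_cost = action, cost
    return best, best_cost


x = np.array([1.0, 1.0, np.nan])                       # feature 2 is missing
donors = np.array([[0.0, 0.0, v] for v in range(5)])   # toy observed rows
X_multi = impute_from_donors(x, donors)                # 5 plausible completions
grid = np.arange(0.0, 3.5, 0.5)                        # candidate changes per feature
mutable = [0, 1]                                       # assume only observed features change

# Naive approach: mean imputation, then require validity on that single point.
x_single = np.where(np.isnan(x), donors.mean(axis=0), x)
a_single, c_single = cheapest_action(x_single[None, :], 0.0, mutable, grid)

# Multiple-imputation approach: require validity on at least 75% of completions.
a_multi, c_multi = cheapest_action(X_multi, 0.25, mutable, grid)

print(c_single, validity(a_single, X_multi))  # 1.0 0.6
print(c_multi, validity(a_multi, X_multi))    # 2.0 0.8
```

On this toy data, the action found via mean imputation is cheaper but flips only 3 of the 5 plausible completions, whereas requiring validity on at least 75% of completions doubles the cost and raises validity to 0.8. In a realistic setting the grid search would be replaced by an optimization solver and the donor-based imputer by a proper multiple-imputation model.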
