Adaptive Bounded Exploration and Intermediate Actions for Data Debiasing
Yifan Yang, Yang Liu, Parinaz Naghizadeh
TL;DR
The paper tackles data biases in training datasets for sequential decision rules under censored feedback. It introduces an adaptive bounded exploration framework that debiases population statistics while controlling exploration costs. The core method combines exploitation with exploration restricted by an adaptive bound ${LB}_t$, plus an optional intermediate action that yields noisy labels at lower cost, modeled via a two-stage MDP to analyze the trade-off between debiasing speed and exploration cost. Theoretical results show that the active debiasing algorithm recovers unbiased estimates for unimodal distributions, provide finite-sample error bounds relative to an oracle, and characterize how the algorithm interacts with fairness interventions, including how equality-of-opportunity (EO) or same-rule constraints affect debiasing speed. Numerical experiments on synthetic and real-world datasets (Adult, Retiring Adult, FICO) demonstrate improved accuracy and equality of opportunity, validate the bounded-exploration approach, and reveal practical trade-offs when incorporating intermediate actions. Overall, the work advances data-driven fair decision-making by enabling data collection strategies that reduce bias without prohibitive exploration costs, with potential applicability to supervised learning systems facing distribution shift and censored feedback.
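As a rough illustration of the decision structure summarized above, the sketch below encodes a toy three-way rule. It is a simplification under stated assumptions, not the authors' algorithm: the names `bounded_exploration_rule`, `Decision`, `theta_t` (an exploitation threshold), `explore_prob`, and `use_intermediate` are hypothetical, and we assume exploration happens only for feature values in $[{LB}_t, \theta_t)$.

```python
import random
from dataclasses import dataclass


@dataclass
class Decision:
    action: str     # "accept", "intermediate", or "reject"
    explored: bool  # True if the action was taken below the exploitation threshold


def bounded_exploration_rule(x, theta_t, lb_t, explore_prob, use_intermediate, rng):
    """Toy three-way rule (hypothetical; not the paper's exact algorithm).

    - x >= theta_t: exploit, i.e. accept and later observe the true label.
    - lb_t <= x < theta_t: explore with probability explore_prob, either by a
      full acceptance or by a cheaper intermediate action with a noisy label.
    - otherwise: reject, so no label is observed (censored feedback).
    """
    if x >= theta_t:
        return Decision("accept", explored=False)
    # The lower bound lb_t caps how far below the threshold we are willing to explore.
    if lb_t <= x and rng.random() < explore_prob:
        action = "intermediate" if use_intermediate else "accept"
        return Decision(action, explored=True)
    return Decision("reject", explored=False)


rng = random.Random(0)
print(bounded_exploration_rule(1.2, theta_t=1.0, lb_t=0.0, explore_prob=0.5,
                               use_intermediate=True, rng=rng))
# prints Decision(action='accept', explored=False)
```

In this sketch, rejecting everything below ${LB}_t$ is what bounds the exploration cost, while switching `use_intermediate` on trades label quality for a cheaper action inside the exploration band.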
Abstract
The performance of algorithmic decision rules depends largely on the quality of the training datasets available to them. Biases in these datasets can raise economic and ethical concerns due to the resulting algorithms' disparate treatment of different groups. In this paper, we propose algorithms for sequentially debiasing the training dataset through adaptive and bounded exploration in a classification problem with costly and censored feedback. Our proposed algorithms balance the ultimate goal of mitigating the impacts of data biases, which in turn leads to more accurate and fairer decisions, against the exploration risks incurred to achieve this goal. Specifically, we propose adaptive bounds to limit the region of exploration, and leverage intermediate actions that provide noisy label information at a lower cost. We analytically show that such exploration can help debias data in certain distributions, investigate how algorithmic fairness interventions can work in conjunction with our proposed algorithms, and validate the performance of these algorithms through numerical experiments on synthetic and real-world data.
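To see why censored feedback biases the training data, and why exploring a bounded region can help, the toy simulation below compares the naive mean of observed feature values with and without exploration. The setup is hypothetical: qualified applicants' features are assumed to be Normal, labels are observed only for accepted applicants, and `observed_mean` is an illustrative helper. It does not reproduce the paper's estimator or its unbiasedness guarantee for unimodal distributions; it only shows that widening the observed region down to a bound shrinks the selection bias of a naive statistic.

```python
import random
import statistics


def observed_mean(n, mu, sigma, theta, lb, explore_prob, seed=0):
    """Mean of feature values whose labels get observed (toy simulation).

    Hypothetical setup: features of qualified applicants ~ Normal(mu, sigma).
    A sample is observed if x >= theta (exploitation), or if lb <= x < theta
    and it is accepted for exploration with probability explore_prob.
    Setting lb == theta means pure exploitation (no exploration band).
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        if x >= theta or (lb <= x < theta and rng.random() < explore_prob):
            observed.append(x)
    return statistics.fmean(observed)


mu = 0.0  # true mean of the (hypothetical) qualified population
exploit_only = observed_mean(20_000, mu, 1.0, theta=1.0, lb=1.0, explore_prob=0.0)
bounded = observed_mean(20_000, mu, 1.0, theta=1.0, lb=0.0, explore_prob=0.5)
print(f"bias without exploration:      {exploit_only - mu:.2f}")
print(f"bias with bounded exploration: {bounded - mu:.2f}")
```

With these parameters, the exploitation-only estimate should sit well above the true mean of 0 and the bounded-exploration estimate noticeably closer to it, though still biased, because the naive mean makes no correction for the censoring that remains below the bound.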
