T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems
Ming Wang, Daling Wang, Wenfang Wu, Shi Feng, Yifei Zhang
TL;DR
This work tackles interpretability with counterfactual explanations (CEs) under two practical challenges: general user preferences and non-static ML systems. It introduces T-COL, an instance-based method that builds local greedy trees from a query and prototype cases to generate CEs that align with general user preferences through prototype screening and local optimization rules. The authors map preferences to CE properties, simulate users with LLM-based agents, and show that T-COL outperforms baselines in adaptability and robustness across five benchmark datasets, while achieving significant efficiency gains. The results suggest that robust, user-aligned CEs are feasible in dynamic ML deployments, enabling actionable explanations that remain valid as models evolve.
Abstract
To address the interpretability challenge in machine learning (ML) systems, counterfactual explanations (CEs) have emerged as a promising solution. CEs are distinctive in that they give users actionable suggestions rather than explaining why a certain outcome was predicted. Applying CEs in practice faces two main challenges: general user preferences and variable ML systems. On the one hand, user preferences for specific values vary with the task and scenario. On the other hand, the ML system used for verification may change while the CEs are being carried out. Thus, user preferences tend to be general rather than specific, and CEs need to adapt to variable ML models while remaining robust as these models change. To meet these challenges, we propose general user preferences based on insights from psychology and behavioral science, and include the challenge of non-static ML systems as one such preference. We then introduce a novel method, Tree-based Conditions Optional Links (T-COL), for generating CEs adaptable to general user preferences. We further employ T-COL to enhance the robustness of CEs under specific conditions, so that CEs remain robust even when the ML models are replaced. To assess subjective preferences, we define LLM-based autonomous agents that simulate users and align them with real users. Experiments show that T-COL outperforms all baselines in adapting to general user preferences.
