Accelerated Aggregated D-Optimal Designs for Estimating Main Effects in Black-Box Models
Chih-Yu Chang, Ming-Chung Chang
TL;DR
This work tackles robust, model-agnostic estimation of main effects for black-box predictors, especially under feature correlation. It introduces A2D2E, an accelerated aggregated D-optimal designs estimator that preserves the localization of accumulated local effects (ALE) while using D-optimal designs to estimate local slopes, without requiring the model to be differentiable. Theoretical results establish variance reduction and consistency, and extensive simulations plus real-data and LLM-based case studies show that A2D2E outperforms partial dependence (PD) and ALE, particularly when features are correlated. The approach offers practical, scalable interpretability for modern machine learning models, including neural networks, Gaussian processes, and language-model surrogates, and applies broadly to real-world decision making.
Abstract
Recent advances in supervised learning have driven growing interest in explaining black-box models, particularly by estimating the effects of input variables on model predictions. However, existing approaches often face key limitations, including poor scalability, sensitivity to out-of-distribution sampling, and instability under correlated features. To address these issues, we propose A2D2E, an $\textbf{E}$stimator based on $\textbf{A}$ccelerated $\textbf{A}$ggregated $\textbf{D}$-Optimal $\textbf{D}$esigns. Our method leverages principled experimental design to improve efficiency and robustness in main effect estimation. We establish theoretical guarantees, including convergence and variance reduction, and validate A2D2E through extensive simulations. We further demonstrate the method's potential through a case study on real data and applications to language models. Code to reproduce the results is available at https://github.com/cchihyu/A2D2E.
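The core idea summarized above, keeping ALE's bin-wise localization while estimating local slopes on a D-optimal design, can be illustrated with a minimal Python sketch. It rests on simplifying assumptions of our own: quantile bins, a straight-line model inside each bin (for which the D-optimal design splits the runs equally between the two bin endpoints), and centering the curve to mean zero over the grid. All function names (`d_optimal_line_design`, `log_det_information`, `local_slope`, `ale_style_main_effect`) are hypothetical, and the acceleration and aggregation steps are omitted. This is not the authors' A2D2E implementation; see the linked repository for that.

```python
import numpy as np


def d_optimal_line_design(a, b, n):
    """Exact D-optimal design for y = beta0 + beta1 * t on [a, b].

    For a straight-line model the D-optimal design puts half of the runs
    at each endpoint of the interval; n is required to be even here.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    return np.repeat([a, b], n // 2).astype(float)


def log_det_information(t):
    """log det(X^T X) with X = [1, t]; larger means more D-efficient."""
    X = np.column_stack([np.ones_like(t), t])
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf


def local_slope(f_1d, t):
    """Least-squares slope of f_1d fitted on the design points t."""
    X = np.column_stack([np.ones_like(t), t])
    y = np.array([f_1d(v) for v in t], dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def _slice(f, x, j):
    """Return v -> f(x with feature j replaced by v)."""
    def f_1d(v):
        z = np.array(x, dtype=float)
        z[j] = v
        return f(z)
    return f_1d


def ale_style_main_effect(f, data, j, n_bins=5, n_runs=2):
    """Hypothetical ALE-style main-effect curve for feature j.

    Inside each quantile bin [a, b], every observation's local slope in
    feature j is fitted on a D-optimal straight-line design, holding the
    other features at that observation's values. Bin-averaged slopes times
    bin widths are accumulated, then the curve is shifted to mean zero.
    """
    edges = np.quantile(data[:, j], np.linspace(0.0, 1.0, n_bins + 1))
    # Bin index per observation; the maximum value goes into the last bin.
    idx = np.clip(np.searchsorted(edges, data[:, j], side="right") - 1,
                  0, n_bins - 1)
    curve = np.zeros(n_bins + 1)
    for k in range(n_bins):
        a, b = edges[k], edges[k + 1]
        rows = data[idx == k]
        step = 0.0
        if b > a and len(rows) > 0:
            t = d_optimal_line_design(a, b, n_runs)
            slopes = [local_slope(_slice(f, x, j), t) for x in rows]
            step = np.mean(slopes) * (b - a)
        curve[k + 1] = curve[k] + step
    return edges, curve - curve.mean()


# Endpoint design vs. evenly spaced design with 4 runs on [0, 1].
print(log_det_information(d_optimal_line_design(0.0, 1.0, 4))
      > log_det_information(np.linspace(0.0, 1.0, 4)))  # True

# Toy additive "black box": the main effect of x0 should have slope 2.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
f = lambda x: 2.0 * x[0] + x[1] ** 2
grid, effect = ale_style_main_effect(f, data, j=0)
print(np.allclose(np.diff(effect), 2.0 * np.diff(grid)))  # True
```

The endpoint design has information determinant 4 versus 20/9 for four evenly spaced runs, which is why the sketch places runs at the bin edges. For a noisy black box, increasing `n_runs` replicates evaluations at those edges and lowers the variance of each local slope.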
