Goal-Oriented Influence-Maximizing Data Acquisition for Learning and Optimization
Weichi Yao, Bianca Dumitrascu, Bryan R. Goldsmith, Yixin Wang
TL;DR
Goal-Oriented Influence-Maximizing Data Acquisition (GOIMDA) is a posterior-free, uncertainty-aware framework for active data acquisition that directly targets a user-defined goal functional $\mathcal{G}$ via a first-order influence score. By combining an inverse-Hessian curvature preconditioner, the gradient of the goal, and candidate sensitivity, GOIMDA balances exploration and exploitation without maintaining a Bayesian posterior, and aligns acquisition with the specific scientific objective. Theoretical results under exponential-family models reveal a link to predictive-entropy minimization, modulated by goal alignment and prediction bias, while practical implementations use jackknife deep ensembles and implicit inverse-Hessian vector products for scalability. Empirically, GOIMDA consistently outperforms uncertainty-based active learning (AL) and Gaussian-process-based Bayesian optimization (BO) across noisy optimization, hyperparameter tuning under distribution shift, and predictive learning tasks (MNIST, EMNIST, Rotten Tomatoes), often with substantially fewer labeled points or function evaluations. The approach offers a versatile, scalable alternative to Bayesian uncertainty-based acquisition with broad applicability to learning and optimization problems.
Abstract
Active data acquisition is central to many learning and optimization tasks in deep neural networks, yet remains challenging because most approaches rely on predictive uncertainty estimates that are difficult to obtain reliably. To address this, we propose Goal-Oriented Influence-Maximizing Data Acquisition (GOIMDA), an active acquisition algorithm that avoids explicit posterior inference while remaining uncertainty-aware through inverse curvature. GOIMDA selects inputs by maximizing their expected influence on a user-specified goal functional, such as test loss, predictive entropy, or the value of an optimizer-recommended design. Leveraging first-order influence functions, we derive a tractable acquisition rule that combines the goal gradient, training-loss curvature, and candidate sensitivity to model parameters. We show theoretically that, for generalized linear models, GOIMDA approximates predictive-entropy minimization up to a correction term accounting for goal alignment and prediction bias, thereby yielding uncertainty-aware behavior without maintaining a Bayesian posterior. Empirically, across learning tasks (including image and text classification) and optimization tasks (including noisy global optimization benchmarks and neural-network hyperparameter tuning), GOIMDA consistently reaches target performance with substantially fewer labeled samples or function evaluations than uncertainty-based active learning and Gaussian-process Bayesian optimization baselines.
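To make the three ingredients named above concrete (goal gradient, inverse training-loss curvature, candidate sensitivity), the following is a minimal NumPy sketch for a small logistic-regression model. It is an illustration under stated assumptions, not the paper's acquisition rule: the goal is taken to be mean test log-loss, the Hessian is formed and solved explicitly rather than through implicit inverse-Hessian vector products, and, because a candidate's label is unknown, the score is the expected squared first-order influence under the model's own predictive distribution. That last choice, along with all function names (`fit_logistic`, `training_hessian`, `goal_gradient`, `influence_scores`) and the synthetic data, is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, l2=1e-2, lr=0.5, steps=500):
    """Fit L2-regularized logistic regression by plain gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ theta)
        theta -= lr * (X.T @ (p - y) / len(y) + l2 * theta)
    return theta

def training_hessian(X, theta, l2=1e-2):
    """Hessian of the regularized mean training loss at theta."""
    p = sigmoid(X @ theta)
    return (X.T * (p * (1 - p))) @ X / len(X) + l2 * np.eye(X.shape[1])

def goal_gradient(X_test, y_test, theta):
    """Gradient of the assumed goal (mean test log-loss) w.r.t. theta."""
    p = sigmoid(X_test @ theta)
    return X_test.T @ (p - y_test) / len(y_test)

def influence_scores(X_cand, theta, H, g):
    """Expected squared first-order influence of each candidate on the goal.

    For a labeled point (x, y) the loss gradient is (p - y) * x, so its
    first-order influence on the goal is proportional to
    g^T H^{-1} (p - y) x. Averaging the square over y ~ Bernoulli(p)
    (an assumption made here for illustration) gives
    p (1 - p) (g^T H^{-1} x)^2.
    """
    v = np.linalg.solve(H, g)        # curvature-preconditioned goal gradient
    p = sigmoid(X_cand @ theta)      # predictive probability per candidate
    alignment = X_cand @ v           # candidate sensitivity along v
    return p * (1 - p) * alignment ** 2

# Synthetic data, purely for demonstration.
rng = np.random.default_rng(0)
w_true = np.array([1.5, -1.0, 0.5])
X_train = rng.normal(size=(40, 3))
y_train = (rng.random(40) < sigmoid(X_train @ w_true)).astype(float)
X_test = rng.normal(size=(200, 3))
y_test = (rng.random(200) < sigmoid(X_test @ w_true)).astype(float)
X_cand = rng.normal(size=(10, 3))

theta = fit_logistic(X_train, y_train)
H = training_hessian(X_train, theta)
g = goal_gradient(X_test, y_test, theta)
scores = influence_scores(X_cand, theta, H, g)
best = int(np.argmax(scores))  # index of the candidate to query next
print("scores:", np.round(scores, 5), "selected candidate:", best)
```

In this toy score, the factor p(1 - p) is the model's predictive variance and the squared alignment term weights it by how strongly the candidate moves parameters in the direction that matters for the goal. This mirrors, in a simplified form, the abstract's description of uncertainty-aware behavior modulated by goal alignment; it does not capture the prediction-bias correction or the ensemble-based estimates mentioned in the text.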
