LFaB: Low fidelity as Bias for Active Learning in the chemical configuration space
Vivin Vinod, Peter Zaspel
TL;DR
The paper tackles the inefficiency of variance-driven active learning (AL) in quantum-chemical surrogate modeling by introducing Low-Fidelity-as-Bias (LFaB), a bias-based sampling strategy that uses cheap low-fidelity labels to approximate the model's bias on the high-fidelity target. LFaB selects the samples with the largest predicted bias, which substantially reduces the number of high-fidelity evaluations required across QM7b atomization energies, VIB5 ab initio potential energy surfaces (PES), and QeMFi excitation energies. In these benchmarks, LFaB outperforms standard variance-based AL and often matches the best greedy selection, reducing training data needs by up to an order of magnitude. The approach is simple to implement and builds on existing multifidelity concepts, making it a practical tool for cost-effective, high-accuracy computational chemistry workflows.
Abstract
Active learning promises to provide an optimal training sample selection procedure in the construction of machine learning models. It often relies on minimizing the model's variance, which is assumed to decrease the prediction error. In practice, however, variance-based selection is frequently even less efficient than pure random sampling. Motivated by the bias-variance decomposition, we propose to minimize the model's bias instead of its variance. By doing so, we are able to almost exactly match the best-case error over all possible greedy sample selection procedures for a relevant application. Our bias approximation is based on cheap-to-calculate low-fidelity data, as known from $\Delta$-ML or multifidelity machine learning. We exemplify our approach for a wider class of applications in quantum chemistry, including the prediction of excitation energies and ab initio potential energy surfaces. Here, the proposed method reduces training data consumption by up to an order of magnitude compared to standard active learning.
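
To make the selection idea concrete, the following is a minimal sketch of bias-driven sampling with a low-fidelity proxy. It is not the authors' implementation: the Gaussian-kernel ridge regression surrogate, the toy one-dimensional data, the absolute low-fidelity error used as the bias proxy, and all function and parameter names (`gaussian_kernel`, `krr_fit_predict`, `select_by_low_fidelity_bias`, `length_scale`, `reg`, `n_init`) are assumptions introduced for illustration only. The sketch assumes that low-fidelity labels are available for the whole candidate pool, that a surrogate is fitted to the low-fidelity labels of the currently selected samples, and that the pool sample with the largest low-fidelity prediction error is added next.

```python
import numpy as np


def gaussian_kernel(a, b, length_scale):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale**2))


def krr_fit_predict(x_train, y_train, x_query, length_scale=0.2, reg=1e-8):
    """Kernel ridge regression: fit on (x_train, y_train), predict at x_query."""
    k_train = gaussian_kernel(x_train, x_train, length_scale)
    alpha = np.linalg.solve(k_train + reg * np.eye(len(x_train)), y_train)
    return gaussian_kernel(x_query, x_train, length_scale) @ alpha


def select_by_low_fidelity_bias(x_pool, y_low, n_select, n_init=2, seed=0):
    """Greedily pick pool indices where a low-fidelity surrogate errs most.

    Assumption: the absolute error on the cheap low-fidelity labels serves
    as a stand-in for the (unknown) bias on the high-fidelity labels.
    """
    rng = np.random.default_rng(seed)
    # Start from a small random seed set (an assumption of this sketch).
    selected = [int(i) for i in rng.choice(len(x_pool), size=n_init, replace=False)]
    while len(selected) < n_select:
        pred = krr_fit_predict(x_pool[selected], y_low[selected], x_pool)
        bias_proxy = np.abs(pred - y_low)
        bias_proxy[selected] = -np.inf  # exclude samples already chosen
        selected.append(int(np.argmax(bias_proxy)))
    return selected


# Toy usage: 50 candidate points with a synthetic "low-fidelity" label.
x_pool = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_low = np.sin(2.0 * np.pi * x_pool[:, 0])
chosen = select_by_low_fidelity_bias(x_pool, y_low, n_select=8)
```

Only the samples in `chosen` would then need expensive high-fidelity labels. A variance-based scheme would instead rank the pool by predictive uncertainty and would not use the low-fidelity labels at all.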
