Multi-Fidelity Active Learning with GFlowNets
Alex Hernandez-Garcia, Nikita Saxena, Moksh Jain, Cheng-Hao Liu, Yoshua Bengio
TL;DR
This work tackles the challenge of efficiently discovering diverse, high-scoring candidates in combinatorially large, high-dimensional spaces under a limited budget. It introduces MF-GFN, a framework that extends GFlowNets to multi-fidelity active learning: the sampler learns a joint policy over inputs and fidelity levels, and candidates are selected with a cost-aware acquisition function, MF-MES, computed from a multi-fidelity Gaussian process (GP) surrogate with deep kernel learning. In experiments on DNA aptamers, antimicrobial peptides, and small molecules, MF-GFN achieves substantial cost reductions over single-fidelity baselines while retaining diversity and discovering multiple high-scoring modes, outperforming RL-based baselines and some Bayesian optimisation (BO) baselines. The approach shows practical potential to accelerate scientific discovery and materials and drug design by allocating computational or experimental resources efficiently across fidelity levels. The work also discusses limitations, such as the use of simulated costs, and outlines future work, including more complex design spaces and multi-objective extensions.
Abstract
In recent decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanwhile, machine learning has progressed to become a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, structured and high-dimensional spaces. Moreover, the high-fidelity, black-box objective function is often very expensive to evaluate. Progress in machine learning methods that can efficiently tackle such challenges would help accelerate crucial areas such as drug and materials discovery. In this paper, we propose a multi-fidelity active learning algorithm with GFlowNets as a sampler, to efficiently discover diverse, high-scoring candidates where multiple approximations of the black-box function are available at lower fidelity and cost. Our evaluation on molecular discovery tasks shows that multi-fidelity active learning with GFlowNets can discover high-scoring candidates at a fraction of the budget of its single-fidelity counterpart while maintaining diversity, unlike RL-based alternatives. These results open new avenues for multi-fidelity active learning to accelerate scientific discovery and engineering design.
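To make the overall loop concrete, the following is a minimal sketch of cost-aware, multi-fidelity active learning on a toy one-dimensional problem. It is not the authors' implementation: a uniform random proposal stands in for the GFlowNet sampler, and a hand-made heuristic (fidelity relevance times a distance-based uncertainty, divided by query cost) stands in for MF-MES and the GP surrogate. All names and constants (`ORACLES`, `RELEVANCE`, `acquisition`, `run_active_learning`, the costs and budget) are hypothetical and chosen only for illustration.

```python
import math
import random


def high_fidelity(x):
    """Toy 'true' objective (assumed): peaks at x = 70."""
    return -abs(x - 70) / 100.0


# Oracles indexed by fidelity: (function, cost per query). Values are illustrative.
ORACLES = {
    0: (lambda x: high_fidelity(x) + 0.1 * math.sin(x), 1.0),  # cheap, biased
    1: (high_fidelity, 10.0),                                  # expensive, accurate
}

# Assumed weight for how informative each fidelity is about the highest one.
RELEVANCE = {0: 0.3, 1: 1.0}


def acquisition(x, m, data):
    """Heuristic stand-in for a cost-aware acquisition: utility per unit cost."""
    _, cost = ORACLES[m]
    # Uncertainty shrinks near points already observed at fidelity >= m.
    dists = [abs(x - xi) for xi, mi, _ in data if mi >= m]
    uncertainty = min(1.0, min(dists, default=100) / 10.0)
    # Crude prediction: value of the nearest observation at any fidelity.
    nearest = min(data, key=lambda d: abs(x - d[0]), default=None)
    predicted = nearest[2] if nearest is not None else 0.0
    promise = math.exp(5.0 * predicted)  # favour regions that look high-scoring
    return RELEVANCE[m] * uncertainty * promise / cost


def run_active_learning(budget=60.0, rounds=20, pool_size=50, batch_size=3, seed=0):
    rng = random.Random(seed)
    data, spent = [], 0.0  # data holds (x, fidelity, observed value)
    for _ in range(rounds):
        # Propose (candidate, fidelity) pairs; a GFlowNet would do this in MF-GFN.
        pool = [(rng.randrange(100), rng.choice(list(ORACLES)))
                for _ in range(pool_size)]
        pool.sort(key=lambda c: acquisition(c[0], c[1], data), reverse=True)
        for x, m in pool[:batch_size]:
            oracle, cost = ORACLES[m]
            if spent + cost > budget:
                continue  # skip queries that would exceed the budget
            data.append((x, m, oracle(x)))
            spent += cost
    return data, spent
```

Calling `run_active_learning()` returns the list of queried `(x, fidelity, value)` triples and the total cost spent. In this toy setting, cheap queries dominate early because their utility per unit cost is higher, and expensive queries become competitive once the low-fidelity uncertainty around promising candidates has been reduced.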
