PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design
Alexandre Duval, Victor Schmidt, Santiago Miret, Yoshua Bengio, Alex Hernández-García, David Rolnick
TL;DR
PhAST addresses two obstacles to applying graph neural networks (GNNs) to catalyst design, limited accuracy and high computational cost, by introducing task-specific, physics-informed changes that improve both. It rethinks graph construction, atom embeddings, and the output heads (energy and forces) to fit the IS2RE and S2EF tasks on OC20, achieving energy MAE reductions of up to $42\%$ and inference-time reductions of $3$–$8\times$, with CPU training speedups of up to $40\times$. The gains are architecture-agnostic across multiple baselines (SchNet, DimeNet++, ForceNet, GemNet, GemNet-OC), and the reduced cost makes CPU-based training practical. By combining physics-aware representations with efficient graph rewiring and energy-conserving force handling, PhAST offers a scalable route to rapid, data-driven electrocatalyst design, with potential applicability to related molecular modeling tasks.
Abstract
Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electrochemical reactions. Machine learning (ML) holds the potential to efficiently model materials properties from large amounts of data, accelerating electrocatalyst design. The Open Catalyst Project OC20 dataset was constructed to that end. However, ML models trained on OC20 are still neither scalable nor accurate enough for practical applications. In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. These include improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions, referred to as PhAST, and evaluate them thoroughly on multiple architectures. Overall, PhAST reduces energy MAE by 4 to 42$\%$ while dividing compute time by a factor of 3 to 8, depending on the targeted task and model. PhAST also enables CPU training, leading to 40$\times$ speedups in highly parallelized settings. Python package: \url{https://phast.readthedocs.io}.
