Billion-Scale Graph Foundation Models
Maya Bechler-Speicher, Yoel Gottlieb, Andrey Isakov, David Abensur, Ami Tavory, Daniel Haimovich, Ido Guy, Udi Weinsberg
TL;DR
This work introduces GraphBFF, the first end-to-end recipe for billion-parameter Graph Foundation Models (GFMs) that operate on billion-scale heterogeneous graphs. At its core is the GraphBFF Transformer, which combines Type-Conditioned Attention and Type-Agnostic Attention with a sparse softmax so that the model stays expressive on complex graphs while remaining scalable. The authors establish neural scaling laws for GFMs, showing that loss falls predictably as model size $N$ or data size $D$ grows, depending on which of the two is the bottleneck. They also evaluate a 1.4B-parameter GraphBFF pretrained on one billion samples, which achieves strong zero-shot, few-shot, and probing performance across ten downstream tasks on graphs unseen during training. Practical contributions include two batching strategies (KL-Batching and Round-Robin Batching) and fine-tuning methods, along with an analysis of when and how GFMs improve over task-specific baselines. The results highlight coupled data–model bottlenecks and offer a blueprint for deploying industrial-scale GFMs on heterogeneous graphs, while outlining open questions and opportunities for future work.
Abstract
Graph-structured data underpins many critical applications. While foundation models have transformed language and vision via large-scale pretraining and lightweight adaptation, extending this paradigm to general, real-world graphs is challenging. In this work, we present Graph Billion-Foundation-Fusion (GraphBFF): the first end-to-end recipe for building billion-parameter Graph Foundation Models (GFMs) for arbitrary heterogeneous, billion-scale graphs. Central to the recipe is the GraphBFF Transformer, a flexible and scalable architecture designed for practical billion-scale GFMs. Using GraphBFF, we present the first neural scaling laws for general graphs and show that loss decreases predictably as either model capacity or training data scales, depending on which factor is the bottleneck. The GraphBFF framework provides concrete methodologies for data batching, pretraining, and fine-tuning to build GFMs at scale. We demonstrate the effectiveness of the framework by evaluating a 1.4 billion-parameter GraphBFF Transformer pretrained on one billion samples. Across ten diverse, real-world downstream tasks on graphs unseen during training, spanning node- and link-level classification and regression, GraphBFF achieves strong zero-shot and probing performance, including in few-shot settings, with margins of up to 31 PRAUC points. Finally, we discuss key challenges and open opportunities for making GFMs a practical and principled foundation for graph learning at industrial scale.
