The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
Ruben Ohana, Michael McCabe, Lucas Meyer, Rudy Morel, Fruzsina J. Agocs, Miguel Beneitez, Marsha Berger, Blakesley Burkhart, Keaton Burns, Stuart B. Dalziel, Drummond B. Fielding, Daniel Fortunato, Jared A. Goldberg, Keiya Hirashima, Yan-Fei Jiang, Rich R. Kerswell, Suryanarayana Maddu, Jonah Miller, Payel Mukhopadhyay, Stefan S. Nixon, Jeff Shen, Romain Watteaux, Bruno Régaldo-Saint Blancard, François Rozet, Liam H. Parker, Miles Cranmer, Shirley Ho
TL;DR
The Well tackles the challenge of evaluating data-driven surrogates for complex physics by offering a large, diverse, and well-structured collection of 16 datasets (~15 TB) spanning multiple domains and spatial dimensions. It provides a unified PyTorch interface and a common data specification to enable cross-domain benchmarking, complemented by baseline experiments and a comprehensive suite of metrics. The work highlights both the promise and the difficulty of generalizing surrogate models across heterogeneous physics, boundary conditions, and time horizons, and it outlines directions for incorporating physical constraints and improving long-horizon stability. Overall, the Well is intended as a foundational resource to accelerate the development and rigorous assessment of next-generation physics-informed surrogates and foundation models for spatiotemporal dynamics.
Abstract
Machine-learning-based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover narrow classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws on domain experts and numerical software developers to provide 15 TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, and acoustic scattering, as well as magnetohydrodynamic simulations of extragalactic fluids and supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.
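A unified interface matters because it lets one training loop run over datasets with different spatial dimensions and field counts. The sketch below illustrates that idea under our own assumptions. It is not the Well's actual API (see the repository for that). The class name `TrajectoryWindowDataset`, the channels-last layout `(n_traj, n_time, *spatial, n_fields)`, and the `input_fields`/`output_fields` keys are hypothetical choices made for illustration. The class slices each simulated trajectory into a window of `n_steps_input` past frames and `n_steps_output` target frames, which is a common setup for training autoregressive surrogates.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TrajectoryWindowDataset(Dataset):
    """Hypothetical sketch (not the Well's API): cut simulated trajectories
    into (input window, target window) pairs for surrogate training.

    trajectories: tensor of shape (n_traj, n_time, *spatial, n_fields),
    channels-last, so 2D and 3D simulations share one interface.
    """

    def __init__(self, trajectories: torch.Tensor,
                 n_steps_input: int = 1, n_steps_output: int = 1):
        if trajectories.dim() < 3:
            raise ValueError("expected shape (n_traj, n_time, ..., n_fields)")
        self.data = trajectories
        self.n_in = n_steps_input
        self.n_out = n_steps_output
        # Number of valid window start positions within one trajectory.
        self.windows_per_traj = trajectories.shape[1] - n_steps_input - n_steps_output + 1
        if self.windows_per_traj < 1:
            raise ValueError("trajectories too short for the requested windows")

    def __len__(self) -> int:
        return self.data.shape[0] * self.windows_per_traj

    def __getitem__(self, idx: int) -> dict:
        # Map a flat index to (trajectory, window start).
        traj, start = divmod(idx, self.windows_per_traj)
        mid = start + self.n_in
        end = mid + self.n_out
        return {
            "input_fields": self.data[traj, start:mid],   # (n_in, *spatial, n_fields)
            "output_fields": self.data[traj, mid:end],    # (n_out, *spatial, n_fields)
        }


if __name__ == "__main__":
    torch.manual_seed(0)
    # Synthetic stand-in data: 4 trajectories, 10 steps, 32x32 grid, 3 fields.
    ds2d = TrajectoryWindowDataset(torch.randn(4, 10, 32, 32, 3),
                                   n_steps_input=4, n_steps_output=1)
    print(len(ds2d))  # 4 trajectories * 6 windows = 24
    batch = next(iter(DataLoader(ds2d, batch_size=8)))
    print(batch["input_fields"].shape)  # torch.Size([8, 4, 32, 32, 3])

    # The same class accepts 3D data without modification.
    ds3d = TrajectoryWindowDataset(torch.randn(2, 5, 8, 8, 8, 5))
    print(ds3d[0]["output_fields"].shape)  # torch.Size([1, 8, 8, 8, 5])
```

The spatial axes are carried generically by `*spatial`, so the same class serves 2D and 3D data, and PyTorch's default collation batches the returned dictionaries. A real loader would also need to handle per-dataset normalization, boundary conditions, and on-disk storage. This sketch omits all three.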
