AERO: An autonomous platform for continuous research
Valérie Hayot-Sasson, Abby Stevens, Nicholson Collier, Sudershan Sridhar, Kyle Conroy, J. Gregory Pauloski, Yadu Babuji, Maxime Gonthier, Nathaniel Hudson, Dante D. Sanchez-Gallegos, Ian Foster, Jonathan Ozik, Kyle Chard
TL;DR
The paper addresses gaps in data infrastructure for automated, continuous, cross-domain scientific research during health crises. It introduces AERO, an event-based automation platform that combines a trigger-action paradigm, a distributed bring-your-own-resource compute and storage model, and Globus services (Compute, Flows, and Auth) to automate data ingestion, validation, analysis, and sharing with provenance. An evaluation on a synthetic workload and two $R(t)$ estimation use cases shows near-linear scalability and, on average, faster execution with Globus Flows than with GitHub Actions, while highlighting performance variability introduced by external services. The work provides open-source tooling and data references to enable reproducible, FAIR-compliant automated research across institutions, with practical impact for public health surveillance and beyond.
Abstract
The COVID-19 pandemic highlighted the need for new data infrastructure, as epidemiologists and public health workers raced to harness rapidly evolving data, analytics, and infrastructure in support of cross-sector investigations. To meet this need, we developed AERO, an automated research and data sharing platform for continuous, distributed, and multi-disciplinary collaboration. In this paper, we describe the AERO design and how it supports the automatic ingestion, validation, and transformation of monitored data into a form suitable for analysis; the automated execution of analyses on this data; and the sharing of data among different entities. We also describe how our AERO implementation leverages capabilities provided by the Globus platform and GitHub for automation, distributed execution, data sharing, and authentication. We present results obtained with an instance of AERO running two public health surveillance applications and report benchmarking results for a synthetic application, all of which are publicly available for testing.
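The ingest, validate, transform, and analyze chain described above can be illustrated with a small trigger-action sketch. This is a toy under stated assumptions, not AERO's implementation or API: `Registry`, `on_update`, `publish`, `validate_and_transform`, `analyze`, and the `cases_clean` dataset name are all hypothetical, and an in-memory dictionary stands in for the storage and Globus services the paper relies on. The only idea it is meant to show is that publishing a new version of a dataset fires the analyses registered against it, while an unchanged payload triggers nothing.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Action = Callable[[str, bytes], None]


@dataclass
class Registry:
    """Hypothetical in-memory registry: tracks dataset versions and triggers."""
    versions: Dict[str, List[str]] = field(default_factory=dict)  # name -> checksums
    triggers: Dict[str, List[Action]] = field(default_factory=dict)

    def on_update(self, name: str, action: Action) -> None:
        """Register an action to run whenever dataset `name` gets a new version."""
        self.triggers.setdefault(name, []).append(action)

    def publish(self, name: str, payload: bytes) -> bool:
        """Record a new version if the content changed, then fire its triggers."""
        checksum = hashlib.sha256(payload).hexdigest()
        history = self.versions.setdefault(name, [])
        if history and history[-1] == checksum:
            return False  # same content as the latest version: nothing to do
        history.append(checksum)
        for action in self.triggers.get(name, []):
            action(name, payload)
        return True


def validate_and_transform(raw: bytes) -> bytes:
    """Keep only rows shaped like 'label,non-negative integer'; drop the rest."""
    kept = []
    for line in raw.decode().splitlines():
        parts = line.split(",")
        if len(parts) == 2 and parts[1].strip().isdecimal():
            kept.append(f"{parts[0].strip()},{int(parts[1])}")
    return "\n".join(kept).encode()


results: List[int] = []


def analyze(name: str, payload: bytes) -> None:
    """Toy analysis: total count across all rows of the cleaned dataset."""
    counts = [int(line.split(",")[1]) for line in payload.decode().splitlines()]
    results.append(sum(counts))


registry = Registry()
registry.on_update("cases_clean", analyze)


def ingest(raw: bytes) -> bool:
    """Ingestion step: validate, transform, and publish the cleaned data."""
    return registry.publish("cases_clean", validate_and_transform(raw))


first = ingest(b"mon,3\ntue,bad\nwed,4\n")   # new data: analysis runs
second = ingest(b"mon,3\ntue,bad\nwed,4\n")  # unchanged: analysis skipped
third = ingest(b"mon,3\nwed,4\nthu,5\n")     # updated data: analysis runs again
```

Comparing content checksums is one simple way to decide whether monitored data has changed; a real deployment would also need authentication, remote execution, and durable metadata, which this sketch deliberately omits.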
