Towards a future space-based, highly scalable AI infrastructure system design
Blaise Agüera y Arcas, Travis Beals, Maria Biggs, Jessica V. Bloom, Thomas Fischbacher, Konstantin Gromov, Urs Köster, Rishiraj Pravahan, James Manyika
TL;DR
This work articulates a vision for space-based, highly scalable AI infrastructure built from solar-powered satellite clusters hosting TPU accelerators and tightly coupled inter-satellite networks. It assesses feasibility across four key components (inter-satellite optical links, formation-flight dynamics, radiation tolerance of TPUs, and launch economics) using an illustrative 81-satellite, 1 km-radius constellation in dawn–dusk LEO. The results indicate that high-bandwidth inter-satellite links (ISLs) are plausible using short-range DWDM, that the orbital dynamics are manageable with modest delta-v, and that TPUs can survive within mission lifetimes; they also outline a credible pathway to affordable launch costs by the mid-2030s through reuse and scale. The discussion highlights practical milestones and remaining challenges, including thermal management, reliability, and ground communications, and points to future directions such as integrated SoC designs and ML-driven formation control to unlock the full potential of orbiting AI compute.
Abstract
If AI is a foundational general-purpose technology, we should anticipate that demand for AI compute -- and energy -- will continue to grow. The Sun is by far the largest energy source in our solar system, so it is worth considering how future AI infrastructure could most efficiently tap into that power. This work explores a scalable compute system for machine learning in space, using fleets of satellites equipped with solar arrays, free-space optical inter-satellite links, and Google tensor processing unit (TPU) accelerator chips. To facilitate high-bandwidth, low-latency inter-satellite communication, the satellites would be flown in close proximity. We illustrate the basic approach to formation flight via an 81-satellite cluster of 1 km radius, and describe an approach for using high-precision ML-based models to control large-scale constellations. Trillium TPUs are radiation tested: they survive a total ionizing dose equivalent to a 5-year mission life without permanent failures, and are characterized for bit-flip errors. Launch costs are a critical part of overall system cost; a learning curve analysis suggests launch to low-Earth orbit (LEO) may reach $\lesssim$\$200/kg by the mid-2030s.
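As a rough illustration of what a learning curve analysis involves, the sketch below uses the standard Wright's-law form, in which unit cost falls by a fixed fraction each time cumulative output doubles. This is a generic textbook model, not necessarily the exact method used in this work. The function names are hypothetical, and the numbers in the usage lines are placeholders chosen for easy arithmetic, not values from the paper.

```python
import math


def learning_curve_cost(initial_cost, initial_cumulative, cumulative, learning_rate):
    """Unit cost under Wright's law (generic model, assumed here for illustration).

    learning_rate is the fractional cost reduction per doubling of cumulative
    output, e.g. 0.2 means cost drops 20% with every doubling.
    """
    if not 0.0 <= learning_rate < 1.0:
        raise ValueError("learning_rate must be in [0, 1)")
    if initial_cumulative <= 0 or cumulative <= 0:
        raise ValueError("cumulative quantities must be positive")
    # Exponent is negative whenever learning_rate > 0, so cost decreases.
    exponent = math.log2(1.0 - learning_rate)
    return initial_cost * (cumulative / initial_cumulative) ** exponent


def doublings_needed(initial_cost, target_cost, learning_rate):
    """Number of doublings of cumulative output needed to reach target_cost."""
    if not 0.0 < learning_rate < 1.0:
        raise ValueError("learning_rate must be in (0, 1)")
    if initial_cost <= 0 or target_cost <= 0:
        raise ValueError("costs must be positive")
    return math.log(target_cost / initial_cost) / math.log(1.0 - learning_rate)


# Placeholder inputs (NOT from the paper): cost of 1000 units, 20% learning rate.
cost_after_two_doublings = learning_curve_cost(1000.0, 1.0, 4.0, 0.2)
n_doublings = doublings_needed(1000.0, 640.0, 0.2)
```

In a model of this kind, the projected date for reaching a target cost depends on both the assumed learning rate and how quickly cumulative launched mass grows, so both inputs would need to be stated alongside any projection.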
