Towards a future space-based, highly scalable AI infrastructure system design

Blaise Agüera y Arcas, Travis Beals, Maria Biggs, Jessica V. Bloom, Thomas Fischbacher, Konstantin Gromov, Urs Köster, Rishiraj Pravahan, James Manyika

TL;DR

This work articulates a vision for space-based, highly scalable AI infrastructure built from solar-powered satellite clusters hosting TPU accelerators and tightly coupled inter-satellite networks. It assesses feasibility across four key components (inter-satellite optical links, formation-flight dynamics, radiation tolerance of TPUs, and launch economics) using an illustrative 81-satellite, 1 km-radius constellation in dawn–dusk LEO. The results indicate plausible high-bandwidth inter-satellite links (ISLs) using short-range DWDM, manageable orbital dynamics with modest delta-v, and TPU survivability within mission lifetimes, while outlining a credible pathway to affordable launch costs by the mid-2030s through reuse and scale. The discussion highlights practical milestones and remaining challenges, including thermal management, reliability, and ground communications, and points to future directions such as integrated SoC designs and ML-driven formation control to unlock the full potential of orbiting AI compute.

Abstract

If AI is a foundational general-purpose technology, we should anticipate that demand for AI compute -- and energy -- will continue to grow. The Sun is by far the largest energy source in our solar system, so it is worth considering how future AI infrastructure could most efficiently tap into that power. This work explores a scalable compute system for machine learning in space, using fleets of satellites equipped with solar arrays, inter-satellite links using free-space optics, and Google tensor processing unit (TPU) accelerator chips. To facilitate high-bandwidth, low-latency inter-satellite communication, the satellites would be flown in close proximity. We illustrate the basic approach to formation flight via an 81-satellite cluster of 1 km radius, and describe an approach for using high-precision ML-based models to control large-scale constellations. Trillium TPUs are radiation tested. They survive a total ionizing dose equivalent to a 5-year mission life without permanent failures, and are characterized for bit-flip errors. Launch costs are a critical part of overall system cost; a learning curve analysis suggests launch to low-Earth orbit (LEO) may reach $\lesssim \$200$/kg by the mid-2030s.
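The learning-curve argument in the abstract can be illustrated with a minimal Wright's-law sketch. This is not the paper's analysis: the function names, the 20% learning rate, and the \$2000/kg starting price below are hypothetical placeholders chosen only to show the mechanics. Under Wright's law, price falls by a fixed fraction (the learning rate) each time cumulative launched mass doubles.

```python
import math


def wright_price(initial_price, initial_cum_mass, cum_mass, learning_rate):
    """Price per kg under Wright's law.

    learning_rate is the fractional price drop per doubling of cumulative
    launched mass (an assumed parameter, not a value from the paper).
    """
    exponent = -math.log2(1.0 - learning_rate)
    return initial_price * (cum_mass / initial_cum_mass) ** (-exponent)


def doublings_to_reach(initial_price, target_price, learning_rate):
    """Number of cumulative-mass doublings needed to reach target_price."""
    return math.log(target_price / initial_price) / math.log(1.0 - learning_rate)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    start_price = 2000.0   # $/kg, assumed starting point
    target_price = 200.0   # $/kg, the threshold quoted in the abstract
    rate = 0.20            # assumed 20% price drop per doubling
    n = doublings_to_reach(start_price, target_price, rate)
    print(f"Doublings of cumulative mass needed: {n:.1f}")
```

The useful property of this form is that the required number of doublings depends only on the price ratio and the learning rate, so the same function can be rerun with whatever fitted values a reader takes from the launch-price data.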

Paper Structure

This paper contains 14 sections, 1 equation, 4 figures, 1 table.

Figures (4)

  • Figure 1: Existing OISL device specifications vs. proposed design. Lines illustrate the $1/d^2$ relationship between distance and achievable bandwidth for three modulation schemes with different photons-per-bit (PPB) requirements. Commercial systems operate at long ranges, while our proposed system targets much shorter ranges to achieve higher data rates. 24-way dense wavelength-division multiplexing (DWDM) can be achieved up to about 300 km distance with a 10 cm aperture size. Fitting $2 \times 2$, $4 \times 4$, and $8 \times 8$ spatially multiplexed beams into the same total aperture requires distances of 2.5 km, 0.63 km, and 0.15 km, respectively (limited by crosstalk rather than power). Curves shown: polarization-multiplexed 16-symbol quadrature amplitude modulation (PM-16QAM), on-off keying (OOK), and the Shannon-Hartley limit of channel capacity.
  • Figure 2: Evolution of a free-fall (i.e. “no thrust”) constellation subject to Earth’s gravitational attraction plus J2-term (due to Earth’s oblateness) over the course of one orbit, shown at time intervals of $1/12$ of a full orbit in a non-rotating coordinate system. Positions are relative to the central reference satellite S0 (red). Horizontal axis is aligned with the negative in-track direction of S0 at t=0, vertical direction correspondingly is “towards zenith at t=0.” Short arrows indicate the “towards center of Earth” direction. Magenta: nearest neighbors (8-neighborhood) of the central satellite S0. Dark blue: the “maximally-distant in in-flight direction at $t=0$” satellite S1. Dark blue dashed: S1’s cluster-center-relative positions over the course of one orbit. All distances in meters.
  • Figure 3: Evolution of the distance between the central "reference" satellite S0 and its (direct and diagonal) nearest neighbors over the course of one orbit under the combined effect of Newtonian gravity and Earth's J2-term.
  • Figure 4: SpaceX payload mass launched by lowest achieved price, inflation-adjusted, since the first successful Falcon 1 launch, for progressive rocket categories. Note major price discontinuities at the introduction of Falcon 9 and Falcon Heavy [NextSpaceflightLaunches, Mueller2020ElectricPropulsion, JonathanSpaceReport].
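
The distance scalings quoted in the Figure 1 caption can be checked with a short sketch. It rests on two assumptions: achievable bandwidth falls as $1/d^2$ (stated in the caption), and the crosstalk-limited range of an $n \times n$ beam array scales with the square of the sub-aperture width, i.e. as $1/n^2$ for a fixed total aperture. The second is an interpretation that reproduces the caption's numbers, not a formula given in the text. The function names are hypothetical, and the 2.5 km anchor is the caption's $2 \times 2$ value.

```python
def bandwidth_ratio(d_ref_km, d_km):
    """Relative achievable bandwidth at d_km versus d_ref_km under 1/d^2 scaling."""
    return (d_ref_km / d_km) ** 2


def max_spatial_mux_distance_km(n, d_2x2_km=2.5):
    """Crosstalk-limited range for an n x n beam array in a fixed total aperture.

    Assumes range scales as (sub-aperture width)^2, i.e. as 1/n^2, anchored
    on the 2 x 2 value from the Figure 1 caption. This scaling is an
    assumption made for illustration.
    """
    return d_2x2_km * (2.0 / n) ** 2


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n} x {n}: {max_spatial_mux_distance_km(n):.3f} km")
    # Bandwidth gain from shrinking link distance from 300 km to 1 km.
    print(f"300 km -> 1 km bandwidth gain: {bandwidth_ratio(300.0, 1.0):.0f}x")
```

Under these assumptions the $4 \times 4$ and $8 \times 8$ cases come out at 0.625 km and about 0.156 km, close to the 0.63 km and 0.15 km in the caption.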