EarthPT: a time series foundation model for Earth Observation

Michael J. Smith, Luke Fleming, James E. Geach

TL;DR

EarthPT introduces a 700M-parameter autoregressive transformer trained on roughly $14\,\text{B}$ tokens of multispectral EO time-series to function as an Earth Observation foundation model. Trained on ClearSky-generated Sentinel-2–like data covering a $100\times100$ km UK region across 2015–2023, it forecasts pixel-level surface reflectances and key indices months ahead, achieving a median $L_1$ error of about $0.05$ for NDVI on a $-1$ to $1$ range, outperforming a phase-folded baseline. The model also learns semantically meaningful embeddings that show structure aligned with remote-sensing indices, offering potential for dynamic land-use classification. The authors argue that the abundance of EO data supports scaling EarthPT to much larger parameter counts and token budgets, following neural scaling laws, to realize broader, high-impact EO capabilities.

Abstract

We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400–2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of $-1$ to $1$) at the pixel level over a five month test set horizon, out-performing simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar 'Large Observation Models'.
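
To make the comparison in the abstract concrete, here is a minimal sketch of NDVI, a phase-folded baseline, and the per-step $L_1$ error. It is not the authors' code. The function names, the regular observation cadence, and the `period` parameter (observations per year) are assumptions made for illustration. NDVI uses the standard definition $(\text{NIR} - \text{Red}) / (\text{NIR} + \text{Red})$, and the baseline is read here as "average all past values that fall at the same point in the annual cycle".

```python
from statistics import mean


def ndvi(nir, red):
    """Standard NDVI for one observation: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    # Guard against division by zero for dark or masked pixels (illustrative choice).
    return 0.0 if total == 0 else (nir - red) / total


def phase_folded_forecast(history, period, horizon):
    """Forecast each future step as the mean of past values at the same phase.

    history: index values at a regular cadence (assumption).
    period:  number of observations per annual cycle (assumption).
    horizon: number of future steps to forecast.
    """
    if len(history) < period:
        raise ValueError("history must cover at least one full period")
    forecast = []
    for step in range(horizon):
        # Phase of the future time step within the annual cycle.
        phase = (len(history) + step) % period
        # All past observations that share that phase.
        same_phase = history[phase::period]
        forecast.append(mean(same_phase))
    return forecast


def l1_errors(predicted, observed):
    """Absolute error at each forecast step."""
    return [abs(p - o) for p, o in zip(predicted, observed)]


# Toy example: two "years" of four observations each (made-up values).
history = [0.25, 0.5, 0.75, 0.5, 0.25, 0.75, 0.75, 0.25]
baseline = phase_folded_forecast(history, period=4, horizon=2)
errors = l1_errors(baseline, [0.25, 0.5])
```

A learned forecaster such as EarthPT would be scored with the same `l1_errors` call, so the baseline and the model can be compared step by step over the forecast horizon.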

Paper Structure (10 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Predictions of some common remote sensing indicators for a randomly chosen pixel within the UK National Grid TL tile. We condition EarthPT on ClearSky time series from 1st January 2015 to 1st January 2023, with outputs after this divergence date constituting a long-term forecast to be compared to the unseen observations.
  • Figure 2: Median L1 error and interquartile ranges of NDVI predictions for 1M pixels in the TL63 tile. EarthPT long-term forecasts out-perform a simple phase-folded model based on historical averages out to a horizon of five months.
  • Figure 3: EarthPT embeddings for the two million pixel time series located on the TL63 and TL64 BNG tiles. We colour each scatter plot with a different set of emergent remote sensing index values. 'RGB' is the colour of a pixel in that part of the embedding space at the height of the summer of 2022. 'Mean' is the mean of a given index across the 2022 calendar year, and 'std' is the standard deviation of the index across the year. 'NDVI peak' is the time of the year corresponding to maximum NDVI; darker values are in the winter, and lighter values are in the summer. Note the coherent structure in the projected embedding space.
  • Figure 4: Loss curves for our various EarthPT training runs.
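
The summary statistics named in the Figure 2 and Figure 3 captions can be sketched as follows. This is an illustrative reading of the captions, not the authors' pipeline. The function names and input layout are hypothetical, and the choice of the "inclusive" quartile method and the population standard deviation are assumptions.

```python
from statistics import mean, median, pstdev, quantiles


def error_summary(errors_per_pixel):
    """Median and interquartile range of L1 error at each forecast step.

    errors_per_pixel: one list of per-step errors for each pixel,
    all of the same length (hypothetical layout). Needs >= 2 pixels.
    """
    summary = []
    # zip(*...) regroups the data so each tuple holds one step across all pixels.
    for step_errors in zip(*errors_per_pixel):
        q1, _, q3 = quantiles(step_errors, n=4, method="inclusive")
        summary.append({"median": median(step_errors), "q1": q1, "q3": q3})
    return summary


def yearly_index_stats(series):
    """Mean, standard deviation, and step of the maximum for one pixel's index."""
    # Index of the largest value, e.g. the 'NDVI peak' time within the year.
    peak_step = max(range(len(series)), key=lambda i: series[i])
    return {"mean": mean(series), "std": pstdev(series), "peak_step": peak_step}


# Toy data: five pixels, two forecast steps (made-up values).
toy_errors = [[0.0, 0.5], [0.25, 0.5], [0.5, 0.75], [0.75, 1.0], [1.0, 1.0]]
per_step = error_summary(toy_errors)
stats = yearly_index_stats([0.25, 0.5, 0.75, 0.5])
```

Under this reading, `error_summary` gives one median and one interquartile band per forecast step, as plotted in Figure 2, and `yearly_index_stats` gives the per-pixel 'mean', 'std', and 'NDVI peak' values used to colour the embedding plots in Figure 3.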