SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping

Thomas Boudras, Martin Schwartz, Rasmus Fensholt, Martin Brandt, Ibrahim Fayad, Jean-Pierre Wigneron, Gabriel Belouze, Fajwel Fogel, Philippe Ciais

TL;DR

SERA-H is an end-to-end framework that goes beyond the native 10 m Sentinel resolution by coupling a trainable super-resolution module (EDSR) with a temporal attention regression network (UTAE) to predict 2.5 m canopy height maps. Trained with dense ALS supervision from the Open-Canopy dataset, it uses Sentinel-1/2 time series to reconstruct fine forest structure, achieving an MAE of about 2.6 m and a high Tree Cover IoU. Ablation studies show that both the learnable upsampling and the temporal modeling are critical, and benchmarking shows that SERA-H is competitive with, and in some cases on par with, methods that use higher-resolution or commercial imagery. The method enables freely accessible, high-frequency forest mapping, though limitations remain in resolving very fine structures and in domain transfer to data-scarce biomes. Overall, SERA-H offers a practical path to accurate, high-resolution canopy height mapping using open data and end-to-end learning.

Abstract

High-resolution mapping of canopy height is essential for forest management and biodiversity monitoring. Although recent studies have produced deep learning methods that predict height maps from satellite imagery, these approaches often face a trade-off between data accessibility and spatial resolution. To overcome this limitation, we present SERA-H, an end-to-end model combining a super-resolution module (EDSR) and temporal attention encoding (UTAE). Trained under the supervision of high-density LiDAR data (ALS), our model generates 2.5 m resolution height maps from freely available Sentinel-1 and Sentinel-2 (10 m) time series data. Evaluated on an open-source benchmark dataset in France, SERA-H, with an MAE of 2.6 m and a coefficient of determination of 0.82, not only outperforms standard Sentinel-1/2 baselines but also achieves performance comparable to or better than methods relying on commercial very high-resolution imagery (SPOT-6/7, PlanetScope, Maxar). These results demonstrate that combining high-resolution supervision with the spatiotemporal information embedded in time series enables the reconstruction of details beyond the input sensors' native resolution. SERA-H opens the possibility of freely mapping forests with high revisit frequency, achieving accuracy comparable to that of costly commercial imagery. The source code is available at https://github.com/ThomasBoudras/SERA-H
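
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how an MAE and a coefficient of determination are commonly computed between a predicted and a reference height map. The function names and the toy values are illustrative assumptions; the paper's exact evaluation protocol (pixel masks, aggregation over tiles) is not described in this section and may differ.

```python
import numpy as np

def mean_absolute_error(pred, ref):
    """Mean absolute error, in the same unit as the inputs (metres here)."""
    return float(np.mean(np.abs(pred - ref)))

def coefficient_of_determination(pred, ref):
    """R^2 = 1 - SS_res / SS_tot, using the usual textbook definition."""
    ss_res = np.sum((ref - pred) ** 2)
    ss_tot = np.sum((ref - ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy canopy heights in metres (made-up values, not data from the paper).
ref = np.array([0.0, 5.0, 10.0, 20.0])
pred = np.array([1.0, 4.0, 12.0, 18.0])

mae_value = mean_absolute_error(pred, ref)
r2_value = coefficient_of_determination(pred, ref)
```

In this toy case the absolute errors are 1, 1, 2 and 2 m, so the MAE is 1.5 m.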

Paper Structure

This paper contains 35 sections, 4 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Spatial splits from the Open-Canopy dataset. The inset zooms into a representative region to show the spatial arrangement of the different training, test, and validation tiles. A 1 km buffer zone is applied around the test areas to avoid any spatial leakage.
  • Figure 2: Examples of input and reference data used in SERA-H. (a) Sentinel-2 optical image, (b) Sentinel-1 ascending radar image, and (c) Sentinel-1 descending radar image together form one triplet of the time series used as model input. (d) The corresponding ALS-derived canopy height map, resampled at 2.5 m, serves as reference.
  • Figure 3: Overview of the SERA-H model workflow. Each input sample consists of a temporal sequence of triplets, where each triplet is composed of one Sentinel-2 optical image and its temporally closest Sentinel-1 ascending and descending acquisitions ($S2$, $S1_{asc}$, $S1_{dsc}$). This yields an input tensor of size $T \times C \times H \times W$, where $T$ is the fixed number of triplets in the temporal sequence, $C$ the total number of spectral and radar channels per triplet, and $H$ and $W$ the spatial dimensions of each image. The sequence is first processed by the EDSR super-resolution module, which upsamples all channels by $\times 4$ to match the reference ALS resolution. The super-resolved images are then passed through the UTAE encoder-decoder, which compresses the full temporal sequence using a temporal attention mechanism and outputs a single high-resolution canopy height map ($1 \times 1 \times 4H \times 4W$). Training is performed end-to-end using a Smooth L1 loss against ALS reference data, with gradients propagating through both UTAE and EDSR.
  • Figure 4: Qualitative comparison of predicted canopy height maps. Comparison of SERA-H with ALS reference data and four state-of-the-art canopy height maps (Fogel, Liu, Pauls, and Schwartz) over four selected study sites. The first two columns show the SPOT-6/7 and Sentinel-2 source images in RGB visualization. The third column displays the reference ALS-derived canopy height map at 2.5 m resolution. The following columns illustrate the canopy height estimates generated by each model. The study sites shown include: a) a chestnut grove in Aveyron ($44.202^\circ$N, $2.262^\circ$E), b) a maritime pine forest in Dordogne ($45.000^\circ$N, $0.309^\circ$E), c) a spruce forest in the Jura ($46.344^\circ$N, $5.991^\circ$E), and d) a beech forest in the Alps ($45.015^\circ$N, $5.793^\circ$E).
  • Figure 5: Distribution of prediction error across reference canopy height bins. The plot displays the error deviation ($\text{prediction} - \text{reference}$) for six models: SERA-H, Fogel, Liu, Tolan, Pauls, and Schwartz (primary y-axis, left). The shaded grey bars in the background represent the relative proportion of samples in each height class (secondary y-axis, right).
  • ...and 3 more figures
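
The tensor shapes described in the Figure 3 caption can be traced with a small NumPy sketch. This is a shape-level illustration only, not the authors' implementation: nearest-neighbour upsampling stands in for the learned EDSR module, a single softmax weight per date stands in for UTAE's temporal attention, a random linear head stands in for the decoder, and the sizes `T`, `C`, `H`, `W` are hypothetical values chosen for the example.

```python
import numpy as np

def upsample_x4(x):
    """Stand-in for EDSR: nearest-neighbour x4 upsampling of the two spatial axes."""
    return x.repeat(4, axis=-2).repeat(4, axis=-1)

def temporal_attention_pool(x, scores):
    """Stand-in for UTAE: softmax over the T dates, then a weighted sum over time.

    x has shape (T, C, H, W); scores has shape (T,). Returns (C, H, W).
    """
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return np.tensordot(weights, x, axes=(0, 0))

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic below beta, linear above (beta=1.0 is assumed)."""
    diff = np.abs(pred - target)
    return float(np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean())

rng = np.random.default_rng(0)
T, C, H, W = 6, 14, 16, 16                      # hypothetical sizes
series = rng.normal(size=(T, C, H, W))          # one input sample: T triplets

super_resolved = upsample_x4(series)            # (T, C, 4H, 4W)
pooled = temporal_attention_pool(super_resolved, rng.normal(size=T))  # (C, 4H, 4W)
head = rng.normal(size=C) / C                   # hypothetical 1x1 regression head
height_map = np.tensordot(head, pooled, axes=(0, 0))[None, None]      # (1, 1, 4H, 4W)

als_reference = rng.uniform(0.0, 30.0, size=height_map.shape)         # fake ALS target
loss = smooth_l1(height_map, als_reference)
```

In the actual model both stages are learned, so the loss gradient flows back through the temporal encoder into the super-resolution module; the fixed stand-ins above only reproduce the shape bookkeeping.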