Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation

Daniele Rege Cambrin, Isaac Corley, Paolo Garza

TL;DR

This work tackles global canopy height estimation from single-view imagery, addressing the scarcity and cost of LiDAR-derived ground truth. It fine-tunes a monocular depth foundation model, Depth Anything v2, to create Depth Any Canopy (DAC), which delivers competitive or superior canopy height maps at far lower compute and carbon cost than state-of-the-art baselines. The authors demonstrate DAC's effectiveness on the EarthView and HRCHM datasets, highlight substantial efficiency gains, and provide qualitative evidence of robust canopy delineation in complex scenes. The study suggests that depth-estimation foundation models pretrained on natural imagery can be effectively repurposed for remote sensing tasks, enabling scalable, low-cost global canopy height mapping.

Abstract

Estimating global tree canopy height is crucial for forest conservation and climate change applications. However, capturing high-resolution ground-truth canopy height with LiDAR is expensive and not available globally. An efficient alternative is to train a canopy height estimator that operates on single-view remotely sensed imagery. The primary obstacle to this approach is that such methods require substantial training data to generalize well globally and across uncommon edge cases. Recent monocular depth estimation foundation models have shown strong zero-shot performance even in complex scenes. In this paper, we leverage the representations learned by these models and transfer them to the remote sensing domain to measure canopy height. Our findings suggest that Depth Any Canopy, obtained by fine-tuning the Depth Anything v2 model for canopy height estimation, provides a performant and efficient solution, matching or surpassing the current state of the art while using only a fraction of the computational resources and parameters. Furthermore, our approach requires less than $1.30 in compute and has an estimated carbon footprint of 0.14 kg CO2. Code, experimental results, and model checkpoints are openly available at https://github.com/DarthReca/depth-any-canopy.

Paper Structure (23 sections, 8 figures, 1 table)

Figures (8)

  • Figure 1: From Depth Anything [depth_anything_v2] to Depth Any Canopy. Depth Anything is a monocular depth estimation foundation model trained on natural imagery. We fine-tune and adapt Depth Anything v2 for the task of estimating tree canopy height in remote sensing imagery, resulting in Depth Any Canopy (DAC).
  • Figure 2: National Ecological Observatory Network (NEON) sites across the US. Aquatic sites collect data about aquatic ecosystems, while terrestrial sites collect data about terrestrial ecosystems. Core sites provide long-term support, while Gradient sites are temporary sites used to study the ecological response to specific changes.
  • Figure 3: ALS-derived tree canopy height acquisition process. A LiDAR sensor attached to a fixed-wing aircraft emits a laser pulse that is reflected multiple times before reaching the ground. The canopy height is computed from the time delta between the first and the last return of the pulse, the last return being the one reflected at ground level.
  • Figure 4: Example quality scores assigned by Q-Align [qalign] to NEON RGB images from the EarthView dataset. On the left, a noisy sample scored 1.10; in the middle, a medium-quality sample scored 2.53; and on the right, a higher-quality sample scored 3.71. Scoring in this way makes it possible to detect and filter out low-quality samples affected by warping and motion blur.
  • Figure 5: Depth Anything v2 training procedure. Synthetic images and their corresponding labels are used to train a large teacher model. The teacher is then used to annotate real images, creating pseudo-labels. The real images and their pseudo-labels are used to train a small student model.
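The time-delta computation in the Figure 3 caption follows from the pulse's round trip: the last (ground) return travels an extra 2h compared with the first (canopy-top) return, so the canopy height is h = c·Δt/2. The sketch below assumes a nadir-pointing pulse whose first and last returns lie on the same vertical ray; the function name is hypothetical. In practice, ALS canopy height models are commonly produced by differencing gridded surface and terrain models rather than from individual pulses, so this only illustrates the underlying physics.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def canopy_height_from_returns(t_first_s: float, t_last_s: float) -> float:
    """Canopy height in metres from first/last return times (seconds) of one pulse.

    Assumption: nadir-pointing pulse, so the extra round-trip time of the
    last (ground) return corresponds to twice the vertical canopy height.
    """
    if t_last_s < t_first_s:
        raise ValueError("last return cannot arrive before the first return")
    # Divide by 2 because the extra distance is travelled down and back up.
    return C * (t_last_s - t_first_s) / 2.0


height_m = canopy_height_from_returns(1.0e-6, 1.2e-6)  # 200 ns gap
```

For example, a 200 ns gap between the first and last returns corresponds to a canopy height of about 30 m.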
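The filtering step described in the Figure 4 caption amounts to discarding samples whose Q-Align score falls below a cutoff. A minimal sketch, assuming the scores have already been computed (Q-Align itself is not called here). The cutoff of 2.0 and the function name `filter_by_quality` are hypothetical, since the caption does not state the threshold actually used.

```python
from typing import Dict, List


def filter_by_quality(scores: Dict[str, float], min_score: float) -> List[str]:
    """Return the IDs of samples whose quality score is at least `min_score`.

    Input order is preserved; `min_score` is a hypothetical cutoff.
    """
    return [sample_id for sample_id, score in scores.items() if score >= min_score]


# Example scores taken from the Figure 4 caption.
example_scores = {"noisy": 1.10, "medium": 2.53, "better": 3.71}
kept = filter_by_quality(example_scores, min_score=2.0)
```

With the three example scores from the caption, a cutoff of 2.0 would drop the noisy sample and keep the other two.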
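The teacher-student procedure in the Figure 5 caption can be illustrated with a toy example. This is not Depth Anything v2's training code: as a simplifying assumption, both models are linear least-squares regressors over random feature vectors, the "synthetic" labels are exact, and the student is made "small" by restricting it to a subset of the features. The helpers `fit_linear` and `predict` and the weights `true_w` are hypothetical. The sketch only shows the data flow: labeled synthetic data trains the teacher, the teacher pseudo-labels unlabeled real data, and the student learns from those pseudo-labels.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit with a bias term; returns weights (last entry = bias)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def predict(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply weights returned by fit_linear to new inputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


# Hypothetical mapping shared by the synthetic and real domains (toy assumption).
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0])

# 1) Labeled synthetic data trains the larger teacher (all 6 features).
X_syn = rng.normal(size=(500, 6))
y_syn = X_syn @ true_w
teacher = fit_linear(X_syn, y_syn)

# 2) The teacher annotates unlabeled real data with pseudo-labels.
X_real = rng.normal(size=(2000, 6))
pseudo_labels = predict(teacher, X_real)

# 3) A smaller student (first 3 features only) is trained on the pseudo-labels.
student = fit_linear(X_real[:, :3], pseudo_labels)
```

Because the toy teacher recovers the mapping exactly and only the first three features matter, the student here ends up with weights close to [2.0, -1.0, 0.5] and a near-zero bias; with real networks, the student instead inherits whatever errors the teacher's pseudo-labels contain.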