Progressive Per-Branch Depth Optimization for DEFOM-Stereo and SAM3 Joint Analysis in UAV Forestry Applications

Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

TL;DR

This paper introduces a progressive pipeline that integrates DEFOM-Stereo foundation-model disparity estimation, SAM3 instance segmentation, and multi-stage depth optimization to deliver robust per-branch point clouds suitable for autonomous pruning tool positioning.

Abstract

Accurate per-branch 3D reconstruction is a prerequisite for autonomous UAV-based tree pruning; however, dense disparity maps from modern stereo matchers often remain too noisy for individual branch analysis in complex forest canopies. This paper introduces a progressive pipeline integrating DEFOM-Stereo foundation-model disparity estimation, SAM3 instance segmentation, and multi-stage depth optimization to deliver robust per-branch point clouds. Starting from a naive baseline, we systematically identify and resolve three error families through successive refinements. Mask boundary contamination is first addressed through morphological erosion and subsequently refined via a skeleton-preserving variant to safeguard thin-branch topology. Segmentation inaccuracy is then mitigated using LAB-space Mahalanobis color validation coupled with cross-branch overlap arbitration. Finally, depth noise, the most persistent error source, is initially reduced by outlier removal and median filtering, before being superseded by a robust five-stage scheme comprising global median absolute deviation (MAD) detection, spatial density consensus, local MAD filtering, RGB-guided filtering, and adaptive bilateral filtering. Evaluated on 1920×1080 stereo imagery of Radiata pine (Pinus radiata) acquired with a ZED Mini camera (63 mm baseline) from a UAV in Canterbury, New Zealand, the proposed pipeline reduces the average per-branch depth standard deviation by 82% while retaining edge fidelity. The resulting 3D point clouds are geometrically coherent and suitable for autonomous pruning tool positioning. All code and processed data are publicly released to facilitate further UAV forestry research.
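
To make the first stage of the five-stage scheme concrete, the following is a minimal sketch of global MAD-based outlier detection on the depth samples of a single branch mask. It is an illustration under stated assumptions, not the authors' released implementation: the function name `mad_inlier_mask`, the cutoff of 3.5, and the 0.6745 scaling constant (the conventional modified z-score form) are choices made here and are not taken from the paper.

```python
import numpy as np


def mad_inlier_mask(depth_mm, threshold=3.5):
    """Flag inliers among one branch's depth samples using a modified z-score.

    Hypothetical helper: the threshold and the 0.6745 constant are
    conventional defaults, not values reported in the paper.
    """
    depth = np.asarray(depth_mm, dtype=float)
    median = np.median(depth)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = np.median(np.abs(depth - median))
    if mad == 0:
        # More than half the samples equal the median; keep everything
        # rather than divide by zero.
        return np.ones(depth.shape, dtype=bool)
    # 0.6745 rescales MAD so the score is comparable to a Gaussian z-score.
    robust_z = 0.6745 * (depth - median) / mad
    return np.abs(robust_z) <= threshold


# Synthetic example: five samples on a branch near 1.5 m and one
# background pixel that leaked into the mask.
depths = np.array([1500.0, 1510.0, 1495.0, 1505.0, 1490.0, 3200.0])
inliers = mad_inlier_mask(depths)
print(inliers)
print(np.std(depths), np.std(depths[inliers]))
```

Because the median and MAD are insensitive to a small fraction of extreme values, the single background sample is rejected without shifting the estimate of where the branch lies, which a mean and standard deviation test would not guarantee.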

Paper Structure (46 sections, 23 equations, 9 figures, 3 tables)


Figures (9)

  • Figure 1: Overview of the progressive pipeline. Foundation modules (top, blue) feed into SAM3; the arrow descends to Version 1, then proceeds right-to-left through six iterative refinements (orange) addressing mask contamination (V2–V3), segmentation accuracy (V4), and depth noise (V5–V6), producing per-branch 3D point clouds.
  • Figure 2: Representative input data: (a) left image of Radiata pine canopy captured by ZED Mini at 1–2 m, and (b) dense disparity map produced by DEFOM-Stereo (ViT-L DINOv2, 32 iterations). Warm colors indicate closer objects.
  • Figure 3: Version 6 five-stage depth optimization results. Top row: before optimization (V4 masks with raw DEFOM depth). Bottom row: after the five-stage MAD-based pipeline. Average $\sigma_{Z}$ reduced from 174.6 mm to 31.5 mm (82.0% reduction).
  • Figure 4: Average per-branch depth standard deviation across pipeline versions. V2 achieves low $\sigma_{Z}$ but loses 85.6% of mask pixels. V6 achieves the lowest $\sigma_{Z}$ (31.5 mm) while preserving all branches.
  • Figure 5: Mask refinement progression from V1 to V4. V2 erosion disconnects thin branches (Branch 3: 2,535 $\to$ 1 px). V3 skeleton preservation recovers connectivity (1,281 px). V4 color validation produces clean masks.
  • ...and 4 more figures
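
As a quick check, the reduction quoted in the Figure 3 caption follows from the two reported averages of $\sigma_{Z}$:

$$
\frac{174.6\ \text{mm} - 31.5\ \text{mm}}{174.6\ \text{mm}} = \frac{143.1}{174.6} \approx 0.820 = 82.0\%
$$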