MS-DGCNN++: Multi-Scale Dynamic Graph Convolution with Scale-Dependent Normalization for Robust LiDAR Tree Species Classification

Said Ohamouddou, Hanaa El Afia, Mohamed Hamza Boulaich, Abdellatif El Afia, Raddouane Chiheb

Abstract

Graph-based deep learning on LiDAR point clouds encodes geometry through edge features, yet standard implementations use the same encoding at every scale. In tree species classification, where point density varies by orders of magnitude between trunk and canopy, this is particularly limiting. We prove it is suboptimal: normalized directional features have mean squared error decaying as $\mathcal{O}(1/s^2)$ with inter-point distance~$s$, while raw displacement error is constant, implying each encoding suits a different signal-to-noise ratio (SNR) regime. We propose MS-DGCNN++, a multi-scale dynamic graph convolutional network with \emph{scale-dependent edge encoding}: raw vectors at the local scale (low SNR) and hybrid raw-plus-normalized vectors at the intermediate scale (high SNR). Five ablations validate this design: encoding ablation confirms $+4$--$6\%$ overall accuracy (OA) gain; density dropout shows the flattest degradation under canopy thinning; a noise sweep locates the theoretical crossover near $\text{SNR}_2 \approx 1.22$; max-pooling provenance reveals far neighbors win $85\%$ of competitions under raw encoding, a bias eliminated by normalization; and isotropy analysis shows normalization nearly doubles effective rank. On STPCTLS (seven species, terrestrial laser scanning), MS-DGCNN++ achieves the highest OA ($92.91\%$) among 56 models, surpassing self-supervised methods with $7$--$24\times$ more parameters using only $1.81$M parameters. On HeliALS (nine species, airborne laser scanning, geometry-only), it achieves $73.66\%$ OA with the best balanced accuracy ($50.28\%$), matching FGI-PointTransformer which uses $4\times$ more points. Robustness analysis across five perturbation types reveals complementary variant strengths for deployment in heterogeneous forest environments. Code: https://github.com/said-ohamouddou/MS-DGCNN2.

Paper Structure

This paper contains 62 sections, 4 theorems, 11 equations, 8 figures, 9 tables.

Key Result

Proposition 4.3

$\mathbb{E}\bigl[\lVert\tilde{\bm{r}} - \bm{r}\rVert^2\bigr] = 2D\,\sigma^2$, independent of $s = \lVert\bm{r}\rVert$.
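The scale independence stated above can be checked numerically. The following is a minimal Monte Carlo sketch, not the authors' code: it assumes each endpoint of an edge is perturbed by independent zero-mean Gaussian noise with variance $\sigma^2$ per coordinate in $D$ dimensions (an assumed noise model, chosen because it is consistent with the factor $2D\,\sigma^2$). The function name `raw_displacement_mse` and its parameters are hypothetical.

```python
import random

def raw_displacement_mse(s, sigma, dim=3, trials=20000, seed=0):
    """Monte Carlo estimate of E[||r_tilde - r||^2] for an edge of length s.

    Assumed noise model (not taken from the paper): both endpoints receive
    i.i.d. N(0, sigma^2) noise on every coordinate.
    """
    rng = random.Random(seed)
    # True endpoints: p_i at the origin, p_j at distance s along the first axis.
    p_i = [0.0] * dim
    p_j = [s] + [0.0] * (dim - 1)
    r = [b - a for a, b in zip(p_i, p_j)]
    total = 0.0
    for _ in range(trials):
        noisy_i = [c + rng.gauss(0.0, sigma) for c in p_i]
        noisy_j = [c + rng.gauss(0.0, sigma) for c in p_j]
        r_tilde = [b - a for a, b in zip(noisy_i, noisy_j)]
        total += sum((rt - rc) ** 2 for rt, rc in zip(r_tilde, r))
    return total / trials

if __name__ == "__main__":
    sigma, dim = 0.05, 3
    print("predicted 2*D*sigma^2 =", 2 * dim * sigma ** 2)
    for s in (0.1, 1.0, 10.0):
        print("s =", s, "estimated MSE =", raw_displacement_mse(s, sigma, dim))
```

Under this assumed model the estimates should stay close to $2D\,\sigma^2$ for every edge length $s$, because the error $\tilde{\bm{r}} - \bm{r}$ is the difference of the two endpoint noise vectors and does not involve $\bm{r}$ itself.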

Figures (8)

  • Figure 1: Complete MS-DGCNN++ architecture. Stage 1 (a) extracts and fuses multi-scale features on raw coordinates using scale-dependent encoding. Stage 2 (b) refines the features through dynamic graph convolution in the learned feature space. Stage 3 (c) produces class predictions.
  • Figure 2: Data transformations applied during ablation and robustness evaluation. Depending on the experiment, perturbations are applied at training time, test time, or both.
  • Figure 3: Accuracy vs. upper-canopy retention rate. Solid lines: OA for each variant (shaded bands: $\pm 1$ std over five folds). Dashed grey: mean $k_2$-neighbor distance (right axis), confirming that thinning inflates raw edge magnitudes. The default asymmetric variant (c) achieves the highest OA at intermediate retention rates ($r{=}50$--$10\%$).
  • Figure 4: OA vs. empirical $\text{SNR}_2$ (log scale). The dashed red line marks the theoretical crossover at $\text{SNR}_2 \approx 1.22$ (Corollary \ref{cor:threshold}). At high SNR, all variants converge; below the crossover, normalized variants maintain a clear advantage. Shaded bands: $\pm 1$ std over five folds.
  • Figure 5: Max-pooling provenance at four retention rates. Each bar shows the fraction of channel-wise max-pool winners that are near (blue) vs. far (red) neighbors. The raw-input model exhibits a strong far-neighbor bias ($>77\%$), whereas the normalized-input model remains balanced at $50\%$ across all conditions. Dashed line: chance level.
  • ...and 3 more figures

Theorems & Definitions (13)

  • Remark 4.1: Unit convention
  • Definition 4.2: Edge encodings
  • Proposition 4.3: MSE of $\phi_{\mathrm{raw}}$
  • proof
  • Theorem 4.4: MSE of $\phi_{\mathrm{dir}}$
  • proof
  • Corollary 4.5
  • proof
  • Remark 4.6
  • Proposition 4.7: Complementary information content
  • ...and 3 more
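
The two encodings named in the list above, $\phi_{\mathrm{raw}}$ and $\phi_{\mathrm{dir}}$, can be illustrated with a short sketch. This is an assumption-laden illustration rather than the paper's definitions: it takes $\phi_{\mathrm{raw}}$ to be the displacement $\bm{r}$, $\phi_{\mathrm{dir}}$ to be $\bm{r}/\lVert\bm{r}\rVert$, and the hybrid encoding to be their concatenation. The helper names (`phi_raw`, `phi_dir`, `phi_hybrid`, `direction_mse`), the small stabilizer `eps`, and the Gaussian endpoint-noise model are all hypothetical choices.

```python
import math
import random

def phi_raw(r):
    """Raw encoding: the displacement vector itself (assumed form)."""
    return list(r)

def phi_dir(r, eps=1e-12):
    """Directional encoding: displacement scaled to unit length (assumed form).

    eps is a hypothetical stabilizer guarding against division by zero.
    """
    norm = math.sqrt(sum(c * c for c in r))
    return [c / (norm + eps) for c in r]

def phi_hybrid(r):
    """Hybrid encoding: raw and directional parts concatenated (assumed form)."""
    return phi_raw(r) + phi_dir(r)

def direction_mse(s, sigma, dim=3, trials=20000, seed=0):
    """Monte Carlo estimate of E[||phi_dir(r_tilde) - phi_dir(r)||^2].

    Assumed noise model: both endpoints get i.i.d. N(0, sigma^2) noise per
    coordinate, so the displacement noise is the difference of two draws.
    """
    rng = random.Random(seed)
    r = [s] + [0.0] * (dim - 1)
    u = phi_dir(r)
    total = 0.0
    for _ in range(trials):
        r_tilde = [c + rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma) for c in r]
        u_tilde = phi_dir(r_tilde)
        total += sum((a - b) ** 2 for a, b in zip(u_tilde, u))
    return total / trials

if __name__ == "__main__":
    print(phi_hybrid([3.0, 0.0, 4.0]))
    for s in (0.1, 1.0, 10.0):
        print("s =", s, "directional MSE =", direction_mse(s, sigma=0.05))
```

With the noise level held fixed, the estimated directional error shrinks as the edge length $s$ grows, which is the qualitative behavior the abstract attributes to normalized directional features; the sketch does not attempt to reproduce the exact constants of Theorem 4.4.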