Benchmarking Deep Learning Architectures for Urban Vegetation Point Cloud Semantic Segmentation from MLS

Aditya Aditya, Bharat Lohani, Jagannath Aryal, Stephan Winter

TL;DR

This work benchmarks seven point-based deep learning models (PointCNN, KPConv omni-supervised, RandLANet, SCFNet, PointNeXt, SPoTr, PointMetaBase) for vegetation-point segmentation on three mobile laser scanning (MLS) datasets (Chandigarh, Toronto3D, Kerala) using a 10-fold cross-validation setup. No single model wins on every dataset: PointMetaBase achieves the highest mIoU on Chandigarh (95.24%), KPConv (omni-supervised) on Toronto3D (91.26%), and PointCNN on Kerala (85.68%). Scene complexity and per-point feature richness significantly influence performance, with Kerala posing the strongest challenge due to its diverse vegetation and scene clutter. From these results the authors outline the ingredients that a dedicated vegetation-point segmentation model should include, aiming to advance practical urban vegetation mapping and carbon stock estimation.

Abstract

Vegetation is crucial for sustainable and resilient cities, providing various ecosystem services and supporting human well-being. However, vegetation is under critical stress from rapid urbanization and expanding infrastructure footprints. Consequently, mapping this vegetation is essential in the urban environment. Recently, deep learning for point cloud semantic segmentation has shown significant progress. Advanced models attempt to obtain state-of-the-art performance on benchmark datasets that comprise multiple classes and represent real-world scenarios. However, class-specific segmentation of vegetation points has not been explored, so the choice of a deep learning model for vegetation point segmentation remains unclear. To address this problem, we provide a comprehensive assessment of point-based deep learning models for semantic segmentation of the vegetation class. We select seven representative point-based models, namely PointCNN, KPConv (omni-supervised), RandLANet, SCFNet, PointNeXt, SPoTr and PointMetaBase. These models are investigated on three datasets, Chandigarh, Toronto3D and Kerala, which are characterized by diverse vegetation and varying scene complexity, combined with differing per-point features and class-wise composition. PointMetaBase and KPConv (omni-supervised) achieve the highest mIoU on the Chandigarh (95.24%) and Toronto3D (91.26%) datasets, respectively, while PointCNN provides the highest mIoU on the Kerala dataset (85.68%). The paper develops deeper insight, not previously reported, into how these models work for vegetation segmentation and outlines the ingredients that a model designed specifically for vegetation segmentation should include. This paper is a step towards the development of a novel architecture for vegetation point segmentation.
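
The abstract reports results as mIoU. As a rough illustration of how such a figure can be computed for a two-class problem, the sketch below assumes that mIoU is the unweighted mean of the per-class intersection-over-union for vegetation (label 1) and non-vegetation (label 0), and that overall accuracy is the fraction of correctly labelled points. The function names, the label encoding, and the toy labels are hypothetical; the paper's exact evaluation code is not shown in this summary.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN with `positive` treated as the vegetation label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def iou(hits, false_pos, false_neg):
    """Intersection over union for one class; NaN if the class never occurs."""
    denom = hits + false_pos + false_neg
    return hits / denom if denom else float("nan")


def binary_miou_and_oa(y_true, y_pred):
    """Return (mIoU, overall accuracy) for per-point binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    iou_veg = iou(tp, fp, fn)
    # For the non-vegetation class the roles of FP and FN swap.
    iou_non = iou(tn, fn, fp)
    miou = (iou_veg + iou_non) / 2
    oa = (tp + tn) / (tp + fp + fn + tn)
    return miou, oa


# Toy example (assumed labels): 3 vegetation points, 5 non-vegetation points.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
miou, oa = binary_miou_and_oa(truth, pred)
print(f"mIoU={miou:.4f} OA={oa:.2f}")  # mIoU=0.5833 OA=0.75
```

Because the minority class counts as much as the majority class in this mean, mIoU and overall accuracy can rank datasets differently, which is consistent with the differing dataset orderings noted in the captions of Figures 4 and 5.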

Paper Structure (37 sections, 1 equation, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Portions of the datasets: (a) Chandigarh (b) Toronto3D (c) Kerala. Red points represent vegetation, while white points represent non-vegetation.
  • Figure 2: Point distribution across tiles. The Chandigarh, Toronto3D and Kerala datasets have each been split into 10 tiles for training and testing purposes.
  • Figure 3: Three-step experimental procedure. The first step is labeling the data into two classes. The second step is tile generation. The final step is model training and testing. The index i takes integer values from 1 to 10, one per fold of the ten-fold cross-validation, and the index j ranges over the seven DL models employed.
  • Figure 4: Performance of models with respect to mIoU. The highest values are observed on the Chandigarh dataset, followed by the Toronto3D and Kerala datasets, in that order.
  • Figure 5: Performance of models with respect to overall accuracy (OA). The highest values are observed on the Toronto3D dataset, followed by the Chandigarh and Kerala datasets, in that order.
  • ...and 3 more figures
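
The experimental loop described in the captions of Figures 2 and 3 (ten tiles per dataset, fold index i, model index j) can be sketched as a simple plan of training runs. The sketch below assumes a leave-one-tile-out scheme, where fold i tests on tile i and trains on the remaining nine tiles; the captions state only that ten tiles and ten-fold cross-validation are used, so that pairing is an assumption. The names `tile_folds` and `experiment_plan` are hypothetical, and no actual training is performed.

```python
# Model names as listed in the abstract.
MODELS = [
    "PointCNN",
    "KPConv (omni-supervised)",
    "RandLANet",
    "SCFNet",
    "PointNeXt",
    "SPoTr",
    "PointMetaBase",
]


def tile_folds(n_tiles=10):
    """Yield (i, train_tiles, test_tile) with tiles numbered 1..n_tiles.

    Assumption: fold i holds out tile i and trains on all other tiles.
    """
    tiles = list(range(1, n_tiles + 1))
    for i in tiles:
        train_tiles = [t for t in tiles if t != i]
        yield i, train_tiles, i


def experiment_plan(models=MODELS, n_tiles=10):
    """List every (model j, fold i, train tiles, test tile) combination."""
    return [
        (model, i, train_tiles, test_tile)
        for model in models
        for i, train_tiles, test_tile in tile_folds(n_tiles)
    ]


plan = experiment_plan()
print(len(plan))  # 70 runs per dataset: 7 models x 10 folds
```

Under this assumed scheme every tile serves as the test set exactly once per model, so per-model scores can be averaged over the ten folds before models are compared.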