SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes

Weixiao Gao, Liangliang Nan, Hugo Ledoux

TL;DR

SUM Parts addresses the lack of part-level semantic annotations for urban textured meshes by introducing a large-scale dataset and an efficient annotation tool. It combines face- and texture-based labeling, provides ground truth for $21$ classes across $2.5 \,\text{km}^2$, and benchmarks methods for 3D semantic segmentation and interactive annotation. The work demonstrates that mesh-texture representations paired with template-driven annotation improve labeling efficiency and segmentation performance, with PointVector achieving state-of-the-art results on the face and pixel tracks. The dataset enables finer urban modeling for smart cities, BIM workflows, and digital twins, complementing existing LiDAR and mesh datasets with richer, part-level semantics.

Abstract

Semantic segmentation in urban scene analysis has mainly focused on images or point clouds, while textured meshes, which offer a richer spatial representation, remain underexplored. This paper introduces SUM Parts, the first large-scale dataset for urban textured meshes with part-level semantic labels, covering about 2.5 km² with 21 classes. The dataset was created using our own annotation tool, which supports both face- and texture-based annotations with efficient interactive selection. We also provide a comprehensive evaluation of 3D semantic segmentation and interactive annotation methods on this dataset. Our project page is available at https://tudelft3d.github.io/SUMParts/.

Paper Structure

This paper contains 32 sections, 15 equations, 25 figures, 8 tables.

Figures (25)

  • Figure 1: SUM Parts provides part-level semantic segmentation of urban textured meshes, covering $2.5 \, \text{km}^2$ with 21 classes. From left to right: textured mesh, face-based annotations, and texture-based annotations. See \ref{tab:categories} for class definitions.
  • Figure 2: Mesh textures and wireframes (black).
  • Figure 3: Interactive 3D selection. The user performs a lasso (green) or stroke selection (yellow) (b), which generates candidate faces (red) (c). Binary labeling is then applied to these candidate faces to extract protrusions (red) (d).
  • Figure 4: 3D template matching. When the user selects a planar segment by clicking on it (a), the matched segments are automatically identified (b). A similar matching process also applies to protrusions via a user-drawn stroke, as shown in (c) and (d).
  • Figure 5: Interactive 2D selection. The user selects a texture segment (green) (a). Superpixels are generated (blue), and the user clicks on the region of interest (green star) (b). This triggers local expansion, yielding a coarse segmentation (red) (c), followed by fine segmentation for the final selection (red) (d).
  • ...and 20 more figures