LMSeg: An end-to-end geometric message-passing network on barycentric dual graphs for large-scale landscape mesh segmentation

Zexian Huang; Kourosh Khoshelham; Martin Tomko

TL;DR

This work tackles semantic segmentation of large-scale landscape meshes by introducing LMSeg, an end-to-end graph network that operates on the barycentric dual graph and is augmented with the Geometry Aggregation+ (GA+) module. GA+ combines graph message passing, trigonometric positional encoding, and a learnable softmax-based neighborhood aggregation to capture both global geometric structure and high-frequency local features, while a dual hierarchical/local pooling scheme balances context and detail. Evaluations on SUM, H3D, and the Budj Bim Wall (BBW) dataset show strong performance with a compact 2.4M-parameter model, outperforming multiple baselines and demonstrating robust small-object segmentation under occlusion. The BBW dataset and LMSeg thus offer a practical, extensible framework for cultural heritage mapping, urban modeling, and environmental monitoring through scalable mesh segmentation.
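As a rough illustration of the barycentric dual graph mentioned above, the sketch below builds one node per triangular face (placed at the face barycenter) and one edge per pair of faces that share a mesh edge. The function name `barycentric_dual_graph` and the choice to define adjacency by shared edges are assumptions made for this example; they are not taken from the authors' code.

```python
import numpy as np
from collections import defaultdict


def barycentric_dual_graph(vertices, faces):
    """Build a dual graph of a triangle mesh (illustrative sketch).

    vertices: (V, 3) array-like of vertex coordinates.
    faces:    (F, 3) array-like of vertex indices per triangle.
    Returns (nodes, edges): nodes is an (F, 3) array of face barycenters,
    edges is a sorted list of (i, j) face-index pairs with i < j.
    """
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)

    # One node per face, located at the mean of its three vertices.
    nodes = vertices[faces].mean(axis=1)

    # Map each undirected mesh edge to the faces that contain it.
    edge_to_faces = defaultdict(list)
    for f_idx, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_faces[(min(u, v), max(u, v))].append(f_idx)

    # Assumption: two faces are adjacent when they share a mesh edge.
    edges = set()
    for adjacent in edge_to_faces.values():
        for i in range(len(adjacent)):
            for j in range(i + 1, len(adjacent)):
                lo, hi = sorted((adjacent[i], adjacent[j]))
                edges.add((lo, hi))
    return nodes, sorted(edges)


# Usage: a unit square split into two triangles gives two nodes and one edge.
square_vertices = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
square_faces = [[0, 1, 2], [0, 2, 3]]
nodes, edges = barycentric_dual_graph(square_vertices, square_faces)
```

Per-face attributes such as color or normals would then be attached to the nodes as features; that step is omitted here.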

Abstract

Semantic segmentation of large-scale 3D landscape meshes is critical for geospatial analysis in complex environments, yet existing approaches face persistent challenges of scalability, end-to-end trainability, and accurate segmentation of small and irregular objects. To address these issues, we introduce the Budj Bim Wall (BBW) dataset, a large-scale annotated mesh dataset derived from high-resolution LiDAR scans of the UNESCO World Heritage-listed Budj Bim cultural landscape in Victoria, Australia. The BBW dataset captures historic dry-stone wall structures that are difficult to detect under vegetation occlusion, supporting research in underrepresented cultural heritage contexts. Building on this dataset, we propose LMSeg, a deep graph message-passing network for semantic segmentation of large-scale meshes. LMSeg employs a barycentric dual graph representation of mesh faces and introduces the Geometry Aggregation+ (GA+) module, a learnable softmax-based operator that adaptively combines neighborhood features and captures high-frequency geometric variations. A hierarchical-local dual pooling scheme integrates hierarchical and local geometric aggregation to balance global context with fine-detail preservation. Experiments on three large-scale benchmarks (SUM, H3D, and BBW) show that LMSeg achieves 75.1% mIoU on SUM, 78.4% overall accuracy (O.A.) on H3D, and 62.4% mIoU on BBW with only 2.4M parameters. In particular, LMSeg segments both urban and natural scenes accurately: it captures small-object classes such as vehicles and high vegetation in complex city environments, and it reliably detects dry-stone walls in dense, occluded rural landscapes. Together, the BBW dataset and LMSeg provide a practical and extensible method for advancing 3D mesh segmentation in cultural heritage, environmental monitoring, and urban applications.
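To make the idea of a softmax-based neighborhood aggregation more concrete, here is a minimal NumPy sketch. It assumes the commonly used generalized form in which each neighbor's weight is a per-channel softmax of its feature value scaled by a temperature `t`; in a trained network `t` would be a learnable parameter, whereas here it is a plain float. The function name and tensor layout are hypothetical and may differ from the GA+ module as defined in the paper.

```python
import numpy as np


def softmax_aggregate(neighbor_feats, t=1.0):
    """Aggregate neighbor features with per-channel softmax weights (sketch).

    neighbor_feats: (N, K, D) array, K neighbor feature vectors per node.
    t: temperature (assumed learnable in practice; a fixed float here).
       t = 0 reduces to the mean, large positive t approaches the max.
    Returns an (N, D) array of aggregated features.
    """
    x = np.asarray(neighbor_feats, dtype=float)
    logits = t * x
    # Subtract the per-channel maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum over the neighbor axis.
    return (weights * x).sum(axis=1)


# Usage: one node with two neighbors and two feature channels.
feats = np.array([[[1.0, 0.0], [3.0, 0.0]]])
mean_like = softmax_aggregate(feats, t=0.0)   # behaves like mean pooling
max_like = softmax_aggregate(feats, t=50.0)   # behaves like max pooling
```

The temperature lets a single operator interpolate between mean-like and max-like pooling, which is one plausible reading of "adaptively combines neighborhood features".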

Paper Structure (33 sections, 10 equations, 12 figures, 6 tables)

Introduction
Background
Surface Representations and Segmentation
Mesh-based Surface Segmentation
Learning Models for Urban Meshes
Methodology
Triangular Landscape Mesh and its Barycentric Dual Graph
Network Architecture
Inputs (Fig. 1(a)):
Encoder (Fig. 1(b)):
Decoder (Fig. 1(c)):
Geometry Aggregation+ Module (GA+)
Graph message-passing convolution:
Positional embedding:
Learnable generalized softmax aggregation:
...and 18 more sections

Figures (12)

Figure 1: Overall architecture of LMSeg (Large-scale Mesh Segmentation Network). (a) The input triangular mesh is converted into a barycentric dual graph, where each node corresponds to a triangular face and edges capture 1-ring face adjacency. Node features consist of RGB values and surface normals from both faces and vertices, each processed by a Mesh Feature Encoder into a shared latent space. (b) The encoder alternates two complementary modules: HGA+ (Hierarchical Geometry Aggregation+) and LGA+ (Local Geometry Aggregation+), separated by random node sub-sampling and edge similarity pooling. HGA+ operates on a $k$-nearest neighbor graph linking downsampled nodes to their neighbors in the full-resolution mesh, capturing hierarchical, long-range, and multi-scale structural context. LGA+ operates on locally pooled geodesic neighborhoods, refining features using high-frequency, fine-scale variations to preserve geometric detail. Outputs from HGA+ and LGA+ are concatenated and refined by a residual multilayer perceptron (ResMLP). Each stage reduces node count while increasing feature dimensionality. (c) The decoder progressively upsamples features back to the original resolution using inverse-distance interpolation, skip connections, and MLP refinement, producing dense per-face semantic predictions. (d) Illustration of the hierarchical and local pooling strategy: random node sub-sampling reduces graph size efficiently, while edge similarity pooling restores meaningful local neighborhoods. HGA+ operates on hierarchical neighborhoods defined in the original geometry, and LGA+ operates on local geodesic neighborhoods, ensuring explicit multi-scale feature fusion. Here, N denotes the number of nodes in the barycentric dual graph, D the feature dimensionality, and K the number of semantic classes.
Figure 2: Different graph message-passing networks adopted in (a) typical point-based / graph-based learning approaches (Qi et al., 2017; Wang et al., 2019) and (b) the Geometry Aggregation+ (GA+) module.
Figure 3: (a) Non-uniformly textured mesh, (b) 3D point clouds, and (c) barycentric dual graph of the SUM dataset. The point clouds are densely sampled from the textured mesh at 30 points/m$^2$ for point-based learning models (Gao et al., 2021).
Figure 4: Spatial partitioning of the Budj Bim Wall (BBW) dataset. Orange lines indicate annotated European historic dry-stone walls near Tae Rak (Lake Condah), Victoria, Australia. The number of data samples per area is as follows: Area 1 - 107, Area 2 - 647, Area 3 - 625, Area 4 - 716, Area 5 - 893, and Area 6 - 1008.
Figure 5: (a) Near-uniform textured triangular mesh, (b) 3D LiDAR point cloud (ALS), and (c) barycentric dual graph of the BBW dataset.
...and 7 more figures
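
The Figure 1 caption describes the decoder as upsampling features with inverse-distance interpolation. A minimal sketch of that kind of interpolation follows, assuming each fine-resolution node takes a weighted average of its `k` nearest coarse nodes with weights proportional to the inverse squared distance. The function name, the default `k=3`, and the squared-distance weighting are assumptions for illustration, not details confirmed by the source.

```python
import numpy as np


def inverse_distance_upsample(coarse_pos, coarse_feats, fine_pos, k=3, eps=1e-8):
    """Interpolate coarse node features onto fine nodes (illustrative sketch).

    coarse_pos:   (N, 3) positions of the downsampled nodes.
    coarse_feats: (N, D) features of the downsampled nodes.
    fine_pos:     (M, 3) positions of the higher-resolution nodes.
    Returns an (M, D) array of interpolated features.
    """
    coarse_pos = np.asarray(coarse_pos, dtype=float)
    coarse_feats = np.asarray(coarse_feats, dtype=float)
    fine_pos = np.asarray(fine_pos, dtype=float)
    k = min(k, len(coarse_pos))

    # Brute-force pairwise squared distances, shape (M, N).
    diff = fine_pos[:, None, :] - coarse_pos[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)

    # Indices and squared distances of the k nearest coarse nodes.
    idx = np.argsort(d2, axis=1)[:, :k]
    nearest_d2 = np.take_along_axis(d2, idx, axis=1)

    # Assumption: weights are inverse squared distances, normalized to sum to 1.
    weights = 1.0 / (nearest_d2 + eps)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted average of the selected coarse features.
    return (weights[:, :, None] * coarse_feats[idx]).sum(axis=1)


# Usage: a fine node midway between two coarse nodes gets their average.
up = inverse_distance_upsample(
    coarse_pos=[[0, 0, 0], [2, 0, 0]],
    coarse_feats=[[0.0], [10.0]],
    fine_pos=[[1, 0, 0], [0, 0, 0]],
    k=2,
)
```

The brute-force distance matrix is only practical for small inputs; a spatial index would be needed at the mesh sizes the paper targets.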
