Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation
Elena Camuffo, Umberto Michieli, Simone Milani
TL;DR
This paper tackles fine-grained point-cloud semantic segmentation by automating a coarse-to-fine hierarchy: it derives macro class groups from a standard model's misclassifications using spectral clustering, and then regularizes the model via hierarchical prototype alignment and a fairness-based loss. The method, termed LEAK, combines micro- and macro-level prototype losses ($\mathcal{L}_{P_m}$ and $\mathcal{L}_{P_M}$) with a macro-aware fairness term ($\mathcal{L}_{\mathcal{F}}$), integrated into the training objective as $\mathcal{L}_{LEAK} = \mathcal{L}_{0} + \lambda_{P_m}\mathcal{L}_{P_m} + \lambda_{P_M}\mathcal{L}_{P_M} + \lambda_{\mathcal{F}}\mathcal{L}_{\mathcal{F}}$. It is shown to improve accuracy and class balance across multiple architectures and datasets, including SemanticKITTI, Semantic3D, S3DIS, and even VOC2012 image segmentation. The approach is agnostic to both architecture and dataset: it relies only on automated macro-group discovery and hierarchical prototypes to guide self-regularization, which yields faster convergence and more uniform per-class performance. Overall, LEAK generalizes across these settings, improves on state-of-the-art results without architectural changes, and provides a practical, self-organizing mechanism for enhancing 3D point-cloud understanding in autonomous systems.
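The macro-group discovery step can be illustrated with a minimal sketch. It is not the authors' implementation: the affinity definition (symmetrized, row-normalized confusion rates), the restriction to two groups, and the function name `macro_groups_from_confusion` are all assumptions made for illustration. The summary only states that macro groups are obtained by spectral clustering of the baseline model's misclassifications. For brevity the sketch uses spectral bisection (the sign of the Fiedler vector) rather than a full k-way spectral clustering.

```python
import numpy as np

def macro_groups_from_confusion(conf):
    """Split classes into two macro groups by spectral bisection.

    conf[i, j] counts points of true class i predicted as class j.
    Hypothetical helper: the affinity used here is an assumption.
    """
    conf = np.asarray(conf, dtype=float)
    # Row-normalize so each row holds per-class prediction rates.
    rows = conf.sum(axis=1, keepdims=True)
    rates = np.divide(conf, rows, out=np.zeros_like(conf), where=rows > 0)
    # Symmetric affinity: mutual confusion, ignoring correct predictions.
    affinity = 0.5 * (rates + rates.T)
    np.fill_diagonal(affinity, 0.0)
    # Unnormalized graph Laplacian L = D - W.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    # eigh sorts eigenvalues in ascending order, so column 1 is the
    # eigenvector of the second-smallest eigenvalue (Fiedler vector).
    _, vecs = np.linalg.eigh(laplacian)
    labels = (vecs[:, 1] > 0).astype(int)
    # Eigenvector sign is arbitrary; pin class 0 to group 0.
    if labels[0] == 1:
        labels = 1 - labels
    return labels

# Toy confusion matrix: classes 0/1 and classes 2/3 are mutually confused.
toy_conf = np.array([
    [80, 18,  1,  1],
    [15, 83,  1,  1],
    [ 1,  1, 78, 20],
    [ 2,  0, 22, 76],
])
groups = macro_groups_from_confusion(toy_conf)
```

On this toy matrix the two heavily confused pairs land in separate macro groups. A k-way variant would instead embed the classes with the first k eigenvectors and cluster that embedding.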
Abstract
Recent advances in autonomous robotic technologies have highlighted the growing need for precise environmental analysis. LiDAR semantic segmentation has gained attention as a way to achieve fine-grained scene understanding by acting directly on the raw content provided by sensors. Recent solutions have shown how different learning techniques can improve model performance without any architectural or dataset change. Following this trend, we present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK) derived from a standard model. First, classes are clustered into macro groups according to mutual prediction errors; then, the learning process is regularized by (1) aligning class-conditional prototypical feature representations for both fine and coarse classes, and (2) weighting instances with a per-class fairness index. Our LEAK approach is very general and can be seamlessly applied on top of any segmentation architecture; indeed, experimental results show that it enables state-of-the-art performance on different architectures, datasets and tasks, while ensuring more balanced class-wise results and faster convergence.
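To make the idea of class-conditional prototypes at two granularities concrete, the sketch below computes a prototype as the mean feature vector of each class, once for fine (micro) labels and once for coarse (macro) labels. This is an assumed, common definition of a prototype, not a confirmed detail of LEAK. The helper `class_prototypes`, the mapping `micro_to_macro`, and the toy features are hypothetical.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class; rows of absent classes stay zero.

    Hypothetical helper: assumes a prototype is a plain per-class mean.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

# Assumed mapping from 4 fine classes to 2 macro groups.
micro_to_macro = np.array([0, 0, 1, 1])

# Toy per-point features (5 points, 2 feature channels) and fine labels.
features = np.array([
    [1.0, 0.0],
    [3.0, 0.0],
    [0.0, 2.0],
    [0.0, 4.0],
    [5.0, 5.0],
])
micro_labels = np.array([0, 1, 2, 3, 3])

# Fine prototypes use the labels directly; coarse prototypes pool
# all points whose fine class maps to the same macro group.
micro_protos = class_prototypes(features, micro_labels, num_classes=4)
macro_protos = class_prototypes(features, micro_to_macro[micro_labels], num_classes=2)
```

Indexing `micro_to_macro` with the fine labels produces per-point macro labels, so the same routine serves both levels of the hierarchy.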
