AMVNet: Assertion-based Multi-View Fusion Network for LiDAR Semantic Segmentation

Venice Erin Liong, Thi Ngoc Tho Nguyen, Sergi Widjaja, Dhananjai Sharma, Zhuang Jie Chong

TL;DR

AMVNet tackles LiDAR semantic segmentation by fusing the outputs of multiple projection-based networks through late fusion. It performs assertion-guided point sampling where the networks' class scores disagree, then feeds point-level features for the sampled points to a lightweight point head that refines the predictions. The modular design keeps the two projection-based networks independent and adds only a minor overhead, which suits resource-constrained robotic systems such as autonomous vehicles. AMVNet achieves state-of-the-art results on SemanticKITTI and nuScenes and outperforms the baseline of combining the networks' class scores.

Abstract

In this paper, we present an Assertion-based Multi-View Fusion network (AMVNet) for LiDAR semantic segmentation which aggregates the semantic features of individual projection-based networks using late fusion. Given class scores from different projection-based networks, we perform assertion-guided point sampling on score disagreements and pass a set of point-level features for each sampled point to a simple point head which refines the predictions. This modular-and-hierarchical late fusion approach provides the flexibility of having two independent networks with a minor overhead from a lightweight network. Such approaches are desirable for robotic systems, e.g., autonomous vehicles, for which computational and memory resources are often limited. Extensive experiments show that AMVNet achieves state-of-the-art results on both the SemanticKITTI and nuScenes benchmark datasets and that our approach outperforms the baseline method of combining the class scores of the projection-based networks.
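
The late-fusion flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the "assertion" is taken to be a simple argmax mismatch between the two networks, the baseline fusion is taken to be a sum of class scores, the point-level features are just the two concatenated score vectors, and `toy_point_head` is a hypothetical stand-in for the learned point head.

```python
import numpy as np


def disagreement_mask(scores_a, scores_b):
    """Boolean mask of points whose predicted class differs between networks.

    Assumption: disagreement is measured by comparing argmax predictions.
    """
    return scores_a.argmax(axis=1) != scores_b.argmax(axis=1)


def toy_point_head(features):
    """Hypothetical placeholder for the learned point head.

    It returns the first half of each feature vector (network A's scores).
    A real point head would be a trained network.
    """
    num_classes = features.shape[1] // 2
    return features[:, :num_classes]


def late_fusion(scores_a, scores_b, point_head):
    """Fuse two (num_points, num_classes) score arrays.

    Points where the networks agree keep the summed-score prediction
    (assumed baseline); disagreeing points are re-predicted by the point head.
    """
    labels = (scores_a + scores_b).argmax(axis=1)
    mask = disagreement_mask(scores_a, scores_b)
    if mask.any():
        # Assumed point-level features: both networks' scores, concatenated.
        features = np.concatenate([scores_a[mask], scores_b[mask]], axis=1)
        labels[mask] = point_head(features).argmax(axis=1)
    return labels, mask


# Three points, two classes; the networks disagree only on the second point.
scores_a = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
scores_b = np.array([[0.8, 0.2], [0.1, 0.9], [0.3, 0.7]])
labels, mask = late_fusion(scores_a, scores_b, toy_point_head)
```

Only the disagreeing points reach the point head, which is why the extra cost stays small when the two networks mostly agree.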

Paper Structure

This paper contains 4 sections, 1 figure.

Figures (1)

  • Figure 1: Example of caption. It is set in Roman so that mathematics (always set in Roman: $B \sin A = A \sin B$) may be included without an ugly clash.