LVIC: Multi-modality segmentation by Lifting Visual Info as Cue

Zichao Dong; Bowen Pang; Xufeng Huang; Hang Ji; Xin Zhan; Junbo Chen

TL;DR

This work proposes a depth-aware point painting mechanism that significantly boosts multi-modality fusion for LiDAR semantic segmentation. It also takes a closer look at which visual features are most useful to LiDAR for semantic segmentation.

Abstract

Multi-modality fusion has proven to be an effective approach to 3D perception for autonomous driving. However, most current multi-modality fusion pipelines for LiDAR semantic segmentation rely on complicated fusion mechanisms. Point painting is a straightforward alternative that directly binds LiDAR points to visual information. Unfortunately, previous point-painting-like methods suffer from projection errors between the camera and the LiDAR. In our experiments, we find that this projection error is the key weakness of point painting. We therefore propose a depth-aware point painting mechanism, which significantly boosts multi-modality fusion. In addition, we take a closer look at which visual features LiDAR needs for semantic segmentation. By Lifting Visual Information as Cue, LVIC ranks 1st on the nuScenes LiDAR semantic segmentation benchmark. Our experiments demonstrate the robustness and effectiveness of the approach. Code will be made publicly available soon.
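The abstract attributes the gains to handling camera-LiDAR projection error but does not say how depth awareness is implemented. The sketch below is a minimal illustration under one assumption: a LiDAR point is painted only if its camera-frame depth agrees, within a relative tolerance, with a per-pixel depth map at the pixel it projects to. Such a map could come from a monocular estimator like ZoeDepth, which the paper lists under related work. The function names `project_points` and `depth_aware_paint`, the `rel_tol` parameter, and the nearest-pixel lookup are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def project_points(points_lidar, lidar_to_cam, intrinsics):
    """Project N x 3 LiDAR points into the image.

    lidar_to_cam: 4 x 4 homogeneous extrinsic transform (assumed calibrated).
    intrinsics:   3 x 3 camera matrix K.
    Returns (uv, depth): N x 2 pixel coordinates and N camera-frame depths.
    """
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])   # N x 4 homogeneous points
    cam = (lidar_to_cam @ homo.T).T[:, :3]              # N x 3 in the camera frame
    depth = cam[:, 2]
    # Avoid dividing by zero for points lying on the camera plane.
    safe_z = np.where(np.abs(depth) < 1e-6, 1e-6, depth)
    uvw = (intrinsics @ cam.T).T                        # N x 3
    uv = uvw[:, :2] / safe_z[:, None]                   # perspective division
    return uv, depth


def depth_aware_paint(points_lidar, image_feats, depth_map,
                      lidar_to_cam, intrinsics, rel_tol=0.1):
    """Attach image features to LiDAR points, skipping points whose LiDAR depth
    disagrees with the image depth map (hypothetical consistency rule, an
    assumption rather than the paper's exact mechanism).

    image_feats: H x W x C feature map; depth_map: H x W per-pixel depth.
    Returns N x C painted features (zeros where unpainted) and an N boolean mask.
    """
    h, w, c = image_feats.shape
    uv, depth = project_points(points_lidar, lidar_to_cam, intrinsics)
    cols = np.floor(uv[:, 0]).astype(int)
    rows = np.floor(uv[:, 1]).astype(int)
    # Plain point painting would stop here: in front of the camera and inside the image.
    in_view = (depth > 0) & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)

    painted = np.zeros((points_lidar.shape[0], c), dtype=image_feats.dtype)
    mask = np.zeros(points_lidar.shape[0], dtype=bool)
    idx = np.nonzero(in_view)[0]
    pixel_depth = depth_map[rows[idx], cols[idx]]
    # Depth check: reject points (e.g. occluded ones) whose depth is off by > rel_tol.
    consistent = np.abs(pixel_depth - depth[idx]) <= rel_tol * depth[idx]
    keep = idx[consistent]
    painted[keep] = image_feats[rows[keep], cols[keep]]
    mask[keep] = True
    return painted, mask
```

In a toy setup with an identity extrinsic, two points on the same camera ray at depths 5 and 10, and a depth map that reads 5 everywhere, only the nearer point is painted. The farther point, which plain projection would also have painted with the same pixel's features, is treated as occluded and left unpainted.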

Paper Structure (22 sections, 2 figures, 1 table)

Introduction
RELATED WORK
Point cloud segmentation
Point Transformers
Depth estimation
ZoeDepth
Multi-modality semantic segmentation
Previous point painting method
CPGNet-LCF
METHOD
Overview
Visual encoder
Painting module
Fusion module
LiDAR semantic segmentation model
...and 7 more sections

Figures (2)

Figure 1: Pipeline of our visual encoder. The model can be any encoder-decoder architecture; low-level features are saved during encoding.
Figure 2: Components of the fusion model. We use only simple linear layers as adaptors to fuse features from multiple domains.
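Figures 1 and 2 only outline the visual encoder and the fusion model. As a hedged illustration, the NumPy sketch below mimics both ideas at toy scale: an encoder-decoder that saves its full-resolution, low-level feature during encoding and returns it together with the decoder output, and a fusion step in which each input domain passes through its own linear adaptor. Everything here is an assumption: the per-pixel linear "conv" stages, the pooling and upsampling choices, the random weights, the class names `ToyVisualEncoder` and `LinearAdaptorFusion`, and summing the adapted features (concatenation would be an equally plausible reading of the caption).

```python
import numpy as np

rng = np.random.default_rng(0)  # random weights, for illustration only


def conv_like(x, weight):
    """Stand-in for a conv stage: per-pixel linear map (a 1x1 conv) followed by ReLU."""
    return np.maximum(x @ weight, 0.0)


def downsample(x):
    """2x average pooling over H and W (both assumed even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample(x):
    """2x nearest-neighbour upsampling over H and W."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


class ToyVisualEncoder:
    """Hypothetical encoder-decoder; any backbone could replace these stages.
    The low-level feature is saved during encoding and returned with the output."""

    def __init__(self, c_in, c_low, c_high):
        self.w_low = rng.standard_normal((c_in, c_low))
        self.w_high = rng.standard_normal((c_low, c_high))
        self.w_dec = rng.standard_normal((c_high, c_low))

    def __call__(self, image):
        low = conv_like(image, self.w_low)                       # saved low-level feature
        high = conv_like(downsample(low), self.w_high)           # coarser, higher-level feature
        decoded = conv_like(upsample(high), self.w_dec) + low    # decoder with skip connection
        return low, decoded


class LinearAdaptorFusion:
    """Hypothetical fusion: one linear adaptor (weight, bias) per input domain,
    with the adapted features summed (an assumption; concatenation also fits)."""

    def __init__(self, in_dims, out_dim):
        self.adaptors = [(rng.standard_normal((d, out_dim)), np.zeros(out_dim))
                         for d in in_dims]

    def __call__(self, *feats):
        # Each feats[i] is N x in_dims[i]; the result is N x out_dim.
        return sum(f @ w + b for f, (w, b) in zip(feats, self.adaptors))
```

To fuse with LiDAR, one would sample `low` and `decoded` at each point's projected pixel (for example `low[rows, cols]`) and pass them, together with the point's own features, to the fusion module. Keeping the adaptors linear means the fusion adds only one weight matrix and one bias per domain.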
