Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation
Xiaoyang Wu, Xiang Xu, Lingdong Kong, Liang Pan, Ziwei Liu, Tong He, Wanli Ouyang, Hengshuang Zhao
TL;DR
This report addresses semantic segmentation of dense 3D LiDAR point clouds from the Waymo Open Dataset and introduces Point Transformer V3 Extreme, an enhanced variant of Point Transformer V3 (PTv3). It builds on PTv3's space-filling-curve tokenization, which orders unstructured point clouds into structured sequences suited to efficient attention, and combines it with multi-frame training, a no-clipping-point policy, and a simple model ensemble. Together these raise validation mIoU from 72.1% to 74.8% and test mIoU from 70.7% to 72.8%. The approach takes first place on the Waymo Open Dataset semantic segmentation leaderboard and shows that transformer-based 3D perception is effective for autonomous driving.
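To make the space-filling-curve tokenization mentioned above concrete, here is a minimal sketch of one such ordering, a Z-order (Morton) curve, written with NumPy. It is an illustration only, not the authors' implementation: the function names `morton_code` and `serialize`, the `voxel_size` default, and the 10-bit grid limit are all assumptions made for this example.

```python
import numpy as np


def morton_code(grid: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of integer (x, y, z) grid coordinates into one Z-order key.

    Hypothetical helper for illustration; `grid` has shape (N, 3) and holds
    non-negative integers smaller than 2**bits.
    """
    grid = grid.astype(np.uint64)
    code = np.zeros(grid.shape[0], dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            # Take bit b of this axis and place it at position 3*b + axis.
            bit = (grid[:, axis] >> np.uint64(b)) & np.uint64(1)
            code |= bit << np.uint64(3 * b + axis)
    return code


def serialize(points: np.ndarray, voxel_size: float = 0.05, bits: int = 10) -> np.ndarray:
    """Return the permutation that orders points along the Z-order curve.

    Assumption: points are quantized to a voxel grid anchored at their minimum corner.
    """
    grid = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    if grid.max() >= 2 ** bits:
        raise ValueError("scene too large for the chosen voxel_size and bits")
    return np.argsort(morton_code(grid, bits), kind="stable")


# Usage: reorder a tiny point cloud so that spatial neighbours sit close in the sequence.
points = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
order = serialize(points, voxel_size=1.0)
sequence = points[order]
```

Once points are sorted this way, a point cloud can be treated as a 1D sequence and split into fixed-size groups for attention, which is the general idea behind serialization-based tokenization.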
Abstract
In this technical report, we detail our first-place solution for the 2024 Waymo Open Dataset Challenge's semantic segmentation track. We significantly enhanced the performance of Point Transformer V3 on the Waymo benchmark by implementing cutting-edge, plug-and-play training and inference technologies. Notably, our advanced version, Point Transformer V3 Extreme, leverages multi-frame training and a no-clipping-point policy, achieving substantial gains over the original PTv3 performance. Additionally, employing a straightforward model ensemble strategy further boosted our results. This approach secured us the top position on the Waymo Open Dataset semantic segmentation leaderboard, markedly outperforming other entries.
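The abstract does not spell out how the ensemble is formed or how the score is computed, so the following is only a sketch of one common choice: averaging per-point class probabilities across models and scoring the result with mean intersection-over-union (mIoU). The names `ensemble_predict` and `mean_iou` are hypothetical, and the averaging rule is an assumption rather than the method confirmed by the report; the official benchmark metric may also handle ignored labels differently.

```python
import numpy as np


def ensemble_predict(prob_list):
    """Average per-point class probabilities from several models, then take the argmax.

    Assumption: each entry of `prob_list` is an (N, C) array for the same N points.
    """
    mean_prob = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_prob.argmax(axis=1)


def mean_iou(pred, label, num_classes):
    """Mean IoU over the classes that appear in the prediction or the ground truth."""
    # Confusion matrix: rows are ground-truth classes, columns are predicted classes.
    conf = np.bincount(label * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0
    return float((inter[valid] / union[valid]).mean())


# Usage: two toy models, two points, two classes.
probs_a = np.array([[0.9, 0.1], [0.4, 0.6]])
probs_b = np.array([[0.6, 0.4], [0.7, 0.3]])
pred = ensemble_predict([probs_a, probs_b])
score = mean_iou(pred, np.array([0, 1]), num_classes=2)
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one, which is why it is a frequent default for segmentation ensembles.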
