Investigating Vision-Language Model for Point Cloud-based Vehicle Classification

Yiqiao Li; Jie Wei; Camille Kamga

TL;DR

The paper addresses heavy-duty truck classification from roadside LiDAR while reducing the annotation burden. It introduces a pipeline that converts sparse LiDAR point clouds into dense 2D renderings via point cloud registration and smoothing, then applies in-context few-shot prompting with a vision-language model to classify vehicles into 12 types. Key contributions are the use of real-world LiDAR data, a point cloud image preprocessing workflow, and a few-shot prompting strategy that achieves competitive F1 scores with as few as 3 demonstrations. The approach supports safer cooperative driving by enabling scalable truck classification at reduced labeling cost.

Abstract

Heavy-duty trucks pose significant safety challenges due to their large size and limited maneuverability compared to passenger vehicles. A deeper understanding of truck characteristics is essential for improving the safety of cooperative autonomous driving. Traditional LiDAR-based truck classification methods rely on extensive manual annotation, which makes them labor-intensive and costly. The rapid advancement of large language models (LLMs) trained on massive datasets presents an opportunity to leverage their few-shot learning capabilities for truck classification. However, existing vision-language models (VLMs) are primarily trained on image datasets, which makes it challenging for them to process point cloud data directly. This study introduces a novel framework that integrates roadside LiDAR point cloud data with VLMs for efficient and accurate truck classification in support of cooperative and safe driving environments. It makes three key contributions: (1) leveraging real-world LiDAR datasets for model development, (2) designing a preprocessing pipeline to adapt point cloud data for VLM input, including point cloud registration for dense 3D rendering and mathematical morphological techniques to enhance feature representation, and (3) utilizing in-context learning with few-shot prompting to enable vehicle classification with minimal labeled training data. Experimental results show encouraging performance and indicate the method's potential to reduce annotation effort while improving classification accuracy.
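
To make the preprocessing idea concrete, the following is a minimal sketch of how a point cloud might be turned into a 2D image and cleaned up with a morphological operation. It is not the authors' implementation: the abstract does not specify the projection, the grid resolution, or which morphological operators are used, so the side-view (x-z) projection, the 0.1 resolution, the square 3x3 structuring element, and the function names `rasterize_side_view` and `close_gaps` are all assumptions made for illustration. Point cloud registration across frames is assumed to have happened already.

```python
import numpy as np


def rasterize_side_view(points, resolution=0.1):
    """Project an (N, 3) point cloud onto the x-z plane as a binary image.

    Assumption: a side view is used; the source does not state the projection.
    """
    xz = points[:, [0, 2]]
    mins = xz.min(axis=0)
    idx = np.floor((xz - mins) / resolution).astype(int)
    width, height = idx.max(axis=0) + 1
    image = np.zeros((height, width), dtype=bool)
    # Flip vertically so that higher points (larger z) land in the top rows.
    image[height - 1 - idx[:, 1], idx[:, 0]] = True
    return image


def _neighborhood_stack(image, size):
    """Stack all size x size shifted copies of a zero-padded binary image."""
    pad = size // 2
    padded = np.pad(image, pad, mode="constant", constant_values=False)
    h, w = image.shape
    return np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]
    )


def close_gaps(image, size=3):
    """Morphological closing (dilation, then erosion) with a square element.

    `size` is assumed to be odd and at least 3. The image is padded first so
    that the result near the border matches closing on an unbounded grid.
    """
    pad = size // 2
    work = np.pad(image, pad, mode="constant", constant_values=False)
    dilated = _neighborhood_stack(work, size).any(axis=0)
    eroded = _neighborhood_stack(dilated, size).all(axis=0)
    return eroded[pad:-pad, pad:-pad]
```

Closing is chosen here because it fills small holes between neighboring occupied pixels without growing the overall silhouette, which is one plausible way to densify a sparse rendering before passing it to an image-based model.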
