T3DNet: Compressing Point Cloud Models for Lightweight 3D Recognition

Zhiyuan Yang, Yunjiao Zhou, Lihua Xie, Jianfei Yang

TL;DR

T3DNet addresses the challenge of deploying 3D point-cloud models on memory- and latency-constrained devices by predefining a tiny model and boosting its capacity through structured augmentation and subsequent knowledge distillation from a powerful teacher. The approach uses a two-stage pipeline: first, tiny network augmentation (NetAug) creates auxiliary supervision by embedding the tiny model within augmented networks; then, a staged knowledge distillation from the original model further improves accuracy. Empirical results show dramatic compression (up to $58\times$ fewer parameters and $54\times$ fewer FLOPs) with minimal accuracy loss on ModelNet40, plus competitive gains on ShapeNet and ScanObjectNN, and strong generalization to PointMLP and PTv2 architectures (e.g., up to $98\%$ parameter reduction on PTv2 with only about $1.74\%$ mIoU loss). The findings demonstrate that network augmentation makes distillation more effective for point clouds and highlight T3DNet as a practical tool for deployable 3D perception on edge/IoT devices.
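
The summary above mixes two ways of reporting compression: a ratio (58 times fewer parameters) and a percentage reduction (98 percent). The short sketch below converts between the two so the figures can be compared on one scale. The function names are hypothetical helpers introduced here for illustration; only the arithmetic is being shown, not any code from the paper.

```python
def reduction_percent(compression_ratio: float) -> float:
    """Percent of parameters (or FLOPs) removed for a given compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return (1.0 - 1.0 / compression_ratio) * 100.0


def ratio_from_reduction(reduction_pct: float) -> float:
    """Compression ratio implied by removing `reduction_pct` percent."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Figures quoted in the summary, expressed on both scales.
print(round(reduction_percent(58), 2))     # 98.28
print(round(reduction_percent(54), 2))     # 98.15
print(round(ratio_from_reduction(98), 1))  # 50.0
```

Read this way, a 58-fold parameter compression removes roughly 98.3 percent of the parameters, and a 98 percent reduction corresponds to a 50-fold compression.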

Abstract

3D point clouds have been widely used in many mobile application scenarios, including autonomous driving and 3D sensing on mobile devices. However, existing 3D point cloud models tend to be large and cumbersome, making them hard to deploy on edge devices due to their high memory requirements and non-real-time latency. There has been a lack of research on how to compress 3D point cloud models into lightweight models. In this paper, we propose a method called T3DNet (Tiny 3D Network with augmEntation and disTillation) to address this issue. We find that the tiny model after network augmentation is much easier for a teacher to distill. Instead of gradually reducing the parameters through techniques such as pruning or quantization, we pre-define a tiny model and improve its performance through auxiliary supervision from augmented networks and the original model. We evaluate our method on several public datasets, including ModelNet40, ShapeNet, and ScanObjectNN. Our method achieves high compression rates without a significant accuracy sacrifice, reaching state-of-the-art performance on all three datasets against existing methods. Notably, our T3DNet is 58 times smaller and 54 times faster than the original model, with only a 1.4% accuracy drop on the ModelNet40 dataset.
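
The abstract describes distilling the original model (teacher) into the tiny model (student) but does not state the loss here. As a rough illustration, the sketch below uses the common temperature-softened formulation of knowledge distillation, which is an assumption and may differ from the paper's exact objective. The names `kd_loss`, `temperature`, and `alpha` are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            label: int,
            temperature: float = 4.0,
            alpha: float = 0.5) -> float:
    """Assumed objective: alpha * CE(student, label)
    + (1 - alpha) * T^2 * KL(teacher_soft || student_soft)."""
    # Hard-label cross-entropy on the student's ordinary predictions.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between temperature-softened teacher and student outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student) if t > 0.0)
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


teacher = [2.0, 0.5, -1.0]
agreeing_student = [2.0, 0.5, -1.0]
disagreeing_student = [-1.0, 0.5, 2.0]
print(kd_loss(agreeing_student, teacher, label=0))
print(kd_loss(disagreeing_student, teacher, label=0))
```

A student that matches the teacher's logits incurs no KL penalty, so only the hard-label term remains; a student that disagrees is penalized by both terms.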

Paper Structure (20 sections, 7 equations, 4 figures, 6 tables, 1 algorithm)

Figures (4)

  • Figure 1: T3DNet helps to compress a large 3D point cloud model into a small one with little accuracy drop. The tiny model is more suitable for AIoT (Artificial Intelligence of Things) devices such as micro-controllers in autonomous cars and other 3D perception sensors.
  • Figure 2: Two-stage T3DNet framework. We initialize a tiny model from the large model. In stage 1, we introduce additional supervision from tiny network augmentation to enhance the tiny model's representational ability. In every epoch, the augmented model is obtained by independently expanding the size of each layer following the expanding ratio strategy. Blue neurons denote the randomly augmented neurons in every epoch. In stage 2, we distill knowledge from the original network into the tiny model. The augmentation and distillation are conducted in different stages.
  • Figure 3: Visualization of samples in ModelNet40 [wu20153d], ScanObjectNN [uy2019revisiting], and ShapeNet [chang2015shapenet]. ModelNet40 is clean and noise-free, while ScanObjectNN is a noisy real-world dataset. ShapeNet is used for 3D part segmentation: its objects are divided into parts, each with a corresponding label.
  • Figure 4: Validation accuracy curves for the different distillation attempts during training. The horizontal axis is the epoch and the vertical axis is the validation accuracy.
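
The stage 1 procedure in the Figure 2 caption (expanding each layer independently every epoch, with the tiny model embedded in the augmented one) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate ratios, the uniform sampling, the leading-slice weight sharing, the weighting `alpha`, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_augmented_widths(tiny_widths: Sequence[int],
                            ratios: Sequence[float] = (1.0, 1.5, 2.0),
                            rng: random.Random = None) -> List[int]:
    """Pick an expansion ratio for each layer independently (assumed: uniform)."""
    rng = rng or random.Random()
    return [math.ceil(w * rng.choice(ratios)) for w in tiny_widths]


def init_weights(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    """Weight matrix of an augmented layer, stored as fan_out rows of fan_in values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)] for _ in range(fan_out)]


def slice_tiny(aug_weights: List[List[float]], tiny_out: int, tiny_in: int) -> List[List[float]]:
    """Assumed sharing scheme: the tiny layer is the leading block of the augmented layer."""
    return [row[:tiny_in] for row in aug_weights[:tiny_out]]


def stage1_loss(tiny_loss: float, aug_loss: float, alpha: float = 1.0) -> float:
    """Tiny-model loss plus auxiliary supervision from the augmented model."""
    return tiny_loss + alpha * aug_loss


rng = random.Random(0)
tiny_widths = [16, 32, 64]  # hypothetical layer widths of the tiny model
for epoch in range(3):
    aug_widths = sample_augmented_widths(tiny_widths, rng=rng)
    # First layer maps 3D coordinates (x, y, z) to aug_widths[0] features.
    aug_layer = init_weights(3, aug_widths[0], rng)
    tiny_layer = slice_tiny(aug_layer, tiny_widths[0], 3)
    print(epoch, aug_widths, len(tiny_layer), "x", len(tiny_layer[0]))
```

Because the tiny layer is a slice of the augmented layer, gradients from the augmented model's loss also reach the tiny model's weights, which is how the auxiliary supervision would take effect in this sketch. In stage 2 the augmented neurons would be discarded and only the tiny model distilled from the original network.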