
PointTPA: Dynamic Network Parameter Adaptation for 3D Scene Understanding

Siyuan Liu, Chaoqun Zheng, Xin Zhou, Tianrui Feng, Dingkang Liang, Xiang Bai

Abstract

Scene-level point cloud understanding remains challenging due to diverse geometries, imbalanced category distributions, and highly varied spatial layouts. Existing methods improve object-level performance but rely on static network parameters during inference, limiting their adaptability to dynamic scene data. We propose PointTPA, a Test-time Parameter Adaptation framework that generates input-aware network parameters for scene-level point clouds. PointTPA adopts a Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches and a Dynamic Parameter Projector (DPP) to produce patch-wise adaptive weights, enabling the backbone to adjust its behavior according to scene-specific variations while maintaining a low parameter overhead. Integrated into the PTv3 backbone, PointTPA demonstrates strong parameter efficiency: its two lightweight modules together account for less than 2% of the backbone's parameters. Despite this minimal overhead, PointTPA achieves 78.4% mIoU on ScanNet validation and surpasses existing parameter-efficient fine-tuning (PEFT) methods across multiple benchmarks, highlighting the efficacy of test-time dynamic network parameter adaptation for 3D scene understanding. The code is available at https://github.com/H-EmbodVis/PointTPA.
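To make the two components named in the abstract concrete, the following PyTorch sketch illustrates the general idea under stated assumptions. Here `serialize_and_group` stands in for SNG, using a single-axis sort as a placeholder for the actual serialization order, and `DynamicParameterProjector` stands in for DPP, generating a low-rank, patch-conditioned weight delta on top of a shared linear projection. All names, shapes, the pooling choice, and the low-rank form are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


def serialize_and_group(points, feats, patch_size=64):
    """Sketch of serialization-based grouping: order points along one
    axis (a placeholder for a true serialization curve), then chunk
    consecutive points into fixed-size, locally coherent patches.
    points: (N, 3), feats: (N, C) -> (num_patches, patch_size, C)."""
    order = torch.argsort(points[:, 0])      # assumed 1D stand-in for serialization
    feats = feats[order]
    n = (feats.shape[0] // patch_size) * patch_size
    return feats[:n].reshape(-1, patch_size, feats.shape[1])


class DynamicParameterProjector(nn.Module):
    """Hypothetical DPP-style module: each patch's pooled descriptor is
    mapped to a low-rank weight delta that modulates a shared (frozen)
    linear projection, so the effective weights vary with the input."""

    def __init__(self, dim, rank=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)        # shared static projection
        self.to_u = nn.Linear(dim, dim * rank) # descriptor -> U factor
        self.to_v = nn.Linear(dim, rank * dim) # descriptor -> V factor
        self.rank = rank

    def forward(self, patches):                # patches: (P, S, C)
        c = patches.shape[-1]
        desc = patches.mean(dim=1)             # (P, C) per-patch descriptor
        u = self.to_u(desc).view(-1, c, self.rank)
        v = self.to_v(desc).view(-1, self.rank, c)
        delta = torch.bmm(u, v)                # (P, C, C) dynamic weight delta
        return self.base(patches) + torch.bmm(patches, delta)


# Usage on random data: 4096 points with 32-dim features.
pts, x = torch.randn(4096, 3), torch.randn(4096, 32)
patches = serialize_and_group(pts, x, patch_size=64)
dpp = DynamicParameterProjector(dim=32)
print(dpp(patches).shape)  # torch.Size([64, 64, 32])
```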

Figures (8)

  • Figure 1: (a) Scene-level point clouds have more points and highly imbalanced category distributions than object-level point clouds. (b) Unlike existing PEFT methods, our PointTPA provides dynamic projection weights during inference and performs better in 3D semantic segmentation.
  • Figure 1: A comparison of FFT, IDPT [zha2023instance], DAPT [zhou2024dynamic], and PointTPA on segmentation performance, evaluated on ScanNet [dai2017scannet].
  • Figure 2: Overview of PointTPA. It consists of a Serialization-based Neighborhood Grouping (SNG) and a Dynamic Parameter Projector (DPP). During training, we freeze the backbone parameters and fine-tune only our DPP and the static Adapter modules. The DPP produces dynamic weights during inference. The same color denotes tokens with spatially close positions.
  • Figure 2: More visualizations of the semantic segmentation results of our PointTPA on four large-scale scene datasets: (a) ScanNet [dai2017scannet], (b) ScanNet200 [rozenberszki2022language], (c) ScanNet++ [yeshwanth2023scannet++], (d) S3DIS [armeni20163d], shown in 3 views.
  • Figure 3: Illustration of our mixed-insertion strategy. PointTPA is applied to the last block of each stage, while static adapters are adopted in the remaining blocks (see the sketch after this list).
  • ...and 3 more figures
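The mixed-insertion strategy described in the Figure 3 caption can be sketched as follows, reusing the `DynamicParameterProjector` sketch above. `Adapter`, `build_stage_peft`, and the bottleneck width are hypothetical names and values chosen for illustration; the paper's actual adapter design may differ.

```python
import torch.nn as nn


class Adapter(nn.Module):
    # Static bottleneck adapter: weights are fixed after fine-tuning.
    def __init__(self, dim, hidden=16):  # hidden width is an assumed value
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):
        return x + self.net(x)  # residual adapter update


def build_stage_peft(num_blocks, dim):
    # Mixed insertion (Figure 3): static adapters in every block of a
    # stage except the last, which receives the dynamic module instead.
    mods = [Adapter(dim) for _ in range(num_blocks - 1)]
    mods.append(DynamicParameterProjector(dim))  # from the sketch above
    return nn.ModuleList(mods)
```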