PointRFT: Explicit Reinforcement Fine-tuning for Point Cloud Few-shot Learning

Yankai Wang, Yiding Sun, Qirui Wang, Pengbo Li, Chaoyi Lu, Dongxu Zhang

Abstract

Understanding the spatial dynamics and semantics of point clouds is fundamental to comprehensive 3D understanding. While reinforcement learning (RL) algorithms such as Group Relative Policy Optimization (GRPO) have recently achieved remarkable breakthroughs in large language models by incentivizing reasoning capabilities through strategic reward design, their potential remains largely unexplored in the 3D perception domain. This naturally raises a pivotal question: can RL-based methods effectively empower 3D point cloud fine-tuning? In this paper, we propose PointRFT, the first reinforcement fine-tuning (RFT) paradigm tailored specifically for point cloud representation learning. We select three prevalent 3D foundation models and devise specialized accuracy and dispersion reward functions to stabilize training and mitigate distribution shifts. Through comprehensive few-shot classification experiments comparing distinct training paradigms, we demonstrate that PointRFT consistently outperforms vanilla supervised fine-tuning (SFT) across diverse benchmarks. Furthermore, when integrated into a hybrid Pretraining-SFT-RFT paradigm, PointRFT substantially improves the representational capacity of point cloud foundation models, achieving state-of-the-art performance, particularly in data-scarce scenarios.

Paper Structure (15 sections, 8 equations, 3 figures, 4 tables, 1 algorithm)

Figures (3)

  • Figure 1: Schematic illustration of our PointRFT. (a) SFT frequently provokes catastrophic forgetting, particularly when knowledge is transferred across domains. Reinforcement fine-tuning mitigates this decay and equips the backbone with broader generalization. (b) Grounded in this behavior, we introduce three reusable paradigms: Pre-S, Pre-R, and Pre-S-R, and benchmark them on multiple few-shot classification tasks using three prevalent 3D foundation models.
  • Figure 2: Framework of PointRFT. Following the vanilla point cloud fine-tuning pipeline, the input point cloud is fed into a pre-trained base model and fine-tuned via Eq. (1). For RL fine-tuning, we treat the base model before the update as the detached reference policy and the updated base model as the actively optimized policy, which stabilizes training. Building on this, we propose PointRFT, which trains the foundation model to maximize reward via Alg. 1. At each epoch, the parameters of the old model are refreshed and enter the loss without gradient backflow. Since RFT and SFT share the same labeled data and inputs, the two paradigms do not conflict. In other words, we can maximize the downstream performance of the base model by first applying SFT and then RFT.
  • Figure 3: Comparison of computational costs across datasets. We report the GFLOPs and training time (s) per epoch. The 10-way 10-shot, 10-way 1-shot, and 5-way 1-shot settings are selected, respectively.
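
The update described in the Figure 2 caption can be illustrated with a minimal GRPO-style sketch for a single labeled sample. This is not the authors' Alg. 1: the function names, the 0/1 form of the accuracy reward, the group size, and the clipping range are all assumptions made for illustration. The dispersion reward and any KL term are left out because their exact form is not given in this section. The old policy's probabilities are plain constants here, which mirrors the "no gradient backflow" treatment of the detached model.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy_reward(pred, label):
    """Assumed reward form: 1.0 for a correct sampled class, else 0.0."""
    return 1.0 if pred == label else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (mean 0, unit scale)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(new_probs, old_probs, actions, advantages, clip=0.2):
    """PPO/GRPO-style clipped objective, averaged over the group.

    old_probs come from the detached (pre-update) model and are constants.
    """
    total = 0.0
    for a, adv in zip(actions, advantages):
        ratio = new_probs[a] / old_probs[a]
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        total += min(ratio * adv, clipped * adv)
    return total / len(actions)


# Hypothetical 3-class example: logits before and after one update step.
rng = random.Random(0)
old_probs = softmax([0.2, 1.0, -0.5])   # detached old policy
new_probs = softmax([0.1, 1.3, -0.6])   # policy being optimized
label = 1

# Sample a group of class predictions from the old policy and score them.
actions = rng.choices(range(3), weights=old_probs, k=8)
rewards = [accuracy_reward(a, label) for a in actions]
advantages = group_relative_advantages(rewards)
objective = clipped_surrogate(new_probs, old_probs, actions, advantages)
print(f"rewards={rewards} objective={objective:.4f}")
```

In a real training loop the objective would be maximized with respect to the new policy's parameters through an autodiff framework, and the old policy would be refreshed from the current one at each epoch, as the caption describes.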