MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning

Xu Han, Yuan Tang, Jinfeng Xu, Xianzhi Li

TL;DR

MoST is a reparameterization-based PEFT framework for 3D point clouds. It uses Point Monarch to replace dense update matrices with sparse, locally aware transformations, preserving inference efficiency while improving representation learning on irregular 3D point clouds. With K-Rectify-based local fusion and a parameter-free multi-layer feature fusion strategy, MoST achieves state-of-the-art results across object- and scene-level tasks with only a small fraction of trainable parameters. It generalizes across diverse backbones, is compatible with matrix decompositions for further compression, and delivers substantial gains over full fine-tuning on many benchmarks. The work highlights the importance of capturing local geometric features in 3D PEFT and provides a flexible, hardware-friendly approach that adapts to various architectures and tasks.

Abstract

We introduce Monarch Sparse Tuning (MoST), the first reparameterization-based parameter-efficient fine-tuning (PEFT) method tailored for 3D representation learning. Unlike existing adapter-based and prompt-tuning 3D PEFT methods, MoST introduces no additional inference overhead and is compatible with many 3D representation learning backbones. At its core, we present a new family of structured matrices for 3D point clouds, Point Monarch, which can capture local geometric features of irregular points while offering high expressiveness. MoST reparameterizes the dense update weight matrices as our sparse Point Monarch matrices, significantly reducing parameters while retaining strong performance. Experiments on various backbones show that MoST is simple, effective, and highly generalizable. It captures local features in point clouds, achieving state-of-the-art results on multiple benchmarks, e.g., 97.5% acc. on ScanObjectNN (PB_50_RS) and 96.2% on ModelNet40 classification, while it can also combine with other matrix decompositions (e.g., Low-rank, Kronecker) to further reduce parameters.
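The claim that a reparameterized update adds no inference overhead can be illustrated with a small sketch. It assumes a generic Monarch-style matrix of the form P L Pᵀ R (two block-diagonal factors and a fixed stride permutation), not the paper's Point Monarch, whose exact construction is not given in this summary. All names here (`block_diag`, `stride_permutation`, `W`, `delta`) are hypothetical and chosen for illustration only.

```python
import random

def block_diag(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    m = len(blocks[0])
    n = m * len(blocks)
    out = [[0.0] * n for _ in range(n)]
    for b, blk in enumerate(blocks):
        for i in range(m):
            for j in range(m):
                out[b * m + i][b * m + j] = blk[i][j]
    return out

def stride_permutation(m):
    """Permutation that transposes a length m*m vector viewed as an m x m grid."""
    n = m * m
    p = [[0.0] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            p[i * m + j][j * m + i] = 1.0
    return p

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matvec(a, x):
    return [sum(r * v for r, v in zip(row, x)) for row in a]

def random_blocks(m, rng):
    """m trainable blocks, each of size m x m."""
    return [[[rng.uniform(-1, 1) for _ in range(m)] for _ in range(m)]
            for _ in range(m)]

rng = random.Random(0)
m = 4          # block size and number of blocks (assumed equal here)
n = m * m      # feature dimension

# Frozen pretrained weight (random stand-in for illustration).
W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

# Trainable structured update: delta = P @ L @ P^T @ R.
P = stride_permutation(m)
L = block_diag(random_blocks(m, rng))
R = block_diag(random_blocks(m, rng))
delta = matmul(matmul(matmul(P, L), transpose(P)), R)

# After training, fold the update into the frozen weight once.
merged = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Inference with the merged weight matches the two-branch computation.
x = [rng.uniform(-1, 1) for _ in range(n)]
two_branch = [a + b for a, b in zip(matvec(W, x), matvec(delta, x))]
one_matmul = matvec(merged, x)
max_gap = max(abs(a - b) for a, b in zip(two_branch, one_matmul))

trainable_params = 2 * m * m * m   # two block-diagonal factors
dense_params = n * n               # a dense update of the same shape
print(trainable_params, dense_params, max_gap < 1e-9)
```

In this toy setting the structured update trains half as many parameters as a dense update of the same shape, and the merged weight needs a single matrix-vector product at inference, which is the property that separates reparameterization-based methods from adapters and prompts.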

Paper Structure

This paper contains 18 sections, 9 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Existing 3D parameter-efficient fine-tuning (PEFT) methods rely on additional adapters or prompts, which, while using point cloud priors, introduce inference overhead and lack generalization. Reparameterization-based PEFT methods like LoRA, though free of the above issues, overlook point cloud characteristics. MoST combines the best of both worlds by reparameterizing dense update weight matrices with tailored sparse Point Monarch matrices, preserving local geometry, avoiding inference overhead, and remaining generalizable.
  • Figure 2: The average $l_2$-distance of features between KNN centers and neighbors after applying different structured matrices (left). We observe that this local feature distance correlates with classification accuracy on the PB_50_RS variant of ScanObjectNN (right). LoRA and Monarch lead to higher distances, whereas Point Monarch exhibits the lowest one by smoothing local geometric features, achieving 97.5% accuracy with PointGPT.
  • Figure 3: Illustration of Monarch Sparse Tuning. During training, MoST reparameterizes dense update weight matrices using our sparse and expressive Point Monarch matrices $\mathbf{K L R K}$, which capture local geometric features of points through simple linear transformations.
  • Figure 4: Illustration of Patch Embedding and $K$-Rectify. (a) Our Point Monarch performs channel-wise permutation and token-wise local rectification, and its block-wise structure aligns with patch-based point cloud representation learning. (b) $K$-Rectify groups local features based on xyz coordinates and interpolates a new center feature, which is then used to rectify the center feature.
  • Figure 5: Variants with Low-rank or Kronecker decomposition.
  • ...and 1 more figure
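
The local feature distance described in the Figure 2 caption can be sketched as follows. This is a minimal illustration that assumes every point acts as a KNN center and that neighbors are chosen by Euclidean distance on xyz coordinates; the paper's exact protocol (choice of centers, value of k, feature layer) is not stated in this summary, and the function name `mean_local_feature_distance` is hypothetical.

```python
import math

def mean_local_feature_distance(xyz, feats, k):
    """Average l2 distance between each center's feature and its k nearest neighbors' features.

    xyz   : list of 3D coordinates, one per point
    feats : list of feature vectors, aligned with xyz
    k     : number of spatial neighbors per center (assumed hyperparameter)
    """
    total, count = 0.0, 0
    for c in range(len(xyz)):
        # Neighbors are selected in coordinate space, not feature space.
        others = [i for i in range(len(xyz)) if i != c]
        neighbors = sorted(others, key=lambda i: math.dist(xyz[c], xyz[i]))[:k]
        for i in neighbors:
            total += math.dist(feats[c], feats[i])
            count += 1
    return total / count

# Two spatial clusters: the first has dissimilar features, the second identical ones.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
features = [[0.0], [2.0], [5.0], [5.0]]
print(mean_local_feature_distance(points, features, k=1))
```

A lower value means that spatially adjacent points carry more similar features, which is the sense in which the caption describes Point Monarch as smoothing local geometric features.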