MFM-point: Multi-scale Flow Matching for Point Cloud Generation

Petr Molodyk, Jaemoo Choi, David W. Romero, Ming-Yu Liu, Yongxin Chen

TL;DR

MFM-point introduces a scalable multi-scale flow-matching framework for point cloud generation that advances point-based methods to high-resolution and multi-category tasks. By designing geometry-preserving downsampling and distribution-aligned upsampling, and by enforcing cross-stage alignment with a principled FM objective, the method achieves best-in-class performance among point-based models and competitive results with representation-based approaches. The two-stage coarse-to-fine generation leverages independent flow models at each resolution, enabling efficient training and fast inference. Extensive ablations demonstrate the importance of the geometry-aware operators and the transition-time boundaries, while conditional generation experiments highlight the framework’s flexibility. Overall, MFM-point significantly improves scalability and fidelity for point cloud synthesis with practical relevance to 3D modeling and robotics.

Abstract

In recent years, point cloud generation has gained significant attention in 3D generative modeling. Among existing approaches, point-based methods directly generate point clouds without relying on other representations such as latent features, meshes, or voxels. These methods offer low training cost and algorithmic simplicity, but often underperform representation-based approaches. In this paper, we propose MFM-Point, a multi-scale Flow Matching framework for point cloud generation that substantially improves the scalability and performance of point-based methods while preserving their simplicity and efficiency. Our multi-scale generation algorithm adopts a coarse-to-fine generation paradigm, enhancing generation quality and scalability without incurring additional training or inference overhead. A key challenge in developing such a multi-scale framework lies in preserving the geometric structure of unordered point clouds while ensuring smooth and consistent distributional transitions across resolutions. To address this, we introduce a structured downsampling and upsampling strategy that preserves geometry and maintains alignment between coarse and fine resolutions. Our experimental results demonstrate that MFM-Point achieves best-in-class performance among point-based methods and challenges the best representation-based methods. In particular, MFM-Point demonstrates strong results in multi-category and high-resolution generation tasks.

Paper Structure

This paper contains 46 sections, 2 theorems, 21 equations, 21 figures, 6 tables, 3 algorithms.

Key Result

Theorem 3.1

Suppose $X^k_s$ and $X^{k+1}_e$ are defined as in eq:interpolate, and $0 \leq e_{k+1} \leq s_k \leq 1$ for all $k\in \{0,1,\dots, K-1\}$. Then, we have where the covariance matrix $\Sigma':= \mathrm{Diag}\!(\{\Sigma'_{D\times D}\}_{m=1}^M)$ is positive semi-definite block-diagonal with each block matrix $\Sigma'_{D\times D}$ given by Here, $\mathbbm{1}_D$ denotes the $D$-dimensional all-ones vec

Figures (21)

  • Figure 1: Overview of the proposed multi-scale point cloud generation framework. The framework consists of two processes: (1) Training (shown in brown) and (2) Inference (shown in green). During training, we perform downsampling to obtain the coarse representation $X^{k+1}_e = \text{Down}(X^k_e)$, and train a flow model $v^k_\theta$ to transport samples from the distribution of $X^k_e$ at each stage $k$. During inference, we first sample points through the learned flow $v^{k+1}_\theta$ from $X^{k+1}_s$ to $X^{k+1}_e$, and then upsample the result to reconstruct $X^k_s$, the input for the next finer stage. We employ an equal-size K-means clustering strategy as the downsampling operator $\text{Down}(\cdot)$, followed by a carefully designed upsampling procedure that ensures distributional alignment between the upsampled samples and the corresponding fine-scale distribution.
  • Figure 1: Unconditional generation results on high-resolution point clouds (8192 and 15K points), evaluated using one-nearest neighbor accuracy (1-NNA) under Chamfer Distance (CD $\downarrow$) and Earth Mover’s Distance (EMD $\downarrow$). Results are reported for ShapeNet (Airplane, Car, Chair) and Objaverse-XL (Furniture). Bold numbers indicate the best overall performance, while underlined numbers denote results within a 1.0 margin of the best.
  • Figure 2: Generated samples from our 2-stage model on the multi-category setting.
  • Figure 3: Generated samples from our 2-stage model on the single-category setting.
  • Figure 4: Comparison between PSF, Random Pair, and Ours on the Airplane and Chair datasets.
  • ...and 16 more figures
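
The first Figure 1 caption names equal-size K-means clustering as the downsampling operator $\text{Down}(\cdot)$. The following is a minimal sketch of one way such an operator could look, not the authors' implementation. The function name `equal_size_kmeans_downsample`, the greedy capacity-constrained assignment, and the choice to return cluster means as the coarse points are all assumptions made for illustration.

```python
import numpy as np


def equal_size_kmeans_downsample(points, group_size, n_iters=10, seed=0):
    """Hypothetical Down(.): split N points into N // group_size clusters
    of exactly group_size points each and return the cluster means.

    Assumption: a simple greedy balanced K-means heuristic, not the
    procedure used in the paper.
    """
    n = points.shape[0]
    if n % group_size != 0:
        raise ValueError("number of points must be divisible by group_size")
    m = n // group_size

    # Initialise centroids from m distinct input points.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(n, size=m, replace=False)]
    assign = np.full(n, -1, dtype=np.int64)

    for _ in range(n_iters):
        # Pairwise distances between every point and every centroid: (n, m).
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)

        # Greedy balanced assignment: visit (point, cluster) pairs from the
        # closest to the farthest and accept a pair only if the point is
        # still free and the cluster has remaining capacity.
        assign = np.full(n, -1, dtype=np.int64)
        counts = np.zeros(m, dtype=np.int64)
        for flat in np.argsort(dist, axis=None):
            i, c = divmod(int(flat), m)
            if assign[i] == -1 and counts[c] < group_size:
                assign[i] = c
                counts[c] += 1

        # Update each centroid to the mean of its (equal-size) cluster.
        centroids = np.stack([points[assign == c].mean(axis=0) for c in range(m)])

    return centroids, assign


# Usage: downsample a toy cloud of 64 points by a factor of 4.
cloud = np.random.default_rng(1).normal(size=(64, 3))
coarse, labels = equal_size_kmeans_downsample(cloud, group_size=4)
print(coarse.shape)  # (16, 3)
```

Because every cluster holds the same number of points, the mean of the coarse cloud equals the mean of the fine cloud in this sketch, which is one reason equal-size groups are convenient when relating distributions across resolutions. The greedy assignment sorts all point-cluster pairs, so it is only practical for small clouds.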

Theorems & Definitions (4)

  • Theorem 3.1
  • Remark 3.2
  • Corollary 3.3
  • Proof