Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration

Mengyu Yang, Yanming Yang, Chenyi Xu, Chenxi Song, Yufan Zuo, Tong Zhao, Ruibo Li, Chi Zhang

TL;DR

Fast3Dcache introduces a training-free, geometry-aware caching framework for 3D diffusion. It leverages two core components, PCSC and SSC, to predict cache budgets and select stable tokens, guided by observed voxel stabilization and latent dynamics. The approach achieves substantial inference speedups (up to 27.12%) and FLOPs reductions (54.8%) while preserving geometry with minimal distortion in Chamfer Distance and F-Score. Experiments on TRELLIS/DSO show favorable acceleration-accuracy trade-offs and compatibility with modality-agnostic accelerators, indicating practical impact for efficient 3D geometry synthesis. The work highlights the importance of 3D-specific stability cues when reusing cached computations in diffusion-based 3D generation.

Abstract

Diffusion models have achieved impressive generative quality across modalities like 2D images, videos, and 3D shapes, but their inference remains computationally expensive due to the iterative denoising process. While recent caching-based methods effectively reuse redundant computations to speed up 2D and video generation, directly applying these techniques to 3D diffusion models can severely disrupt geometric consistency. In 3D synthesis, even minor numerical errors in cached latent features accumulate, causing structural artifacts and topological inconsistencies. To overcome this limitation, we propose Fast3Dcache, a training-free geometry-aware caching framework that accelerates 3D diffusion inference while preserving geometric fidelity. Our method introduces a Predictive Caching Scheduler Constraint (PCSC) to dynamically determine cache quotas according to voxel stabilization patterns and a Spatiotemporal Stability Criterion (SSC) to select stable features for reuse based on velocity magnitude and acceleration criterion. Comprehensive experiments show that Fast3Dcache accelerates inference significantly, achieving up to a 27.12% speed-up and a 54.8% reduction in FLOPs, with minimal degradation in geometric quality as measured by Chamfer Distance (2.48%) and F-Score (1.95%).
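As a rough illustration of the SSC idea described above, the sketch below ranks tokens by how little their latent features are moving and returns the most stable ones for reuse. Everything concrete here is an assumption made for illustration, not the authors' implementation: the finite-difference definitions of velocity and acceleration, the additive score, and the names `select_stable_tokens`, `budget`, and `accel_weight` are all hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a plain list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def select_stable_tokens(prev2, prev1, curr, budget, accel_weight=1.0):
    """Return the indices of the `budget` most stable tokens, in ascending order.

    prev2, prev1, curr: per-token feature vectors at three consecutive
    denoising steps (oldest first). Assumption: velocity and acceleration
    are approximated by first and second finite differences.
    """
    scores = []
    for idx, (f0, f1, f2) in enumerate(zip(prev2, prev1, curr)):
        # First difference: how far the token moved in the latest step.
        velocity = [b - a for a, b in zip(f1, f2)]
        prev_velocity = [b - a for a, b in zip(f0, f1)]
        # Second difference: how much that motion itself changed.
        acceleration = [v - pv for pv, v in zip(prev_velocity, velocity)]
        # Hypothetical combined score; lower means more stable.
        score = l2_norm(velocity) + accel_weight * l2_norm(acceleration)
        scores.append((score, idx))
    scores.sort()
    return sorted(idx for _, idx in scores[:budget])
```

Tokens returned by such a function would reuse their cached outputs, while the remaining tokens are recomputed. The `budget` argument stands in for the quota that PCSC is said to supply.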

Paper Structure

This paper contains 39 sections, 9 equations, 13 figures, 7 tables, 1 algorithm.

Figures (13)

  • Figure 1: Observed Voxel Stabilization Trend and the PCSC Motivation. (a) The Original curve plots the empirically observed number of dynamic voxels (log-scale) per inference step, revealing a distinct three-phase pattern. (b) The PCSC curve illustrates our approach, motivated by this observation. We identify that the decay in Phase 2 can be reliably approximated by a log-linear function (red dashed line). This predictability forms the foundation for our scheduler, which we calibrate at an anchor step to forecast the stabilization budget.
  • Figure 2: Visualization of the velocity-field and acceleration-field feature maps in $\mathbf{\mathcal{S}}_t$. The maps illustrate the temporal dynamics of (a) velocity magnitude and (b) acceleration magnitude (the rate of change of velocity). These subtle dynamics mirror the three-phase stabilization pattern observed in Fig. 1. The progressive decay in both velocity and acceleration magnitudes supports their use as criteria for identifying stable tokens suitable for caching.
  • Figure 3: Overview of the Fast3Dcache three-stage acceleration strategy. Phase 1 (Full Sampling): The process begins with full sampling to establish initial geometric stability. At the end of this phase, the PCSC is calibrated by measuring voxel change ($\sigma$) at the anchor step. Phase 2 (Dynamic Caching): In the main phase, the SSC identifies stable tokens for caching based on the dynamic budget predicted by PCSC. Only unstable tokens are processed by the FT. Phase 3 (CFG-Free Refinement): The final stage employs an aggressive fixed-ratio schedule. A high and fixed ratio $\xi$ is used to determine the proportion of tokens to cache, maximizing computational savings during these stable refinement steps.
  • Figure 4: Visual comparison of 3D geometry synthesis. The leftmost column presents the input image. Subsequent columns display 3D meshes generated by the original TRELLIS, the RAS method (at varying sampling ratios), and Fast3Dcache. While RAS introduces noticeable geometric artifacts and surface noise, Fast3Dcache preserves structural fidelity comparable to the original TRELLIS framework, achieving acceleration without compromising quality.
  • Figure 5: Additional visualizations of dynamic voxels during inference for different cases. Phase 1 is unstable because the overall outline is still being formed. In Phase 2, the number of dynamic voxels starts to decrease and can be predicted via PCSC. Despite fluctuations in the downward trend during this second phase, the experimental results confirm that the log-linear approximation is acceptable. Phase 3 is also the CFG-off phase.
  • ...and 8 more figures
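
The captions for Figures 1 and 3 describe a log-linear decay of dynamic voxels in Phase 2, calibrated at an anchor step, and a three-phase schedule. A minimal sketch of how such a schedule could be expressed follows. It is illustrative only: the exponential form, the `decay_rate` parameter, the direct mapping from predicted dynamic voxels to a cache ratio, and all function and argument names are assumptions rather than details taken from the paper.

```python
import math


def predict_dynamic_voxels(step, anchor_step, anchor_count, decay_rate):
    """Log-linear forecast: log N(t) = log N(t_a) - decay_rate * (t - t_a).

    anchor_count plays the role of the voxel change measured at the anchor
    step; decay_rate is a hypothetical slope of the Phase 2 decay.
    """
    return anchor_count * math.exp(-decay_rate * (step - anchor_step))


def cache_ratio(step, total_voxels, anchor_step, anchor_count,
                decay_rate, phase3_start, xi):
    """Fraction of tokens to cache at a given step under the assumed schedule."""
    if step <= anchor_step:
        # Phase 1: full sampling, nothing is cached.
        return 0.0
    if step >= phase3_start:
        # Phase 3: aggressive fixed ratio xi.
        return xi
    # Phase 2: cache whatever is not predicted to still be changing.
    dynamic = predict_dynamic_voxels(step, anchor_step, anchor_count, decay_rate)
    dynamic = min(dynamic, total_voxels)
    return 1.0 - dynamic / total_voxels
```

Under these assumptions the Phase 2 ratio rises monotonically as the forecast number of dynamic voxels shrinks, which matches the qualitative behavior the captions describe.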