Hash3D: Training-free Acceleration for 3D Generation

Xingyi Yang, Xinchao Wang

TL;DR

Hash3D targets the inefficiency of diffusion-driven 3D generation based on score distillation sampling (SDS). It introduces a training-free, grid-based hashing mechanism that reuses intermediate diffusion features across nearby views and timesteps. By exploiting redundancy in feature maps and adapting grid sizes, Hash3D reduces the number of costly diffusion inferences, improving both speed and multi-view consistency without modifying or retraining the underlying diffusion models. Empirical results across 5 text-to-3D and 3 image-to-3D pipelines show speedups of approximately 1.3–4×. When combined with 3D Gaussian Splatting, text-to-3D time drops to ~10 minutes and image-to-3D to ~30 seconds, with negligible to positive effects on perceptual quality. The approach is simple to implement and broadly compatible with existing SDS-based workflows, which makes diffusion-based 3D generation more practical to deploy.

Abstract

The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models. Despite this progress, the cumbersome optimization process itself presents a critical hurdle to efficiency. In this paper, we introduce Hash3D, a universal acceleration for 3D generation without model training. Central to Hash3D is the insight that feature-map redundancy is prevalent in images rendered from camera positions and diffusion time-steps in close proximity. By hashing and reusing these feature maps across neighboring timesteps and camera angles, Hash3D avoids a substantial share of redundant calculations, thus accelerating the diffusion model's inference in 3D generation tasks. We achieve this through adaptive grid-based hashing. Surprisingly, this feature-sharing mechanism not only speeds up the generation but also enhances the smoothness and view consistency of the synthesized 3D objects. Our experiments, covering 5 text-to-3D and 3 image-to-3D models, demonstrate Hash3D's versatility in speeding up optimization, enhancing efficiency by 1.3 to 4 times. Additionally, Hash3D's integration with 3D Gaussian splatting substantially speeds up 3D model creation, reducing text-to-3D processing to about 10 minutes and image-to-3D conversion to roughly 30 seconds. The project page is at https://adamdad.github.io/hash3D/.
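
To make the grid-based hashing idea concrete, here is a minimal sketch of how a camera pose and a diffusion timestep could be quantized into a discrete grid cell, so that nearby queries share one key. The function name `grid_key`, the pose parameterization (azimuth and elevation in degrees), and the fixed cell sizes are assumptions made for illustration; the abstract does not specify them, and the adaptive choice of grid size is not reproduced here.

```python
import math


def grid_key(azimuth_deg, elevation_deg, timestep, angle_cell=10.0, time_cell=50):
    """Quantize a (camera pose, timestep) query into a grid-cell key.

    Hypothetical sketch: the cell sizes are illustrative defaults, not
    values taken from the paper.
    """
    # Wrap azimuth into [0, 360) so that 372 degrees and 12 degrees agree.
    az = azimuth_deg % 360.0
    return (
        math.floor(az / angle_cell),             # azimuth cell index
        math.floor(elevation_deg / angle_cell),  # elevation cell index
        timestep // time_cell,                   # timestep cell index
    )


# Two nearby queries fall into the same cell and could share a feature.
key_a = grid_key(12.0, 31.0, 420)
key_b = grid_key(17.5, 38.0, 449)
# A distant view lands in a different cell.
key_c = grid_key(95.0, 31.0, 420)
```

Larger cells would give more reuse at the cost of sharing features between less similar views, which is presumably the trade-off the adaptive grid size is meant to balance.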

Paper Structure

This paper contains 14 sections, 8 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Feature similarity extracted from different camera poses.
  • Figure 2: By interpolating latents between generated views, we can synthesize novel views with no additional computation.
  • Figure 3: Overall pipeline of Hash3D. Given the sampled camera and time-step, we look up the intermediate diffusion feature in the hash table. If no match is found, a standard inference is performed and the new feature is stored in the hash table; otherwise, if a feature from a nearby view already exists, it is reused without recomputation.
  • Figure 4: Qualitative Results using Hash3D along with Zero123 for image-to-3D generation. We mark the visual dissimilarity in yellow.
  • Figure 5: Visual comparison for the text-to-3D task when applying Hash3D to DreamFusion (Poole et al., 2023), SDS+GS, and Fantasia3D (Chen et al., 2023).
  • ...and 4 more figures
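
The lookup-or-compute flow described in the Figure 3 caption can be sketched as a small cache wrapped around the expensive diffusion call. Everything named here is hypothetical: `FeatureCache`, the `compute` callback standing in for a diffusion forward pass, and the fixed cell sizes are illustrative assumptions, and the sketch always reuses a stored feature on a match, which may be simpler than what the paper does.

```python
class FeatureCache:
    """Hash table of diffusion features keyed by a quantized (pose, timestep).

    Hypothetical sketch: `compute` stands in for the costly diffusion
    inference; the cell sizes are illustrative, not from the paper.
    """

    def __init__(self, compute, angle_cell=10.0, time_cell=50):
        self.compute = compute
        self.angle_cell = angle_cell
        self.time_cell = time_cell
        self.table = {}
        self.hits = 0
        self.misses = 0

    def _key(self, azimuth_deg, elevation_deg, timestep):
        # Nearby poses and timesteps map to the same grid cell.
        return (
            int((azimuth_deg % 360.0) // self.angle_cell),
            int(elevation_deg // self.angle_cell),
            timestep // self.time_cell,
        )

    def get(self, azimuth_deg, elevation_deg, timestep):
        key = self._key(azimuth_deg, elevation_deg, timestep)
        if key in self.table:
            # Match found: reuse the stored feature, skipping inference.
            self.hits += 1
            return self.table[key]
        # No match: run the standard inference and store the result.
        self.misses += 1
        feature = self.compute(azimuth_deg, elevation_deg, timestep)
        self.table[key] = feature
        return feature
```

In use, `compute` would be whatever function produces the intermediate feature for a rendered view; the `hits` and `misses` counters give a rough measure of how many inferences the cache avoided.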