
Textured 3D Regenerative Morphing with 3D Diffusion Prior

Songlin Yang, Yushi Lan, Honghua Chen, Xingang Pan

TL;DR

The paper tackles textured 3D morphing across diverse object categories without relying on explicit point-to-point correspondences. It introduces a regenerative morphing pipeline built on a generic 3D diffusion prior (Gaussian Anything). The pipeline interpolates source and target information at three levels (initial noise, LoRA model parameters, and CLIP-conditioned features) and refines the results with Attention Fusion, Token Reordering, and Low-Frequency Enhancement. The authors demonstrate better smoothness, plausibility, and cross-category generalization than state-of-the-art baselines, including 3D-aware multi-view approaches, and provide extensive ablations to justify the proposed strategies. The approach enables scalable, texture-preserving 3D morphing for visual effects and creative design while reducing the need for laborious alignment and specialized datasets. Future work points to higher-fidelity 3D priors and temporal consistency to extend the method to more complex 4D content.
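The three-level interpolation summarized above can be illustrated with a minimal sketch. It assumes spherical linear interpolation (slerp) for the initial Gaussian noise and plain linear interpolation (lerp) for the LoRA weights and the CLIP condition features. These interpolation rules, the flat-list data layout, and the helper names `lerp`, `slerp`, and `interpolate_inputs` are illustrative assumptions, not the authors' code.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def lerp(a: Sequence[float], b: Sequence[float], alpha: float) -> Vector:
    """Linear interpolation (1 - alpha) * a + alpha * b, element by element."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def slerp(a: Sequence[float], b: Sequence[float], alpha: float, eps: float = 1e-7) -> Vector:
    """Spherical linear interpolation between two non-zero vectors.

    Falls back to lerp when the vectors are nearly parallel, where the
    angle-based weights become numerically unstable.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error
    if theta < eps:
        return lerp(a, b, alpha)
    sin_theta = math.sin(theta)
    w_a = math.sin((1.0 - alpha) * theta) / sin_theta
    w_b = math.sin(alpha * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def interpolate_inputs(
    noise_src: Sequence[float], noise_tgt: Sequence[float],
    lora_src: Dict[str, Sequence[float]], lora_tgt: Dict[str, Sequence[float]],
    cond_src: Sequence[float], cond_tgt: Sequence[float],
    alpha: float,
) -> dict:
    """Hypothetical per-frame inputs at morphing weight alpha in [0, 1].

    Assumption: slerp for the initial noise, lerp for each flattened LoRA
    weight tensor and for the condition feature vector.
    """
    return {
        "noise": slerp(noise_src, noise_tgt, alpha),
        "lora": {name: lerp(lora_src[name], lora_tgt[name], alpha) for name in lora_src},
        "cond": lerp(cond_src, cond_tgt, alpha),
    }
```

Sweeping `alpha` over `[i / (n - 1) for i in range(n)]` would yield the inputs for an `n`-frame sequence. Slerp is a common choice for noise because lerp between two independent Gaussian samples shrinks the vector norm toward the middle of the sequence.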

Abstract

Textured 3D morphing creates smooth and plausible interpolation sequences between two 3D objects, focusing on transitions in both shape and texture. This is important for creative applications such as visual effects in filmmaking. Previous methods rely on establishing point-to-point correspondences and determining smooth deformation trajectories, which inherently restricts them to shape-only morphing on untextured, topologically aligned datasets. This restriction leads to labor-intensive preprocessing and poor generalization. To overcome these challenges, we propose a method for 3D regenerative morphing using a 3D diffusion prior. Unlike previous methods that depend on explicit correspondences and deformations, our method removes the need to obtain correspondences and instead uses the 3D diffusion prior to generate the morphing sequence. Specifically, we introduce a 3D diffusion model and interpolate the source and target information at three levels: initial noise, model parameters, and condition features. We then explore an Attention Fusion strategy to generate smoother morphing sequences. To further improve the plausibility of the semantic interpolation and of the generated 3D surfaces, we propose two strategies: (a) Token Reordering, which matches approximately corresponding tokens based on semantic analysis to guide implicit correspondences in the denoising process of the diffusion model, and (b) Low-Frequency Enhancement, which enhances low-frequency signals in the tokens to improve the quality of the generated surfaces. Experimental results show that our method achieves superior smoothness and plausibility in 3D morphing across diverse cross-category object pairs, offering a novel regenerative approach to 3D morphing with textured representations.
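Token Reordering can be viewed as a matching problem. Each source token is paired with a semantically close target token, and the target sequence is then permuted so that matched tokens share a position. The sketch below is a hypothetical illustration. It uses Euclidean token distance and a greedy closest-pair matching as a stand-in for the paper's semantic analysis. The function names and the distance choice are assumptions.

```python
import math
from typing import List, Sequence

Token = Sequence[float]


def greedy_token_matching(src: Sequence[Token], tgt: Sequence[Token]) -> List[int]:
    """Return perm where perm[i] is the target index matched to source position i.

    Assumption: semantic similarity is approximated by Euclidean distance,
    and pairs are committed greedily from the globally closest pair upward.
    """
    if len(src) != len(tgt):
        raise ValueError("token sequences must have equal length")
    # All (distance, source index, target index) triples, closest first.
    pairs = sorted(
        (math.dist(s, t), i, j)
        for i, s in enumerate(src)
        for j, t in enumerate(tgt)
    )
    perm = [-1] * len(src)
    used_tgt = set()
    for _, i, j in pairs:
        # Accept a pair only if neither token has been matched yet.
        if perm[i] == -1 and j not in used_tgt:
            perm[i] = j
            used_tgt.add(j)
    return perm


def reorder_target(src: Sequence[Token], tgt: Sequence[Token]) -> List[Token]:
    """Permute target tokens so each one sits at its matched source position."""
    perm = greedy_token_matching(src, tgt)
    return [tgt[j] for j in perm]
```

Greedy matching is simple but not globally optimal. An optimal one-to-one assignment on the same distance matrix can be computed with the Hungarian algorithm, for example via SciPy's `scipy.optimize.linear_sum_assignment`.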

Paper Structure

This paper contains 52 sections, 4 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: The framework of our method. The 3D diffusion prior is a two-stage (geometry & texture) generation model. Beyond basic interpolation, Attention Fusion is explored to improve smoothness, while Token Reordering and Low-Frequency Enhancement are proposed to improve plausibility.
  • Figure 2: Distances between tokens at the same position in the sequence. A 3D representation is scaled to produce a perfectly aligned version using the same random seeds, so tokens at identical sequence positions are semantically aligned. During denoising, the semantic distance between tokens at identical positions increases.
  • Figure 3: 3D distances between the points associated with the closest tokens in token space. Different random seeds are used to generate different initial noise, so tokens at identical sequence positions do not correspond to the same 3D points. For each pair of closest tokens, we extract their corresponding points in the final point cloud and compute the 3D distance between them.
  • Figure 4: Changes in high- and low-frequency signals in the diffusion model during denoising. Visualizing signal amplitudes at different time steps shows that low-frequency noise varies less than high-frequency noise, with smaller gaps across denoising time steps.
  • Figure 5: Qualitative comparisons with methods spanning different tasks, morphing tricks, and generative priors. More video results are available at https://anonymous-888.github.io/siggraph25/.
  • ...and 7 more figures
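Low-Frequency Enhancement and the frequency analysis in Figure 4 can be sketched with a discrete Fourier transform. The example below is a hypothetical illustration. It filters each token channel along the sequence axis, multiplies every frequency bin within `cutoff` of DC by `gain`, and transforms back. The axis, the cutoff, the gain, and the function names are all assumptions rather than the paper's settings.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def enhance_low_frequency(signal: Sequence[float], cutoff: int, gain: float) -> List[float]:
    """Scale all bins whose frequency is <= cutoff (including DC) by gain.

    Bin k and its mirror n - k represent the same frequency for real input,
    so both are scaled, which keeps the output real up to rounding error.
    """
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k)  # distance from DC, counting negative frequencies
        if freq <= cutoff:
            spectrum[k] *= gain
    return [v.real for v in idft(spectrum)]


def enhance_tokens(tokens: Sequence[Sequence[float]], cutoff: int, gain: float) -> List[List[float]]:
    """Hypothetical: tokens is N x C; filter each channel along the token axis."""
    channels = list(zip(*tokens))
    filtered = [enhance_low_frequency(ch, cutoff, gain) for ch in channels]
    return [list(row) for row in zip(*filtered)]
```

For real token sequences, an FFT such as `numpy.fft.rfft` / `numpy.fft.irfft` would replace the quadratic loops. With `gain > 1`, the smooth, slowly varying component of the tokens is amplified relative to high-frequency detail.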