End-to-End Fine-Tuning of 3D Texture Generation using Differentiable Rewards

AmirHossein Zamani, Tianhao Xie, Amir G. Aghdam, Tiberiu Popa, Eugene Belilovsky

TL;DR

The paper tackles 3D texture generation by embedding differentiable reward signals into an end-to-end 3D texture pipeline, eliminating RL while achieving geometry-aware texture synthesis. It introduces five geometry-aware rewards and employs differentiable rendering to backpropagate preferences through both geometry and appearance, enabling precise control over texture alignment with 3D structure. To make training feasible, it leverages LoRA and gradient checkpointing for memory efficiency and stability. Across qualitative, quantitative, and user studies, the method outperforms state-of-the-art baselines, demonstrating superior texture quality, geometric coherence, and user-preferred results. The work paves the way for interactive, geometry-consistent 3D content generation and can extend to joint optimization of geometry and texture.

Abstract

While recent 3D generative models can produce high-quality texture images, they often fail to capture human preferences or meet task-specific requirements. Moreover, a core challenge in the 3D texture generation domain is that most existing approaches rely on repeated calls to 2D text-to-image generative models, which lack an inherent understanding of the 3D structure of the input 3D mesh object. To alleviate these issues, we propose an end-to-end differentiable, reinforcement-learning-free framework that embeds human feedback, expressed as differentiable reward functions, directly into the 3D texture synthesis pipeline. By back-propagating preference signals through both geometric and appearance modules of the proposed framework, our method generates textures that respect the 3D geometry structure and align with desired criteria. To demonstrate its versatility, we introduce three novel geometry-aware reward functions, which offer a more controllable and interpretable pathway for creating high-quality 3D content from natural language. By conducting qualitative, quantitative, and user-preference evaluations against state-of-the-art methods, we demonstrate that our proposed strategy consistently outperforms existing approaches. Our implementation code is publicly available at: https://github.com/AHHHZ975/Differentiable-Texture-Learning
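The objective described in the abstract can be written schematically as follows. The notation is introduced here for illustration only and is not taken from the paper: $\theta$ stands for the trainable weights of the text-to-image model, $c$ for a text prompt, $m$ for the input mesh, $\mathcal{G}_\theta$ for the full differentiable texture pipeline, $r$ for a differentiable reward, and $\eta$ for a step size.

$$
\theta^\star = \arg\max_{\theta} \; \mathbb{E}_{c,\,m}\big[\, r\big(\mathcal{G}_\theta(c, m)\big) \big],
\qquad
\theta \leftarrow \theta + \eta \, \nabla_\theta \, r\big(\mathcal{G}_\theta(c, m)\big)
$$

Because every stage of $\mathcal{G}_\theta$ is assumed differentiable, the gradient $\nabla_\theta r$ follows from the chain rule applied through the reward, the renderer, and the generator, so no policy-gradient estimator is needed. This is the sense in which the framework is reinforcement-learning-free.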

Paper Structure

This paper contains 27 sections, 31 equations, 17 figures, 4 tables, 1 algorithm.

Figures (17)

  • Figure 1: Texture images generated before and after reward fine-tuning on different 3D mesh objects using different rewards. Each column shows the text prompt at the top and the corresponding reward function used for texture generation at the bottom. Texture images produced by our method consistently outperform the pre-fine-tuning baseline (InTeX) across all rewards and experiments.
  • Figure 2: An overview of the proposed training process, consisting of two main stages: (i) texture generation, where a latent diffusion model generates high-quality images from textual prompts. Combined with differentiable rendering and 3D vision techniques, this step produces realistic textures for 3D objects. (ii) Texture reward learning, where an end-to-end differentiable pipeline fine-tunes the pre-trained text-to-image diffusion model by maximizing a differentiable reward function $r$. Gradients are back-propagated through the entire 3D generative pipeline, making the process inherently geometry-aware. To demonstrate the method's effectiveness in producing textures aligned with 3D geometry, we introduce five novel geometry-aware reward functions, detailed in the paper's sections on geometry-aware reward design and geometry-guided texture colorization.
  • Figure 3: Qualitative results of the geometry–texture alignment experiment on a rabbit (bunny) mesh. For both the pre-fine-tuning and post-fine-tuning cases, we render the bunny from several viewpoints. In the bottom two rows, we visualize the 2D-projected minimum curvature vectors (blue) alongside the texture gradient vectors (red) to highlight alignment. After fine-tuning with our alignment reward, the texture patterns conform much more closely to the mesh's curvature directions.
  • Figure 4: Visualization of the three main stages of texture generation: (i) rendering, where the object is rendered from a camera viewpoint using a differentiable renderer and a rendering buffer is extracted; (ii) depth-aware painting, where, given a text prompt, each viewpoint is painted using a depth-aware text-to-image diffusion model guided by a pre-trained ControlNet that provides depth information, so that the generated textures align with both the text and the depth (geometry) information; and (iii) texture update, where the final texture is updated. We repeat this process iteratively across all camera viewpoints until the full 3D surface is painted.
  • Figure 5: Visualization of rendering in the texture generation step. For each camera viewpoint, we render the object using a differentiable renderer and extract a rendering buffer that includes the painted viewpoint image, depth maps, normal maps, and UV coordinates. Then, using the normal map obtained from the differentiable renderer, we compute the view direction cosine and generate three regions (masks) for each viewpoint: $M_{generate}$, $M_{refine}$, and $M_{keep}$. These three regions serve as input to the next step of the texturing process to enforce consistency in the output texture.
  • ...and 12 more figures
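
The mask computation mentioned in the Figure 5 caption can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the function names, the `cos_prev` buffer (the best view cosine at which each pixel was painted earlier), the `refine_margin` threshold, and the exact meaning assigned to each region are all assumptions, since the caption only states that three masks are derived from the view direction cosine.

```python
import numpy as np

def view_cosine(normals, view_dir):
    """Per-pixel cosine between unit normals (H, W, 3) and the direction
    pointing from the surface toward the camera (3,). Back-facing or empty
    pixels are clipped to 0."""
    v = view_dir / np.linalg.norm(view_dir)
    norm = np.clip(np.linalg.norm(normals, axis=-1, keepdims=True), 1e-8, None)
    return np.clip((normals / norm) @ v, 0.0, 1.0)

def split_regions(cos_now, cos_prev, visible, refine_margin=0.2):
    """Hypothetical rule for the three regions (an assumption, see lead-in):
    generate: visible pixels that were never painted (cos_prev == 0)
    refine:   painted pixels now seen at a clearly better angle
    keep:     all other visible pixels, left unchanged"""
    painted = cos_prev > 0
    m_generate = visible & ~painted
    m_refine = visible & painted & (cos_now > cos_prev + refine_margin)
    m_keep = visible & ~m_generate & ~m_refine
    return m_generate, m_refine, m_keep

# Toy 1x3 "image": every pixel faces the camera head-on.
normals = np.tile(np.array([0.0, 0.0, 1.0]), (1, 3, 1))
cos_now = view_cosine(normals, np.array([0.0, 0.0, 2.0]))
cos_prev = np.array([[0.0, 0.5, 0.95]])   # unpainted, painted obliquely, painted well
visible = np.ones((1, 3), dtype=bool)
m_generate, m_refine, m_keep = split_regions(cos_now, cos_prev, visible)
```

Under this rule the three masks partition the visible pixels, so each texel is either newly generated, repainted, or preserved in a given view. A real pipeline would likely use soft masks and thresholds tuned to the renderer.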