Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation
Ruihan Gao, Kangle Deng, Gengshan Yang, Wenzhen Yuan, Jun-Yan Zhu
TL;DR
This work introduces Tactile DreamFusion, a tactile-augmented pipeline for high-fidelity 3D generation that fuses visual and tactile texture information. The method captures high-resolution tactile normals with a GelSight sensor and models them in a differentiable 3D texture field, guided by 2D diffusion priors and a customized TextureDreambooth, to produce coherent fine geometric details and region-wise textures in both text-to-3D and image-to-3D tasks. A diffusion-guided refinement framework with multiple loss terms and a multi-part texture scheme yields textures that align with geometry; user studies indicate that it outperforms state-of-the-art baselines in both texture realism and geometric detail. The approach enables customizable, realistic 3D assets and contributes tactile data collection and texture synthesis techniques to 3D generation, with public TouchTexture data and code forthcoming.
Abstract
3D generation methods have shown visually compelling results powered by diffusion image priors. However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked into albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets. We design a lightweight 3D texture field to synthesize visual and tactile textures, guided by 2D diffusion model priors on both visual and tactile domains. We condition the visual texture generation on high-resolution tactile normals and guide the patch-based tactile texture refinement with a customized TextureDreambooth. We further present a multi-part generation pipeline that enables us to synthesize different textures across various regions. To our knowledge, we are the first to leverage high-resolution tactile sensing to enhance geometric details for 3D generation tasks. We evaluate our method in both text-to-3D and image-to-3D settings. Our experiments demonstrate that our method provides customized and realistic fine geometric textures while maintaining accurate alignment between the two modalities of vision and touch.
