3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization
SeungJeh Chung, JooHyun Park, HyeongYeop Kang
TL;DR
3DStyleGLIP enables fine-grained, text-guided stylization of 3D objects by grounding and manipulating individual mesh parts in GLIP's embedding space. It jointly learns part localization and appearance control, representing SVBRDF, normals, and lighting with neural fields and spherical Gaussians, under the guidance of text prompts that couple style and part phrases. The method introduces a part-level style loss in GLIP space and an optional CLIP-based alternating objective with multi-view fine-tuning, yielding high-quality, part-specific stylizations with robust, stable performance. Experiments on diverse meshes show better part-tailored results and higher user-perceived alignment with the prompts than baselines, underscoring the method's practical value for customizable 3D content creation.
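For readers unfamiliar with spherical Gaussians, the sketch below evaluates the standard lobe form G(v) = a * exp(s * (v . axis - 1)) and sums several lobes into a simple radiance estimate. It is only an illustration of the general technique: the function names, the scalar amplitude (RGB in practice), and the example lobe values are assumptions, and the parameterization used in 3DStyleGLIP itself is not specified in this summary.

```python
import math

def spherical_gaussian(v, axis, sharpness, amplitude):
    """Evaluate one spherical Gaussian lobe for unit vectors v and axis.

    Standard form: amplitude * exp(sharpness * (dot(v, axis) - 1)).
    The lobe peaks at `amplitude` when v == axis and decays as v turns away.
    """
    dot = sum(a * b for a, b in zip(v, axis))
    return amplitude * math.exp(sharpness * (dot - 1.0))

def environment_radiance(v, lobes):
    """Sum a mixture of lobes; each lobe is (axis, sharpness, amplitude).

    Hypothetical helper: approximates incoming light from direction v.
    """
    return sum(spherical_gaussian(v, ax, s, a) for ax, s, a in lobes)

# Illustrative (made-up) lighting: a sharp key light from above
# and a broad, dim fill light from the side.
lobes = [
    ((0.0, 0.0, 1.0), 10.0, 2.0),
    ((1.0, 0.0, 0.0), 1.0, 0.5),
]
radiance_up = environment_radiance((0.0, 0.0, 1.0), lobes)
```

A larger sharpness value gives a narrower lobe, so a small number of lobes can represent both concentrated and diffuse light sources with parameters that remain differentiable during optimization.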
Abstract
3D stylization, the application of specific styles to three-dimensional objects, offers substantial commercial potential by enabling the creation of uniquely styled 3D objects tailored to diverse scenes. Recent advancements in artificial intelligence and text-driven manipulation methods have made the stylization process increasingly intuitive and automated. While these methods reduce human costs by minimizing reliance on manual labor and expertise, they predominantly focus on holistic stylization, neglecting the application of desired styles to individual components of a 3D object. This limitation restricts fine-grained controllability. To address this gap, we introduce 3DStyleGLIP, a novel framework specifically designed for text-driven, part-tailored 3D stylization. Given a 3D mesh and a text prompt, 3DStyleGLIP utilizes the vision-language embedding space of the Grounded Language-Image Pre-training (GLIP) model to localize individual parts of the 3D mesh and modify their appearance to match the styles specified in the text prompt. 3DStyleGLIP integrates part localization and stylization guidance within GLIP's shared embedding space through an end-to-end process, enabled by a part-level style loss and two complementary learning techniques. This neural methodology meets the user's need for fine-grained style editing and delivers high-quality part-specific stylization results, opening new possibilities for customization and flexibility in 3D content creation. Our code and results are available at https://github.com/sj978/3DStyleGLIP.
