Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation

Bingjie Song; Xin Huang; Ruting Xie; Xue Wang; Qing Wang

TL;DR

Style3D tackles the problem of instantly stylizing 3D objects from a content-style image pair without style-specific training. It decomposes the task into two stages: Multi-View Dual-Feature Alignment, which uses a MultiFusion Attention mechanism to anchor geometry with content queries while injecting style via keys/values; and Sparse-view Spatial Reconstruction, which reconstructs a coherent 3D object from stylized multi-view features using a triplane/SDF-based representation and FlexiCubes mesh extraction. The method achieves high stylistic fidelity and geometric coherence across views, outperforming baselines in realism, coherence, and CLIP-based alignment, while offering significantly faster generation (about 30 seconds per object). These results demonstrate the practical potential for rapid, scalable creation of style-consistent 3D assets in design, gaming, and VR applications, without the heavy retraining typical of prior approaches.

Abstract

We present Style3D, a novel approach for generating stylized 3D objects from a content image and a style image. Unlike most previous methods that require case- or style-specific training, Style3D supports instant 3D object stylization. Our key insight is that 3D object stylization can be decomposed into two interconnected processes: multi-view dual-feature alignment and sparse-view spatial reconstruction. We introduce MultiFusion Attention, an attention-guided technique to achieve multi-view stylization from the content-style pair. Specifically, the query features from the content image preserve geometric consistency across multiple views, while the key and value features from the style image are used to guide the stylistic transfer. This dual-feature alignment ensures that spatial coherence and stylistic fidelity are maintained across multi-view images. Finally, a large 3D reconstruction model is introduced to generate coherent stylized 3D objects. By establishing an interplay between structural and stylistic features across multiple views, our approach enables a holistic 3D stylization process. Extensive experiments demonstrate that Style3D offers a more flexible and scalable solution for generating style-consistent 3D assets, surpassing existing methods in both computational efficiency and visual quality.
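The query/key/value split described above can be illustrated with a small cross-attention sketch. This is not the paper's MultiFusion Attention implementation. It is a minimal NumPy example that assumes per-view content features and a single set of style features are already available as arrays. The function name `content_style_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def content_style_attention(content_feats, style_feats, w_q, w_k, w_v):
    """Cross-attention with content queries and style keys/values.

    content_feats: (views, n_content, dim) features of each content view
    style_feats:   (n_style, dim) features of the style image
    w_q, w_k, w_v: (dim, dim) projection matrices (hypothetical, random here)
    Returns the attended features (views, n_content, dim) and the
    attention weights (views, n_content, n_style).
    """
    q = content_feats @ w_q          # queries come from the content views
    k = style_feats @ w_k            # keys come from the style image
    v = style_feats @ w_v            # values come from the style image
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy inputs: 4 views, 16 content tokens per view, 9 style tokens, dim 8.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 16, 8))
style = rng.normal(size=(9, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))

stylized, attn = content_style_attention(content, style, w_q, w_k, w_v)
print(stylized.shape, attn.shape)
```

In this sketch every view attends to the same style keys and values, so each output token is a weighted mix of style values whose weights are determined by the content token at that position. That is one simple way to read the abstract's claim that content queries carry the geometry while the style image supplies the appearance.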
