Jigsaw3D: Disentangled 3D Style Transfer via Patch Shuffling and Masking
Yuteng Ye, Zheng Zhang, Qinchuan Zhang, Di Wang, Youjia Zhang, Wenxiao Zhang, Wei Yang, Yuan Liu
TL;DR
Jigsaw3D tackles the challenge of transferring 2D stylistic cues to 3D textures while maintaining multi-view consistency and geometric fidelity. It introduces a jigsaw-based style-reference construction that disentangles style from content. This enables supervised training of a multi-view diffusion model that uses geometry cues and reference-to-view cross-attention to apply the style consistently across views. A 3D style-baking step then fuses the stylized views into a seamless UV texture. The method demonstrates strong style fidelity and cross-view coherence, and it extends to partial stylization, multi-object scenes, and tileable textures. Because it requires no per-asset optimization, the approach offers fast, scalable 3D stylization that fits practical content-creation workflows.
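To make "reference-to-view cross-attention" concrete, the sketch below shows generic single-head scaled dot-product cross-attention in which the tokens of every view act as queries and the (jigsaw-processed) reference tokens supply the keys and values. This is an illustrative assumption, not the authors' architecture: the function name `reference_to_view_cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the single head, and the residual connection are all hypothetical choices.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reference_to_view_cross_attention(view_tokens, ref_tokens, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention from views to a style reference.

    view_tokens: (V, N, d) latent tokens for V views with N tokens each.
    ref_tokens:  (M, d) tokens encoding the jigsaw style reference.
    w_q, w_k, w_v: (d, d) projection matrices (assumed square so the
    residual addition below is shape-compatible).
    """
    q = view_tokens @ w_q                      # (V, N, d) queries from each view
    k = ref_tokens @ w_k                       # (M, d) keys from the reference
    v = ref_tokens @ w_v                       # (M, d) values from the reference
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (V, N, M) scaled similarities
    attn = softmax(scores, axis=-1)            # each view token weights reference tokens
    # Every view reads from the same reference tokens, so all views draw
    # on one shared pool of style features.
    return view_tokens + attn @ v              # residual update, (V, N, d)
```

The design point the sketch illustrates is that all views query one shared set of reference tokens, which is one plausible way for a style signal to stay consistent across views.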
Abstract
Controllable 3D style transfer seeks to restyle a 3D asset so that its textures match a reference image while preserving the asset's geometric integrity and multi-view consistency. Prevalent methods either rely on direct injection of reference style tokens or on score distillation from 2D diffusion models, which incurs heavy per-scene optimization and often entangles style with semantic content. We introduce Jigsaw3D, a multi-view diffusion-based pipeline that decouples style from content and enables fast, view-consistent stylization. Our key idea is to leverage the jigsaw operation (spatial shuffling and random masking of reference patches) to suppress object semantics and isolate stylistic statistics such as color palettes, strokes, and textures. We integrate these style cues into a multi-view diffusion model via reference-to-view cross-attention, producing view-consistent stylized renderings conditioned on the input mesh. The renderings are then style-baked onto the surface to yield seamless textures. Across standard 3D stylization benchmarks, Jigsaw3D achieves high style fidelity and multi-view consistency with substantially lower latency, and it generalizes to masked partial reference stylization, multi-object scene styling, and tileable texture generation. The project page is available at: https://babahui.github.io/jigsaw3D.github.io/
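The jigsaw operation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function name `jigsaw`, the square non-overlapping patch grid, the requirement that image height and width be divisible by `patch_size`, and masking by zeroing patches are all hypothetical choices. Shuffling destroys the spatial layout that carries object semantics. Masking removes further content cues, while the per-patch color and texture statistics survive.

```python
import numpy as np


def jigsaw(reference, patch_size, mask_ratio, rng=None):
    """Hypothetical jigsaw operation: shuffle patches, then mask a random subset.

    reference: (H, W, C) image array; H and W must be divisible by patch_size.
    mask_ratio: fraction of patches to zero out, in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = reference.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    n = gh * gw

    # Split the image into a stack of n patches of shape (p, p, C).
    patches = (reference.reshape(gh, patch_size, gw, patch_size, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(n, patch_size, patch_size, c))

    # Spatial shuffling: a random permutation of patch positions breaks the
    # object layout (fancy indexing returns a copy, so the input is untouched).
    patches = patches[rng.permutation(n)]

    # Random masking: zero out a fixed fraction of the shuffled patches.
    n_mask = int(round(mask_ratio * n))
    patches[rng.choice(n, size=n_mask, replace=False)] = 0

    # Reassemble the shuffled, masked patches into an (H, W, C) image.
    return (patches.reshape(gh, gw, patch_size, patch_size, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(h, w, c))
```

With `mask_ratio=0.0` the output contains exactly the input's pixel values in a different arrangement. This reflects the intent that the style reference keeps color and texture statistics while losing the original composition.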
