A Training-Free Style-Personalization via SVD-Based Feature Decomposition
Kyoungmin Lee, Jihun Park, Jongmin Gim, Wonhyeok Choi, Kyumin Hwang, Jaeyeul Kim, Sunghoon Im
TL;DR
This work tackles fast, training-free style personalization for text- and image-guided generation by analyzing a scale-wise autoregressive backbone (Infinity). It identifies a pivotal early step where the dominant singular values of an internal feature capture style, and it introduces two lightweight modules, Principal Feature Blending and Structural Attention Correction, to inject style and stabilize structure without training. In extensive experiments, the approach achieves competitive style and prompt fidelity while significantly reducing inference time compared to fine-tuned baselines, and it generalizes across model scales. The method is therefore practical for real-time, user-friendly style personalization and is broadly applicable to scale-wise autoregressive generation frameworks.
Abstract
We present a training-free framework for style-personalized image generation that operates during inference using a scale-wise autoregressive model. Our method generates a stylized image guided by a single reference style while preserving semantic consistency and mitigating content leakage. Through a detailed step-wise analysis of the generation process, we identify a pivotal step where the dominant singular values of the internal feature encode style-related components. Building upon this insight, we introduce two lightweight control modules: Principal Feature Blending, which enables precise modulation of style through SVD-based feature reconstruction, and Structural Attention Correction, which stabilizes structural consistency by leveraging content-guided attention correction across fine stages. Extensive experiments demonstrate that, without any additional training, our method achieves style fidelity and prompt fidelity competitive with fine-tuned baselines, while offering faster inference and greater deployment flexibility.
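To make the idea of SVD-based feature reconstruction concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the internal feature at the pivotal step can be viewed as a 2-D matrix (tokens by channels) and that style is injected by interpolating the top-k singular values of the content feature toward those of the style feature. The function name blend_principal_singular_values and the parameters k and alpha are hypothetical and chosen only for illustration; the abstract does not specify the exact blending rule.

```python
import numpy as np


def blend_principal_singular_values(content_feat, style_feat, k=1, alpha=1.0):
    """Illustrative sketch (assumption, not the paper's exact rule).

    content_feat, style_feat: 2-D arrays of shape (tokens, channels).
    k:     number of dominant singular values to modify (hypothetical knob).
    alpha: blend weight; 0 keeps the content feature, 1 takes the style values.
    """
    # Thin SVD of the content feature: content_feat = U @ diag(s) @ Vt.
    u_c, s_c, vt_c = np.linalg.svd(content_feat, full_matrices=False)
    # Only the singular values of the style feature are needed here.
    s_s = np.linalg.svd(style_feat, compute_uv=False)

    # Guard against k exceeding the number of available singular values.
    k = min(k, s_c.shape[0], s_s.shape[0])

    # Interpolate only the k dominant singular values; leave the rest untouched.
    s_new = s_c.copy()
    s_new[:k] = (1.0 - alpha) * s_c[:k] + alpha * s_s[:k]

    # Reconstruct with the content feature's own singular vectors, so the
    # spatial/channel structure comes from the content branch.
    return (u_c * s_new) @ vt_c


# Usage with random stand-in features (real features would come from the model).
rng = np.random.default_rng(0)
content = rng.standard_normal((16, 8))
style = rng.standard_normal((16, 8))
stylized = blend_principal_singular_values(content, style, k=2, alpha=0.8)
```

Keeping the content feature's singular vectors and changing only the leading singular values is one simple way to modulate the dominant components while leaving the remaining structure intact; other blending choices (for example, mixing singular vectors as well) are equally plausible readings of the abstract.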
