ICAS: IP Adapter and ControlNet-based Attention Structure for Multi-Subject Style Transfer Optimization
Fuwei Liu
TL;DR
ICAS addresses multi-subject style transfer under limited data by decoupling style and structure guidance in a diffusion framework. It combines IP-Adapter for style injection with ControlNet for structural conditioning and employs partial fine-tuning of the content injection branch along with a cyclic content embedding strategy, enabling efficient and faithful multi-subject stylization. The approach achieves superior structure preservation, style coherence, and inference efficiency compared with inversion-based and data-hungry baselines, as demonstrated across extensive experiments and user studies. The work offers a practical pathway for real-world multi-subject stylization with limited annotated data and tight computational budgets.
Abstract
Generating multi-subject stylized images remains a significant challenge due to the ambiguity in defining style attributes (e.g., color, texture, atmosphere, and structure) and the difficulty of applying them consistently across multiple subjects. Although recent diffusion-based text-to-image models have achieved remarkable progress, existing methods typically rely on computationally expensive inversion procedures or large-scale stylized datasets. Moreover, these methods often struggle to maintain multi-subject semantic fidelity and incur high inference costs. To address these limitations, we propose ICAS (IP-Adapter and ControlNet-based Attention Structure), a novel framework for efficient and controllable multi-subject style transfer. Rather than tuning the full model, ICAS adaptively fine-tunes only the content injection branch of a pre-trained diffusion model, thereby preserving identity-specific semantics while enhancing style controllability. By combining IP-Adapter for adaptive style injection with ControlNet for structural conditioning, our framework ensures faithful global layout preservation alongside accurate local style synthesis. Furthermore, ICAS introduces a cyclic multi-subject content embedding mechanism, which enables effective style transfer in limited-data settings without requiring extensive stylized corpora. Extensive experiments show that ICAS achieves superior performance in structure preservation, style consistency, and inference efficiency, establishing a new paradigm for multi-subject style transfer in real-world applications.
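The abstract does not say how the content injection branch is isolated during training. As a minimal sketch of the general idea of partial fine-tuning (freeze everything except one branch), the example below selects trainable parameters by name prefix. The `Param` class, the `freeze_except` helper, the prefix `content_branch.`, and all parameter names and sizes are hypothetical and chosen only for illustration; they are not taken from the ICAS implementation.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a model parameter (hypothetical, for illustration only)."""
    name: str
    numel: int
    requires_grad: bool = True


def freeze_except(params, trainable_prefixes):
    """Mark only parameters whose names start with a given prefix as trainable."""
    prefixes = tuple(trainable_prefixes)
    for p in params:
        p.requires_grad = p.name.startswith(prefixes)
    return [p for p in params if p.requires_grad]


def trainable_fraction(params):
    """Fraction of all parameter elements that will receive gradient updates."""
    total = sum(p.numel for p in params)
    trainable = sum(p.numel for p in params if p.requires_grad)
    return trainable / total if total else 0.0


# Toy parameter inventory; names and sizes are invented, not from the paper.
params = [
    Param("unet.down_blocks.0.weight", 8_000),
    Param("style_adapter.to_k.weight", 500),
    Param("controlnet.input_block.weight", 1_000),
    Param("content_branch.proj.weight", 500),
]

# Assumption: the branch to fine-tune is identifiable by a name prefix.
trainable = freeze_except(params, ["content_branch."])
print([p.name for p in trainable])          # ['content_branch.proj.weight']
print(f"{trainable_fraction(params):.2%}")  # 5.00%
```

In a real training setup, the same selection would typically be applied to the framework's own parameter objects, so that only the chosen branch is passed to the optimizer while the backbone, style adapter, and structural conditioning weights stay fixed.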
