One-Shot Structure-Aware Stylized Image Synthesis
Hansam Cho, Jonghyun Lee, Seunggyu Chang, Yonghyun Jeong
TL;DR
OSASIS tackles one-shot stylization by explicitly separating structure from semantics within a diffusion-based framework. It uses a structural latent code $\mathbf{x}_{\mathbf{t}_0}$ and a semantic latent code $\mathbf{z}_{\mathrm{sem}}$, together with a structure-preserving network and CLIP directional losses that bridge the input and style domains. This gives robust structure preservation even with out-of-domain references and supports text-driven manipulation. Across multiple datasets, the approach shows better structure fidelity and style transfer than baselines, stylizes inputs with rare structures, and supports real-time content/style mixing. These gains come at the cost of longer training times and per-style training; future work targets efficiency and generalization across styles.
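As a rough illustration of what a CLIP-style directional loss measures, the sketch below compares two shift vectors in an embedding space: one from an input image to its stylized output, and one from an input-domain reference to a style-domain reference. The loss is one minus their cosine similarity, so it is small when the two shifts point the same way. This is a generic sketch, not the exact losses used by OSASIS: the function name `directional_loss`, its argument names, and the use of plain Python lists in place of real CLIP embeddings are all assumptions made for illustration.

```python
import math


def directional_loss(src_in, src_out, ref_in, ref_out, eps=1e-8):
    """Return 1 - cosine similarity between two embedding-space shifts.

    All four arguments are hypothetical stand-ins for image embeddings
    (equal-length sequences of floats); no real CLIP model is called here.
    """
    # Shift from the input image to its stylized output.
    d_img = [b - a for a, b in zip(src_in, src_out)]
    # Shift from the input-domain reference to the style-domain reference.
    d_ref = [b - a for a, b in zip(ref_in, ref_out)]

    dot = sum(p * q for p, q in zip(d_img, d_ref))
    norm = math.sqrt(sum(p * p for p in d_img)) * math.sqrt(sum(q * q for q in d_ref))
    # eps guards against division by zero when a shift vector is all zeros.
    return 1.0 - dot / (norm + eps)


# Parallel shifts give a loss near 0; opposite shifts give a loss near 2.
aligned = directional_loss([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [3.0, 1.0])
opposed = directional_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [-1.0, 0.0])
```

Because only the direction of each shift matters, not its magnitude, a loss of this form encourages the output to move away from the input in the same way the style reference moves away from its input-domain counterpart.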
Abstract
While GAN-based models have been successful in image stylization tasks, they often struggle with structure preservation while stylizing a wide range of input images. Recently, diffusion models have been adopted for image stylization but still lack the capability to maintain the original quality of input images. To address this, we propose OSASIS: a novel one-shot stylization method that is robust in structure preservation. We show that OSASIS is able to effectively disentangle the semantics from the structure of an image, allowing it to control the level of content and style applied to a given input. We apply OSASIS to various experimental settings, including stylization with out-of-domain reference images and stylization with text-driven manipulation. Results show that OSASIS outperforms other stylization methods, especially for input images that were rarely encountered during training, providing a promising solution to stylization via diffusion models.
