Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training
Ximing Xing, Chuang Wang, Haitao Zhou, Zhihao Hu, Chongxuan Li, Dong Xu, Qian Yu
TL;DR
This work tackles the problem of generating photo-realistic images from sketches while faithfully transferring appearance from exemplars. It introduces a training-free two-stage diffusion-based framework, Inversion-by-Inversion, comprising shape-enhancing inversion to enforce sketch geometry and full-control inversion to graft exemplar appearance, guided by a shape-energy and an appearance-energy function. The method achieves state-of-the-art quantitative and qualitative results on exemplar-based sketch-to-photo tasks, demonstrating robustness to style exemplars, stroke inputs, and freehand sketches without task-specific retraining. The approach promises practical impact for controllable image synthesis in AIGC applications by enabling flexible, exemplar-guided editing directly from sketches.
Abstract
Exemplar-based sketch-to-photo synthesis allows users to generate photo-realistic images based on sketches. Recently, diffusion-based methods have achieved impressive performance on image generation tasks, enabling highly flexible control through text-driven generation or energy functions. However, generating photo-realistic images with color and texture from sketch images remains challenging for diffusion models. Sketches typically consist of only a few strokes, with most regions left blank, making it difficult for diffusion-based methods to produce photo-realistic images. In this work, we propose a two-stage method named "Inversion-by-Inversion" for exemplar-based sketch-to-photo synthesis. This approach includes shape-enhancing inversion and full-control inversion. During the shape-enhancing inversion process, an uncolored photo is generated with the guidance of a shape-energy function. This step is essential to ensure control over the shape of the generated photo. In the full-control inversion process, we propose an appearance-energy function to control the color and texture of the final generated photo. Importantly, our Inversion-by-Inversion pipeline is training-free and can accept different types of exemplars for color and texture control. We conducted extensive experiments to evaluate our proposed method, and the results demonstrate its effectiveness. The code and project can be found at https://ximinng.github.io/inversion-by-inversion-project/.
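To make the two-stage structure concrete, the following is a toy sketch only, not the authors' implementation. It assumes no pretrained diffusion model at all: the reverse process is replaced by plain noisy gradient steps on two hypothetical quadratic energies. The names `shape_grad`, `appearance_grad`, `guided_updates`, and `two_stage_toy`, the choice of energies (grayscale agreement with the sketch, mean-chroma agreement with the exemplar), and all step sizes are assumptions made for illustration.

```python
import numpy as np

def gray(img):
    """Per-pixel grayscale value of an (H, W, 3) image."""
    return img.mean(axis=-1)

def chroma(img):
    """Mean RGB colour minus mean brightness; the three entries sum to zero."""
    return img.mean(axis=(0, 1)) - img.mean()

def shape_grad(x, sketch):
    # Hypothetical shape energy: E_s = 0.5 * sum_p (gray(x)_p - sketch_p)^2.
    # Its gradient is the same for all three channels of a pixel.
    diff = gray(x) - sketch
    return np.repeat(diff[..., None], 3, axis=-1) / 3.0

def appearance_grad(x, exemplar):
    # Hypothetical appearance energy:
    # E_a = 0.5 * N * ||chroma(x) - chroma(exemplar)||^2, with N = H * W.
    # Its gradient is the chroma difference, identical at every pixel.
    diff = chroma(x) - chroma(exemplar)
    return np.broadcast_to(diff, x.shape)

def guided_updates(x, grad_fn, rng, steps=50, step_size=1.0, noise_scale=0.1):
    # Noisy gradient descent on the energy; the noise is annealed to zero
    # so that the last step is deterministic.
    for i in range(steps):
        sigma = noise_scale * (1.0 - (i + 1) / steps)
        x = x - step_size * grad_fn(x) + sigma * rng.standard_normal(x.shape)
    return x

def two_stage_toy(sketch, exemplar, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a noised copy of the sketch replicated over three channels.
    x0 = np.repeat(sketch[..., None], 3, axis=-1)
    x0 = x0 + 0.5 * rng.standard_normal(x0.shape)
    # Stage 1 (stand-in for shape-enhancing inversion): shape energy only.
    stage1 = guided_updates(x0, lambda x: shape_grad(x, sketch), rng)
    # Stage 2 (stand-in for full-control inversion): shape + appearance energy,
    # initialised from the stage-1 result.
    stage2 = guided_updates(
        stage1,
        lambda x: shape_grad(x, sketch) + appearance_grad(x, exemplar),
        rng,
    )
    return stage1, stage2

if __name__ == "__main__":
    sketch = np.zeros((8, 8))
    sketch[2:6, 3] = 1.0                       # a single vertical stroke
    exemplar = np.zeros((8, 8, 3))
    exemplar[..., 0], exemplar[..., 1], exemplar[..., 2] = 0.8, 0.4, 0.1
    s1, s2 = two_stage_toy(sketch, exemplar)
    print("stage-1 shape error:", np.abs(gray(s1) - sketch).max())
    print("stage-2 shape error:", np.abs(gray(s2) - sketch).max())
    print("stage-2 chroma error:", np.abs(chroma(s2) - chroma(exemplar)).max())
```

In this toy the two energies act on independent components of the image (brightness versus mean chroma), so the second stage can impose the exemplar's color without disturbing the shape fixed in the first stage. The actual method relies on a diffusion model and on the paper's own energy functions, which are not reproduced here.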
