LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair

Xue Song; Jiequan Cui; Hanwang Zhang; Jiaxin Shi; Jingjing Chen; Chi Zhang; Yu-Gang Jiang

TL;DR

LoC addresses the ambiguity of text prompts in image editing by using before-after image pairs as visual instructions that capture user intent. It introduces a dynamic LoRA generation mechanism (LoC): a hypernetwork encodes the change between the before and after images as an instruction-specific LoRA and injects it into a frozen editing model. It also introduces LoRA Reverse, which regularizes training so that the model can learn from before-after pairs alone. The method supports a broad range of edit types and produces high-quality results on SEED-Data-Edit and MagicBrush with real-time inference. The resulting instruction-specific LoRAs are interpretable and reusable for real-world visual editing, and the authors acknowledge potential misuse and the need for safeguards.

Abstract

In this paper, we propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. Natural language instructions can be ambiguous, insufficiently specific, and open to diverse interpretations, whereas visual instructions can accurately reflect users' intent. Building on the success of LoRA in text-based image editing and generation, we dynamically learn an instruction-specific LoRA that encodes the "change" in a before-after image pair, which enhances the interpretability and reusability of our model. Furthermore, generalizable models for image editing with visual instructions typically require quad data, i.e., a before-after image pair together with query and target images. Because such quad data are scarce, existing models are limited to a narrow range of visual instructions. To overcome this limitation, we introduce the LoRA Reverse optimization technique, which enables large-scale training with paired data alone. Extensive qualitative and quantitative experiments demonstrate that our model produces high-quality images that align with user intent and supports a broad spectrum of real-world visual instructions.
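The core idea above (a hypernetwork turns one before-after pair into an instruction-specific LoRA that is injected into a frozen model and can be reused) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a hypothetical pair encoder `encode_change` (here just the before features concatenated with the after-minus-before difference) and a hypothetical `LoRAHyperNetwork` made of two random linear heads; in LoC these would be learned networks operating on images, and the LoRA would target layers of a real editing model rather than a single toy matrix.

```python
import numpy as np


class LoRAHyperNetwork:
    """Hypothetical hypernetwork: maps an embedding of a before-after pair
    to low-rank LoRA factors (A, B) for one frozen linear layer."""

    def __init__(self, emb_dim: int, d_in: int, d_out: int, rank: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Two linear heads (random here, learned in practice) that emit flattened factors.
        self.head_a = rng.normal(0.0, 0.02, size=(emb_dim, rank * d_in))
        self.head_b = rng.normal(0.0, 0.02, size=(emb_dim, d_out * rank))

    def __call__(self, pair_emb):
        a = (pair_emb @ self.head_a).reshape(self.rank, self.d_in)   # A: r x d_in
        b = (pair_emb @ self.head_b).reshape(self.d_out, self.rank)  # B: d_out x r
        return a, b


def encode_change(before_feat, after_feat):
    """Hypothetical pair encoder: before features plus the before->after difference."""
    return np.concatenate([before_feat, after_feat - before_feat])


def lora_linear(x, w_frozen, a, b, scale=1.0):
    """Frozen linear layer with an injected LoRA: y = x W^T + scale * x (B A)^T.
    W is only read, never updated; the instruction lives entirely in (A, B)."""
    return x @ w_frozen.T + scale * (x @ a.T) @ b.T


# Toy usage: one before-after pair produces one reusable LoRA.
rng = np.random.default_rng(42)
feat_dim, d_in, d_out, rank = 8, 16, 16, 4
before, after = rng.normal(size=feat_dim), rng.normal(size=feat_dim)

hyper = LoRAHyperNetwork(emb_dim=2 * feat_dim, d_in=d_in, d_out=d_out, rank=rank)
a, b = hyper(encode_change(before, after))

w = rng.normal(size=(d_out, d_in))        # frozen base weight
query = rng.normal(size=(3, d_in))        # e.g. 3 tokens from a query image
other_query = rng.normal(size=(5, d_in))  # the same LoRA reused on another query

out = lora_linear(query, w, a, b)
out_other = lora_linear(other_query, w, a, b)
print(out.shape, out_other.shape)  # (3, 16) (5, 16)
```

Because the injected update is the product of a d_out × r matrix B and an r × d_in matrix A, its rank is at most r, so each instruction is stored in (d_in + d_out) · r numbers instead of a full d_out × d_in weight, and it can be applied to any new query without touching the frozen weight.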


