Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing

Rishubh Parihar, Or Patashnik, Daniil Ostashev, R. Venkatesh Babu, Daniel Cohen-Or, Kuan-Chieh Wang

TL;DR

Kontinuous Kontext tackles the lack of fine-grained control in instruction-driven image editing by introducing a scalar strength input $s \in [0,1]$ and a lightweight Strength Projector that maps $s$ and the edit instruction to modulation-space offsets within Flux Kontext. A synthetic data pipeline samples quadruplets $(x, e, s, y_s)$ of source image, edit instruction, strength, and edited image, using LVLM-generated instructions, full-strength Flux Kontext edits, and diffusion-based intermediate morphs. The samples are then filtered on inversion quality and trajectory uniformity to ensure smooth edit trajectories. Empirical results show that the method yields smooth, diverse, and faithful edits across global, local, and geometric changes. It generalizes to unseen attributes better than interpolation-based or attribute-specific baselines and preserves identity at low strengths. The approach offers a unified, attribute-agnostic framework for interactive, continuous image editing, with practical implications for design, accessibility, and education. It also opens avenues for extending continuous control to other editing modalities.
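
The following pure-Python sketch shows one way a scalar $s$ and an instruction could be mapped to additive modulation-space offsets. It uses a tiny two-layer MLP over the concatenation of $s$ and a pooled instruction embedding. The layer sizes, the SiLU activation, the pooled embedding `text_emb`, and the names `StrengthProjector` and `modulate` are all hypothetical. The summary does not specify the projector's architecture or which Flux Kontext modulation parameters receive the offsets.

```python
import math
import random
from typing import List

# Hypothetical sizes; the real embedding and modulation widths are not given.
TEXT_DIM = 8     # pooled instruction embedding size (assumed)
HIDDEN_DIM = 16  # projector hidden width (assumed)
MOD_DIM = 6      # number of modulation coefficients to offset (assumed)


def _linear(in_dim: int, out_dim: int, rng: random.Random) -> List[List[float]]:
    """Random weights of shape (out_dim, in_dim + 1); the last column is the bias."""
    bound = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-bound, bound) for _ in range(in_dim + 1)] for _ in range(out_dim)]


def _apply(weights: List[List[float]], x: List[float]) -> List[float]:
    """Affine map: each output is dot(row[:-1], x) + row[-1]."""
    return [sum(w * v for w, v in zip(row[:-1], x)) + row[-1] for row in weights]


class StrengthProjector:
    """Hypothetical MLP: (strength s, instruction embedding) -> modulation offset."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = _linear(1 + TEXT_DIM, HIDDEN_DIM, rng)
        self.w2 = _linear(HIDDEN_DIM, MOD_DIM, rng)

    def __call__(self, s: float, text_emb: List[float]) -> List[float]:
        if not 0.0 <= s <= 1.0:
            raise ValueError("strength s must lie in [0, 1]")
        if len(text_emb) != TEXT_DIM:
            raise ValueError(f"expected a {TEXT_DIM}-dim instruction embedding")
        pre = _apply(self.w1, [s] + list(text_emb))
        hidden = [v / (1.0 + math.exp(-v)) for v in pre]  # SiLU activation
        return _apply(self.w2, hidden)


def modulate(base_mod: List[float], offset: List[float]) -> List[float]:
    """Add the projector's offset to the model's base modulation coefficients."""
    return [b + o for b, o in zip(base_mod, offset)]
```

With the same `text_emb`, calling the projector with `s=0.2` and `s=0.9` yields different offsets. This is the property that lets one instruction produce a family of edits. Conditioning on the instruction as well as on $s$ matters because the same numeric strength can correspond to very different visual changes for different edits.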

Abstract

Instruction-based image editing offers a powerful and intuitive way to manipulate images through natural language. Yet relying solely on text instructions limits fine-grained control over the extent of edits. We introduce Kontinuous Kontext, an instruction-driven editing model that provides a new dimension of control over edit strength. Users can adjust edits gradually, from no change to a fully realized result, in a smooth and continuous manner. Kontinuous Kontext extends a state-of-the-art image editing model to accept an additional input: a scalar edit strength, which is paired with the edit instruction to give explicit control over the extent of the edit. To inject this scalar information, we train a lightweight projector network that maps the input scalar and the edit instruction to coefficients in the model's modulation space. To train our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models, followed by a filtering stage to ensure quality and consistency. Kontinuous Kontext provides a unified approach to fine-grained control over edit strength in instruction-driven editing. It covers edits from subtle to strong across diverse operations such as stylization, attribute, material, background, and shape changes, without requiring attribute-specific training.
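
The next sketch makes the image-edit-instruction-strength quadruplet concrete. It assumes that each synthesized trajectory is an ordered list whose first frame matches the source image and whose last frame is the full-strength Flux Kontext edit. The evenly spaced strength labels $s_i = i/(N-1)$, the `EditSample` type, and the `quadruplets_from_trajectory` helper are illustrative assumptions, not the authors' code. Images are represented by file names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class EditSample:
    """One (x, e, s, y_s) training quadruplet."""
    source: str       # x: source image (a file name stands in for pixels)
    instruction: str  # e: edit instruction
    strength: float   # s in [0, 1]
    target: str       # y_s: image edited with strength s


def quadruplets_from_trajectory(source: str, instruction: str,
                                trajectory: Sequence[str]) -> List[EditSample]:
    """Label an ordered edit trajectory with evenly spaced strengths.

    Assumes trajectory[0] is (close to) the source and trajectory[-1] is the
    full-strength edit; the uniform spacing s_i = i / (N - 1) is hypothetical.
    """
    n = len(trajectory)
    if n < 2:
        raise ValueError("need at least the source and the full-strength edit")
    return [EditSample(source, instruction, i / (n - 1), frame)
            for i, frame in enumerate(trajectory)]
```

For example, a four-frame trajectory yields strengths 0, 1/3, 2/3, and 1. Uniform labels are only sensible if consecutive frames change by roughly equal amounts, which is one reason a filtering stage is needed.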

Paper Structure

This paper contains 47 sections, 11 equations, 24 figures, 3 tables.

Figures (24)

  • Figure 1: Kontinuous Kontext produces smooth edit trajectories across diverse attributes given an image, an instruction, and a scalar edit strength. Unlike prior methods that require attribute-specific training, ours is a unified approach to fine-grained control.
  • Figure 2: Kontinuous Kontext enables finer control across diverse edits. It can make simultaneous changes to multiple attributes, such as hair color and structure, highly localized changes, such as editing the panda's mouth, and geometric edits, such as changing the size of the car.
  • Figure 3: Data generation. Our pipeline consists of three steps: (a) We generate an edit instruction for each source image using a pretrained VLM, then apply Flux Kontext, an instruction-driven editing model, to produce a full-strength edit. (b) We synthesize intermediate-strength edits using a diffusion-based morphing method (FreeMorph, cao2025freemorph), which inverts both the source and edited images into the diffusion latent space and interpolates their features. (c) To compensate for inconsistencies in the morphing sequence (Fig. 5), we filter the samples based on inversion quality and the uniformity of the sequence.
  • Figure 4: Samples from diverse image editing categories in our synthesized dataset. We cover a wide range of global edits, including stylization, reimagination, and environment changes, as well as local edits such as appearance changes, material changes, attribute editing, and object morphing.
  • Figure 5: Generating intermediate images with FreeMorph can introduce inconsistencies such as incomplete objects, abrupt jumps, or errors from diffusion inversion. We filter such cases to obtain a clean dataset with smooth trajectories.
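
The filtering step described for Figure 3(c) and Figure 5 can be approximated with two checks. The first is a threshold on diffusion-inversion error. The second flags abrupt jumps between consecutive frames. The sketch below is a hypothetical stand-in: the per-frame feature vectors, the `max_ratio` and `max_inversion_error` thresholds, and the names `is_uniform` and `keep_sample` are assumptions, since the exact criteria are not given here.

```python
import math
from typing import Sequence


def is_uniform(features: Sequence[Sequence[float]], max_ratio: float = 2.0) -> bool:
    """Reject trajectories containing an abrupt jump between consecutive frames.

    `features` holds one feature vector per frame (the feature space is an
    assumption). A step is abrupt when it exceeds `max_ratio` times the mean
    step length; a trajectory with no change at all is also rejected.
    """
    steps = [math.dist(a, b) for a, b in zip(features, features[1:])]
    if not steps:
        return False  # a single frame is not a trajectory
    mean_step = sum(steps) / len(steps)
    if mean_step == 0.0:
        return False  # no visible change between any frames
    return max(steps) <= max_ratio * mean_step


def keep_sample(features: Sequence[Sequence[float]], inversion_error: float,
                max_inversion_error: float = 0.05, max_ratio: float = 2.0) -> bool:
    """Keep a trajectory only if inversion was faithful and its steps are uniform."""
    return inversion_error <= max_inversion_error and is_uniform(features, max_ratio)
```

Comparing each step against the mean step, rather than against a fixed distance, keeps the check independent of how large the overall edit is. Subtle and drastic edits are therefore judged by the same rule.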