Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
Rishubh Parihar, Or Patashnik, Daniil Ostashev, R. Venkatesh Babu, Daniel Cohen-Or, Kuan-Chieh Wang
TL;DR
Kontinuous Kontext addresses the lack of fine-grained control in instruction-driven image editing by adding a scalar strength input $s \in [0,1]$ and a lightweight Strength Projector that maps $s$ and the edit instruction to offsets in the modulation space of Flux Kontext. A synthetic data pipeline produces training quadruplets $(x, e, s, y_s)$ (source image, edit instruction, strength, and the edited image at strength $s$). An LVLM generates the instructions, Flux Kontext produces the full-strength edits, and diffusion-based morphing produces the intermediate edits. A filtering stage then keeps only smooth edit trajectories. Empirically, the method yields smooth, diverse, and faithful edits across global, local, and geometric changes and preserves identity at low strengths. It also generalizes to unseen attributes better than interpolation-based or attribute-specific baselines. The approach provides a unified, attribute-agnostic framework for interactive, continuous image editing, with practical applications in design, accessibility, and education, and it opens avenues for extending continuous control to other editing modalities.
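The TL;DR says generated trajectories are filtered for smoothness but does not spell out the criterion. The Python sketch below shows one plausible check, purely as an illustration and not the authors' filter. It assumes each image in a trajectory (source first, then edits at increasing $s$) has already been mapped to a feature vector by some image encoder. The function `is_smooth_trajectory` and its thresholds `max_jump_ratio` and `tol` are hypothetical.

```python
import math


def _dist(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_smooth_trajectory(
    feats: list[list[float]], max_jump_ratio: float = 2.0, tol: float = 1e-6
) -> bool:
    """Hypothetical smoothness filter over features ordered by strength.

    feats[0] is the source image; feats[i] is the edit at the i-th strength.
    A trajectory is accepted if (1) its distance from the source never
    decreases and (2) no single step is much larger than the average step.
    """
    if len(feats) < 3:
        return False  # too few samples to judge smoothness
    steps = [_dist(a, b) for a, b in zip(feats, feats[1:])]
    mean_step = sum(steps) / len(steps)
    if mean_step <= tol:
        return False  # the "edit" never changes the image
    from_src = [_dist(feats[0], f) for f in feats]
    # Distance from the source should grow (or stay flat) as strength increases.
    monotone = all(b >= a - tol for a, b in zip(from_src, from_src[1:]))
    # Reject trajectories where most of the change happens in one abrupt step.
    no_jumps = max(steps) <= max_jump_ratio * mean_step
    return monotone and no_jumps
```

Under this criterion, a trajectory whose change is concentrated in its last step, or whose distance from the source shrinks midway, is rejected. In practice the feature encoder and `max_jump_ratio` would need tuning on real data.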
Abstract
Instruction-based image editing offers a powerful and intuitive way to manipulate images through natural language. Yet relying solely on text instructions limits fine-grained control over the extent of an edit. We introduce Kontinuous Kontext, an instruction-driven editing model that adds a new dimension of control over edit strength, letting users adjust an edit smoothly and continuously from no change to a fully realized result. Kontinuous Kontext extends a state-of-the-art image editing model to accept an additional input, a scalar edit strength, which is paired with the edit instruction to give explicit control over the extent of the edit. To inject this scalar information, we train a lightweight projector network that maps the input scalar and the edit instruction to coefficients in the model's modulation space. To train our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models, followed by a filtering stage that ensures quality and consistency. Kontinuous Kontext provides a unified approach to fine-grained control of edit strength in instruction-driven editing, from subtle to strong, across diverse operations such as stylization, attribute, material, background, and shape changes, without requiring attribute-specific training.
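To make the projector idea concrete, here is a minimal pure-Python sketch under stated assumptions. The strength $s$ is encoded with sinusoidal features and concatenated with a pooled instruction embedding. The result passes through a small MLP, and its output offsets are added to a base modulation vector used for AdaLN-style scale and shift. The names `StrengthProjector`, `strength_embedding`, and `modulate`, the layer sizes, the SiLU activation, and the `[scale..., shift...]` layout are all hypothetical; this is neither the authors' architecture nor a Flux Kontext API.

```python
import math
import random


def strength_embedding(s: float, num_freqs: int = 4) -> list[float]:
    """Assumed encoding of the scalar strength s in [0, 1]: raw value plus sin/cos features."""
    feats = [s]
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        feats += [math.sin(freq * s), math.cos(freq * s)]
    return feats


class StrengthProjector:
    """Hypothetical two-layer MLP: (strength, instruction embedding) -> modulation offsets."""

    def __init__(self, text_dim: int, mod_dim: int, hidden: int = 16,
                 num_freqs: int = 4, seed: int = 0):
        rng = random.Random(seed)
        in_dim = 1 + 2 * num_freqs + text_dim
        self.num_freqs = num_freqs
        # Small random weights; a real projector would be trained.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(mod_dim)]
        self.b2 = [0.0] * mod_dim

    def __call__(self, s: float, text_emb: list[float]) -> list[float]:
        x = strength_embedding(s, self.num_freqs) + list(text_emb)
        # Hidden layer with SiLU activation: v * sigmoid(v).
        h = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        h = [v / (1.0 + math.exp(-v)) for v in h]
        # Linear output layer: one offset per modulation coefficient.
        return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]


def modulate(hidden: list[float], base_mod: list[float], offset: list[float]) -> list[float]:
    """AdaLN-style scale/shift, with the projector offset added to the base modulation."""
    d = len(hidden)
    mod = [m + o for m, o in zip(base_mod, offset)]
    scale, shift = mod[:d], mod[d:]  # assumed layout: [scale..., shift...]
    mean = sum(hidden) / d
    var = sum((v - mean) ** 2 for v in hidden) / d
    normed = [(v - mean) / math.sqrt(var + 1e-6) for v in hidden]
    return [(1.0 + sc) * n + sh for n, sc, sh in zip(normed, scale, shift)]
```

For a usage example, build `proj = StrengthProjector(text_dim=4, mod_dim=6)`, then for each `s` call `modulate(hidden, base_mod, proj(s, text_emb))` with a 3-dimensional `hidden` vector. Sweeping `s` changes only the offsets, so the same instruction embedding yields a family of modulations. Keeping the base model frozen and learning only this small offset network is one way to read "lightweight projector" in the text above.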
