PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control

Kunal Swami, Raghu Chittersu, Pranav Adlinge, Rajeev Irny, Shashavali Doodekula, Alok Shukla

TL;DR

PromptArtisan introduces a zero-shot, single-pass multi-instruction image editing framework that handles multiple mask-prompt pairs via a Complete Attention Control Mechanism (CACM). By computing independent prompt embeddings and enforcing cross- and self-attention controls, it achieves precise, mask-localized edits even with overlapping regions, without additional training or test-time optimization. The approach is validated against state-of-the-art IBE methods on the MiE-Bench dataset, showing superior qualitative and quantitative performance, and is complemented by extensive ablations and additional results. This work enables more flexible, efficient, and scalable image editing workflows for diverse user needs.

Abstract

We present PromptArtisan, a groundbreaking approach to multi-instruction image editing that achieves remarkable results in a single pass, eliminating the need for time-consuming iterative refinement. Our method empowers users to provide multiple editing instructions, each associated with a specific mask within the image. This flexibility allows for complex edits involving mask intersections or overlaps, enabling the realization of intricate and nuanced image transformations. PromptArtisan leverages a pre-trained InstructPix2Pix model in conjunction with a novel Complete Attention Control Mechanism (CACM). This mechanism ensures precise adherence to user instructions, granting fine-grained control over the editing process. Furthermore, our approach is zero-shot, requiring no additional training, and boasts improved processing complexity compared to traditional iterative methods. By seamlessly integrating multi-instruction capabilities, single-pass efficiency, and complete attention control, PromptArtisan unlocks new possibilities for creative and efficient image editing workflows, catering to both novice and expert users alike.
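The summary above does not spell out how CACM works internally, so the following is only a toy illustration of the general idea of mask-localized cross-attention with independently computed prompt embeddings. It is not the authors' implementation. All names (`attend`, `mask_localized_cross_attention`, `base`) are hypothetical, and averaging the per-prompt outputs where masks overlap is an assumption made purely for this sketch.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    # Standard scaled dot-product attention for one prompt.
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def mask_localized_cross_attention(queries, prompts, masks, base):
    """Combine per-prompt attention outputs using spatial masks.

    queries: (N, d) image-token queries
    prompts: list of K (keys, values) pairs, one per instruction,
             each embedded independently of the others
    masks:   (K, N) boolean array, masks[k, i] is True if token i
             belongs to the region of instruction k
    base:    (N, d) output used for tokens outside every mask
    """
    masks = np.asarray(masks, dtype=float)
    # Attend to each prompt separately so instructions cannot mix.
    per_prompt = np.stack([attend(queries, k, v) for k, v in prompts])
    # Keep each prompt's output only inside its own mask.
    weighted = (masks[:, :, None] * per_prompt).sum(axis=0)
    coverage = masks.sum(axis=0)
    covered = coverage > 0
    out = base.copy()
    # Assumption for this sketch: overlapping masks are averaged.
    out[covered] = weighted[covered] / coverage[covered, None]
    return out
```

In this sketch a token covered by a single mask receives only that instruction's attention output, a token in an overlap receives the mean of the overlapping instructions, and a token outside all masks keeps `base` unchanged. The actual method may resolve overlaps differently and also controls self-attention, which is not modeled here.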

Paper Structure

This paper contains 14 sections, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: PromptArtisan enables single-pass editing with multiple mask-prompt pairs. Multiple mask-prompt pairs also allow flexible, complex edits in which masks intersect or overlap.
  • Figure 2: Overall framework of the proposed method, PromptArtisan. CACM is the backbone of PromptArtisan, giving users fine-grained control and enabling complex edits.
  • Figure 3: Qualitative comparison of PromptArtisan with competitors.
  • Figure 4: Qualitative results of our ablation study.
  • Figure 5: Additional capabilities of PromptArtisan.