Mapping the Mind of an Instruction-based Image Editing using SMILE
Zeinab Dehghani, Koorosh Aslansefat, Adil Khan, Adín Ramírez Rivera, Franky George, Muhammad Khalid
TL;DR
Instruction-based image editing models remain black boxes, limiting transparency in high-stakes domains. We introduce SMILE, a model-agnostic, localized interpretability framework that builds perturbed prompts, maps edits to a 1D distance via Wasserstein-based metrics, and uses a weighted linear surrogate to produce word-level heatmaps. SMILE couples DINOv2 embeddings with bootstrap p-values to assess significance and demonstrates robust accuracy, stability, and fidelity across multiple diffusion models, including Instruct-Pix2Pix, Img2Img-Turbo, and Diffusers-Inpaint. The results indicate that SMILE enhances trust and controllability of text-driven edits, with practical implications for healthcare, autonomous driving, and other precision editing contexts; the work lays a foundation for extending model-agnostic interpretability to broader generative tasks.
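The pipeline summarized above (perturb the prompt, measure how far each perturbed edit moves from the original edit, fit a weighted linear surrogate) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the exponential proximity kernel, the kernel width, and the sample count are assumptions made for illustration, and `toy_edit` is a hypothetical stand-in for running a real editing model and embedding its output (for example with DINOv2).

```python
import numpy as np


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # For equal-size empirical samples, W1 is the mean gap between sorted values.
    return float(np.mean(np.abs(a - b)))


def toy_edit(kept_words):
    """Hypothetical stand-in for 'edit the image, then embed the result'.

    Returns a 1D feature vector whose shift depends on which words survive.
    """
    base = np.linspace(0.0, 1.0, 16)
    shift = 2.0 * ("snowy" in kept_words) + 0.2 * ("forest" in kept_words)
    return base + shift


def explain_prompt(words, edit_fn, n_samples=200, kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate; return (word, importance) pairs."""
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(n_samples, len(words)))  # 1 = keep word
    masks[0] = 1  # always include the unperturbed prompt
    reference = edit_fn(list(words))

    # Response: distance between each perturbed edit and the original edit.
    y = np.array([
        wasserstein_1d(edit_fn([w for w, m in zip(words, row) if m]), reference)
        for row in masks
    ])

    # Features: 1 where a word was removed, so a positive coefficient means
    # "removing this word moves the edit away from the original".
    removed = 1 - masks

    # Assumed proximity kernel: prompts with fewer removals get more weight.
    frac_removed = removed.mean(axis=1)
    weights = np.exp(-(frac_removed ** 2) / kernel_width ** 2)

    # Weighted least squares with an intercept, via row scaling by sqrt(weight).
    X = np.hstack([np.ones((n_samples, 1)), removed])
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return list(zip(words, beta[1:]))


if __name__ == "__main__":
    prompt = "make the forest look snowy".split()
    for word, score in explain_prompt(prompt, toy_edit):
        print(f"{word:>8s}  {score:+.3f}")
```

In this toy setting the per-word coefficients play the role of the word-level heatmap values. A real run would replace `toy_edit` with calls to the editing model and an image embedding, and would add a significance test such as the bootstrap p-values mentioned above.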
Abstract
Despite recent advances in instruction-based image editing models that generate high-quality images, these models remain black boxes, which poses a significant barrier to transparency and user trust. To address this issue, we introduce SMILE (Statistical Model-agnostic Interpretability with Local Explanations), a novel model-agnostic framework for localized interpretability that produces a visual heatmap showing how each textual element of the instruction influences the edited image. We applied our method to several instruction-based image editing models, including Instruct-Pix2Pix, Img2Img-Turbo, and Diffusers-Inpaint, and showed how it can improve interpretability and reliability. We evaluate the method using stability, accuracy, fidelity, and consistency metrics. These findings indicate the potential of model-agnostic interpretability to improve reliability and trustworthiness in critical applications such as healthcare and autonomous driving, and they encourage further investigation into the role of interpretability in building dependable image editing models.
