AbracADDbra: Touch-Guided Object Addition by Decoupling Placement and Editing Subtasks

Kunal Swami; Raghu Chittersu; Yuvraj Rathore; Rajeev Irny; Shashavali Doodekula; Alok Shukla

TL;DR

This work introduces AbracADDbra, a user-friendly framework that leverages intuitive touch priors to spatially ground succinct instructions for precise placement. It also reveals a strong correlation between initial placement accuracy and final edit quality, validating the decoupled approach.
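The decoupled approach named in the title splits object addition into two stages: a vision-language transformer first predicts where the object should go from the touch and the succinct instruction, and a diffusion model then generates the object together with an instance mask used for blending. The sketch below shows only that control flow in plain Python. The function names, the representation of a placement as a box, the grayscale list-of-lists images, and the per-pixel alpha-blending formula are illustrative assumptions, not the paper's interfaces; `stub_place` and `stub_edit` are hypothetical placeholders standing in for the two trained models.

```python
from typing import Callable, List, Tuple

# Illustrative types (assumptions, not the paper's interfaces).
Image = List[List[float]]        # grayscale image, values in [0, 1], indexed [row][col]
Mask = List[List[float]]         # soft instance mask, values in [0, 1]
Point = Tuple[int, int]          # touch location (x, y) in pixels
Box = Tuple[int, int, int, int]  # placement (x0, y0, x1, y1), upper bounds exclusive

PlacementModel = Callable[[Image, str, Point], Box]
EditingModel = Callable[[Image, str, Box], Tuple[Image, Mask]]


def blend(original: Image, generated: Image, mask: Mask) -> Image:
    """Per-pixel alpha blend: mask * generated + (1 - mask) * original."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def add_object(image: Image, instruction: str, touch: Point,
               place: PlacementModel, edit: EditingModel) -> Tuple[Image, Box]:
    """Run placement, then editing, then mask-based blending."""
    box = place(image, instruction, touch)           # stage 1: touch-guided placement
    generated, mask = edit(image, instruction, box)  # stage 2: object + instance mask
    return blend(image, generated, mask), box


# Hypothetical stand-ins for the two trained models.
def stub_place(image: Image, instruction: str, touch: Point, half: int = 1) -> Box:
    """Return a fixed-size box centered on the touch, clipped to the image."""
    h, w = len(image), len(image[0])
    x, y = touch
    return (max(0, x - half), max(0, y - half), min(w, x + half), min(h, y + half))


def stub_edit(image: Image, instruction: str, box: Box) -> Tuple[Image, Mask]:
    """Return a uniformly bright 'object' layer and a hard mask covering the box."""
    h, w = len(image), len(image[0])
    x0, y0, x1, y1 = box
    generated = [[1.0] * w for _ in range(h)]
    mask = [[1.0 if x0 <= c < x1 and y0 <= r < y1 else 0.0 for c in range(w)]
            for r in range(h)]
    return generated, mask


if __name__ == "__main__":
    canvas = [[0.0] * 4 for _ in range(4)]
    result, box = add_object(canvas, "add a ball", (1, 1), stub_place, stub_edit)
    print(box)  # (0, 0, 2, 2)
    for row in result:
        print(row)
```

In this sketch the two stages communicate only through the predicted box, so either stand-in can be replaced by a real model without touching the other.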

Abstract

Instruction-based object addition is often hindered by the ambiguity of text-only prompts or the tedium of mask-based inputs. To address this usability gap, we introduce AbracADDbra, a user-friendly framework that leverages intuitive touch priors to spatially ground succinct instructions for precise placement. Our efficient, decoupled architecture uses a vision-language transformer for touch-guided placement, followed by a diffusion model that jointly generates the object and an instance mask for high-fidelity blending. To facilitate standardized evaluation, we contribute the Touch2Add benchmark for this interactive task. Extensive evaluations confirm the framework's ability to produce high-fidelity edits, with our placement model significantly outperforming both random placement and general-purpose VLM baselines. Furthermore, our analysis reveals a strong correlation between initial placement accuracy and final edit quality, validating our decoupled approach. This work thus paves the way for more accessible and efficient creative tools.
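The evaluation compares the placement model against random placement and relates placement accuracy to final edit quality. One way such measurements could be computed is sketched below, assuming placements are axis-aligned boxes in normalized coordinates scored by intersection-over-union (IoU), a random baseline that keeps the ground-truth box size but samples its position uniformly, and Pearson correlation as the correlation measure. These metric choices are assumptions for illustration, and the paper's exact protocol may differ; the numbers in the usage section are toy values, not results from the paper.

```python
import random
from typing import List, Tuple

# Assumed representation: (x0, y0, x1, y1) with coordinates normalized to [0, 1].
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def random_placement(gt: Box, rng: random.Random) -> Box:
    """Random baseline (assumed): keep the ground-truth size, sample the position uniformly."""
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    x0 = rng.uniform(0.0, 1.0 - w)
    y0 = rng.uniform(0.0, 1.0 - h)
    return (x0, y0, x0 + w, y0 + h)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; undefined (division by zero) for constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    gt = (0.2, 0.5, 0.4, 0.9)
    pred = (0.25, 0.5, 0.45, 0.9)  # toy prediction, not a result from the paper
    print("model IoU:", round(iou(gt, pred), 3))
    baseline = [iou(gt, random_placement(gt, rng)) for _ in range(1000)]
    print("random-baseline mean IoU:", round(sum(baseline) / len(baseline), 3))
    # Toy per-image scores, not data from the paper.
    placement_iou = [0.1, 0.4, 0.7, 0.9]
    edit_quality = [0.2, 0.5, 0.6, 0.9]
    print("Pearson r:", round(pearson(placement_iou, edit_quality), 3))
```

Keeping the box size fixed in the baseline isolates the effect of position, which is the part a touch prior is meant to improve.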

Paper Structure (17 sections, 1 equation, 7 figures, 6 tables)

Introduction
Related Work
Instruction-based Image Editing.
Instruction-based Object Addition.
Proposed Method
Problem Formulation
Training Dataset Generation
Placement Prediction Model
Architecture
Input Formulation
Output Formulation and Training Objective
Instruction-based Object Addition Model
Results and Discussion
Evaluation Protocol
Main Results

Figures (7)

Figure 1: Main idea of AbracADDbra. We propose a new framework for adding objects to images that lets users provide simple touch input along with instructions, making editing more accurate and user-friendly.
Figure 2: AbracADDbra performs high-fidelity object addition via touch and succinct instructions. Our method combines an intuitive touch prior with a simple, succinct prompt (in green) to achieve precise object addition. For a fair comparison against strong baselines, we provided them with detailed prompts (in blue).
Figure 3: Our automated data generation pipeline. The process includes four stages: (1) Preprocessing: filtering COCO objects based on size, boundary proximity, and CLIP score. (2) Inpainting: a two-stage process using LaMa [lama_inpainting_wacv2022] and Stable Diffusion [stablediffusion_cvpr2022]. (3) Postprocessing: applying the filtering steps from [pbyi_cvpr2025]. (4) Caption generation: using GLaMM [glamm_cvpr2024] and an LLM to create the instruction and placement reasoning.
Figure 4: The detailed architecture of our method. The inference scenario is shown; the VAE encoder and decoder are omitted.
Figure 5: Diversity statistics of our Touch2Add dataset. We also compare the average scene complexity with the MagicBrush object addition subset.
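Stage (1) of the data generation pipeline in Figure 3 filters COCO objects by size, boundary proximity, and CLIP score before inpainting. A minimal sketch of such a filter follows. All thresholds are hypothetical placeholders, and the CLIP score is assumed to be precomputed per object (for example, as a similarity between the object crop and its category name); only the bounding-box convention, COCO's `[x, y, width, height]` in pixels, is taken from the COCO annotation format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CocoObject:
    bbox: Tuple[float, float, float, float]  # COCO format: (x, y, width, height) in pixels
    clip_score: float                        # assumed precomputed CLIP similarity


def keep_object(obj: CocoObject, img_w: int, img_h: int,
                min_area: float = 0.01, max_area: float = 0.5,
                margin: float = 0.02, min_clip: float = 0.25) -> bool:
    """Apply size, boundary-proximity, and CLIP-score checks (all thresholds hypothetical)."""
    x, y, w, h = obj.bbox
    # Size: object area as a fraction of the image area.
    area_frac = (w * h) / (img_w * img_h)
    if not (min_area <= area_frac <= max_area):
        return False
    # Boundary proximity: reject objects within `margin` (fraction of each side) of the border.
    mx, my = margin * img_w, margin * img_h
    if x < mx or y < my or x + w > img_w - mx or y + h > img_h - my:
        return False
    # Semantic quality: drop objects whose CLIP score falls below the cutoff.
    return obj.clip_score >= min_clip


def filter_objects(objs: List[CocoObject], img_w: int, img_h: int) -> List[CocoObject]:
    """Keep only the objects that pass every check."""
    return [o for o in objs if keep_object(o, img_w, img_h)]


if __name__ == "__main__":
    candidates = [
        CocoObject((100, 100, 100, 100), 0.30),  # passes all checks
        CocoObject((0, 100, 100, 100), 0.30),    # touches the left border
        CocoObject((100, 100, 10, 10), 0.30),    # too small
        CocoObject((100, 100, 100, 100), 0.10),  # low CLIP score
    ]
    print(len(filter_objects(candidates, 640, 480)))  # 1
```

Rejecting objects near the border plausibly matters because an object cut off by the frame would leave an incomplete hole for the inpainting stage to fill, though the paper's own rationale is not stated in this summary.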