InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images

Jiun Tian Hoe, Weipeng Hu, Wei Zhou, Chao Xie, Ziwei Wang, Chee Seng Chan, Xudong Jiang, Yap-Peng Tan

TL;DR

This work introduces InteractEdit, a zero-shot framework for editing existing human–object interactions in images while preserving the identities of the subject and object. It achieves this by disassembling HOI into subject, object, and background cues, regularizing inversion with Low-Rank Adaptation (LoRA), and applying selective fine-tuning to retain pretrained interaction priors. The authors also propose IEBench, a comprehensive benchmark for evaluating HOI editing in terms of editability and identity preservation. Across extensive qualitative, quantitative, ablation, and user studies, InteractEdit demonstrates superior performance over state-of-the-art baselines, establishing a new baseline for HOI editing research and enabling practical applications in content creation and visualization.

Abstract

This paper presents InteractEdit, a novel framework for zero-shot Human-Object Interaction (HOI) editing, addressing the challenging task of transforming an existing interaction in an image into a new, desired interaction while preserving the identities of the subject and object. Unlike simpler image editing scenarios such as attribute manipulation, object replacement, or style transfer, HOI editing involves complex spatial, contextual, and relational dependencies inherent in human-object interactions. Existing methods often overfit to the source image structure, limiting their ability to adapt to the substantial structural modifications demanded by new interactions. To address this, InteractEdit decomposes each scene into subject, object, and background components, then employs Low-Rank Adaptation (LoRA) and selective fine-tuning to preserve pretrained interaction priors while learning the visual identity of the source image. This regularization strategy effectively balances interaction edits with identity consistency. We further introduce IEBench, the most comprehensive benchmark for HOI editing, which evaluates both interaction editing and identity preservation. Our extensive experiments show that InteractEdit significantly outperforms existing methods, establishing a strong baseline for future HOI editing research and unlocking new possibilities for creative and practical applications. Code will be released upon publication.
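
To make the two regularization ideas named above more concrete, here is a minimal sketch of a generic LoRA update combined with a name-based selection of trainable layers. It is not the authors' implementation: the helper names (`matmul`, `init_lora`, `lora_weight`, `trainable_layers`), the layer names (`to_q`, `to_k`, `to_v`, `to_out`), and the choice of K and V projections as targets are illustrative assumptions. The sketch only shows the standard form W + (alpha / r) · B · A, where r is the rank.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, rank, seed=0):
    """Create LoRA factors: A is small random (rank x d_in), B is zero (d_out x rank)."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return a, b


def lora_weight(w, a, b, alpha):
    """Return the adapted weight W + (alpha / rank) * B @ A; W itself stays frozen."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_layers(layer_names, targets=("to_k", "to_v")):
    """Selective fine-tuning: keep only layers whose name ends with a target suffix.

    The target suffixes are hypothetical; they are an assumption of this sketch.
    """
    return [name for name in layer_names if name.endswith(targets)]


if __name__ == "__main__":
    frozen = [[1.0, 0.0], [0.0, 1.0]]
    a, b = init_lora(d_out=2, d_in=2, rank=1)
    # With B initialized to zero, the adapted weight equals the frozen weight.
    print(lora_weight(frozen, a, b, alpha=1.0))
    print(trainable_layers(["attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out"]))
```

Because B starts at zero, the adapted model initially behaves exactly like the pretrained one, and a small rank limits how much source-image structure the update can absorb. That is one plausible reading of why a low-rank update helps retain pretrained priors.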

Paper Structure

This paper contains 25 sections, 12 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Sample results of editing Human-Object Interaction in the source image (left). Existing methods overly preserve the structure, making interaction edits ineffective. Our method focuses on modifying interactions while maintaining the subject and object identity.
  • Figure 2: Overview of the InteractEdit framework. HOI components are disassembled into subject, object, and background clues during inversion (\ref{subsec:disassemble-hoi}). LoRA regularization enables non-rigid edits by capturing essential attributes while ignoring fine-grained structural details (\ref{subsec:lora}). Selective fine-tuning preserves interaction priors while adapting to the source image’s identity (\ref{subsec:selective-training}). Editing reassembles these components with the target interaction, using trained LoRA weights to guide the diffusion model (\ref{subsec:hoi-editing}).
  • Figure 3: User Preferences
  • Figure 4: User preferences for each interaction edit.
  • Figure 5: HOI editability for different ranks of K and V.
  • ...and 1 more figure