PRIMEdit: Probability Redistribution for Instance-aware Multi-object Video Editing with Benchmark Dataset
Samuel Teodoro, Agus Gunawan, Soo Ye Kim, Jihyong Oh, Munchurl Kim
TL;DR
PRIMEdit tackles the challenge of localized, instance-aware multi-object video editing in a zero-shot setting by introducing two novel components: Instance-centric Probability Redistribution (IPR) and Disentangled Multi-instance Sampling (DMS). IPR provides precise spatial control by redistributing cross-attention probabilities to confine edits within instance masks, while DMS decouples and harmonizes multiple instance edits through series and parallel sampling with latent fusion and re-inversion. To evaluate locality and leakage, the authors introduce the MIVE dataset and the Cross-Instance Accuracy (CIA) score, demonstrating significant improvements in editing faithfulness, temporal consistency, and leakage reduction over state-of-the-art methods. The work also shows robustness across varying instance sizes and numbers, and provides extensive ablations and user studies, highlighting practical applicability and scalability for complex multi-object video edits.
Abstract
Recent AI-based video editing has enabled users to edit videos through simple text prompts, significantly simplifying the editing process. However, recent zero-shot video editing techniques primarily focus on global or single-object edits, which can lead to unintended changes in other parts of the video. When multiple objects require localized edits, existing methods face challenges such as unfaithful editing, editing leakage, and a lack of suitable evaluation datasets and metrics. To overcome these limitations, we propose $\textbf{P}$robability $\textbf{R}$edistribution for $\textbf{I}$nstance-aware $\textbf{M}$ulti-object Video $\textbf{Edit}$ing ($\textbf{PRIMEdit}$). PRIMEdit is a zero-shot framework that introduces two key modules: (i) Instance-centric Probability Redistribution (IPR) to ensure precise localization and faithful editing and (ii) Disentangled Multi-instance Sampling (DMS) to prevent editing leakage. Additionally, we present our new MIVE Dataset for video editing, which features diverse video scenarios, and introduce the Cross-Instance Accuracy (CIA) Score to evaluate editing leakage in multi-instance video editing tasks. Our extensive qualitative, quantitative, and user study evaluations demonstrate that PRIMEdit significantly outperforms recent state-of-the-art methods in terms of editing faithfulness, accuracy, and leakage prevention, setting a new benchmark for multi-instance video editing.
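
To make the idea behind IPR concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each spatial location holds a softmax-normalized row of cross-attention probabilities over text tokens, that one token index acts as a "sink" (for example a start-of-text token), and that a hypothetical `boost` fraction controls how much mass is moved inside the mask. The function name `redistribute` and all parameters are invented for illustration; the summary above does not specify the exact redistribution rule.

```python
def redistribute(probs, inside_mask, instance_tokens, sink_token=0, boost=0.5):
    """Toy redistribution of cross-attention probabilities (assumed rule).

    probs:           list of rows, one per spatial location; each row sums to 1
    inside_mask:     list of bools, True where the location lies in the instance mask
    instance_tokens: token indices describing the edited instance
                     (assumed not to include sink_token)
    sink_token:      token index that absorbs or donates probability mass
    boost:           fraction of sink mass moved to the instance tokens inside the mask
    """
    out = []
    for row, inside in zip(probs, inside_mask):
        row = list(row)  # copy so the input is not modified
        if inside:
            # Inside the mask: shift part of the sink mass onto the instance tokens.
            moved = boost * row[sink_token]
            row[sink_token] -= moved
            share = moved / len(instance_tokens)
            for t in instance_tokens:
                row[t] += share
        else:
            # Outside the mask: move all instance-token mass to the sink,
            # so the instance's text cannot influence this location.
            for t in instance_tokens:
                row[sink_token] += row[t]
                row[t] = 0.0
        out.append(row)
    return out


# Two locations, three tokens: token 0 is the sink, token 1 names the instance.
probs = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2]]
result = redistribute(probs, inside_mask=[True, False], instance_tokens=[1])
```

Because mass is only moved between tokens and never created or discarded, every row still sums to 1 after redistribution, so the rows remain valid probability distributions. That conservation property is the reason for framing the operation as redistribution rather than masking followed by renormalization.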
