TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation

Sunjae Yoon, Gwanhyeong Koo, Younghwan Lee, Chang D. Yoo

TL;DR

Diffusion-based human image animation often falters when the reference image and target poses are compositionally misaligned (for example, in scale or rotation), which degrades fidelity and makes quality inconsistent across frames. The paper introduces Test-time Procrustes Calibration (TPC), a plug-and-play method that calibrates the reference conditioning of a diffusion model at test time: Procrustes Warping aligns the reference human shape with the target, and Iterative Propagation stabilizes temporal features. TPC is model-agnostic, requires no additional training, and is evaluated with multiple baselines on the TikTok and TED-talks datasets as well as on unseen real-world data. The reported results show improved image quality and temporal continuity under misalignment, making diffusion-based human image animation more robust in real-world use.

Abstract

Human image animation aims to generate a human motion video from a reference human image and a target motion video. Current diffusion-based image animation systems transfer human identity into the target motion with high precision, yet their output quality remains irregular. They reach optimal precision only when the physical compositions (i.e., scale and rotation) of the human shapes in the reference image and the target pose frame are aligned; without such alignment, fidelity and consistency decline noticeably. This compositional misalignment is common in real-world environments and poses a significant challenge to the practical use of current systems. To this end, we propose Test-time Procrustes Calibration (TPC), which makes diffusion-based image animation systems more robust by maintaining optimal performance even under compositional misalignment. TPC provides a calibrated reference image for the diffusion model, enhancing its ability to understand the correspondence between human shapes in the reference and target images. Our method is simple and can be applied to any diffusion-based image animation system in a model-agnostic manner, improving effectiveness at test time without additional training.
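
To make the idea of aligning scale and rotation concrete, the following is a minimal sketch of classical Procrustes analysis on 2D keypoints. It is not the paper's implementation: the function name `procrustes_similarity`, the use of pose keypoints as the shape representation, and the toy coordinates are all assumptions made for illustration. The sketch estimates the least-squares similarity transform (scale, rotation, translation) that maps reference keypoints onto target keypoints.

```python
import numpy as np


def procrustes_similarity(src, dst):
    """Least-squares similarity transform mapping src onto dst.

    Hypothetical helper for illustration only.
    src, dst: arrays of shape (N, 2) with corresponding 2D keypoints.
    Returns (s, R, t) such that s * src @ R.T + t approximates dst.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Center both point sets so only scale and rotation remain to be solved.
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d

    # SVD of the (unnormalized) cross-covariance between the two shapes.
    U, S, Vt = np.linalg.svd(xd.T @ xs)

    # Guard against reflections so that R is a proper rotation (det = +1).
    d = -1.0 if np.linalg.det(U @ Vt) < 0 else 1.0
    D = np.array([1.0, d])

    R = U @ np.diag(D) @ Vt
    s = (S * D).sum() / (xs ** 2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t


# Toy example (assumed data): the target shape is the reference shape
# scaled by 1.5, rotated by 30 degrees, and shifted.
ref_kpts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 2.0]])
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
tgt_kpts = 1.5 * ref_kpts @ R_true.T + np.array([2.0, -1.0])

s, R, t = procrustes_similarity(ref_kpts, tgt_kpts)
calibrated_kpts = s * ref_kpts @ R.T + t  # reference shape moved onto the target
```

Because the toy target is an exact similarity transform of the reference, `calibrated_kpts` matches `tgt_kpts` up to floating-point error. With real pose keypoints the fit would only be approximate, and the same `(s, R, t)` could in principle be used to warp the reference image itself; how TPC does this in detail is described in the paper, not in this sketch.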

Paper Structure

This paper contains 25 sections, 4 equations, 17 figures, 3 tables, 1 algorithm.

Figures (17)

  • Figure 1: Illustration of compositional misalignment: (a) Results of current human image animation models (MagicAnimate, Xu et al., 2023; DisCo, Wang et al., 2023) on samples with compositional misalignment of human shapes between reference and target. (b) Sensitivity analysis under varying compositional misalignment, obtained by scaling and rotating (MA: MagicAnimate). Best viewed with zoom.
  • Figure 2: Attention maps on the reference image corresponding to the target human shape (e.g., shoulder at blue point) according to denoising.
  • Figure 2: Ablation studies on transformation methods for reference image calibration and iterative propagation (IP) on TED-talks and TikTok (validation splits; scores averaged over compositional alignment and misalignment).
  • Figure 3: Illustration of (a) current diffusion-based human image animation systems and (b) Test-time Procrustes Calibration (TPC) on top of these systems. The TPC can be applied to diffusion-based models in a model-agnostic manner, enhancing the fidelity and consistency of the output video.
  • Figure 4: Conceptual illustration of the effectiveness of TPC in terms of style and shape variation.
  • ...and 12 more figures