SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

Wenhao Yan; Sheng Ye; Zhuoyi Yang; Jiayan Teng; ZhenHui Dong; Kairui Wen; Xiaotao Gu; Yong-Jin Liu; Jie Tang

TL;DR

The paper tackles studio-grade character animation under diverse, cross-domain conditions. It introduces SCAIL, a diffusion-transformer framework that pairs a scalable 3D pose representation using cylindrical bones with a full-context pose injection mechanism. A dedicated data pipeline and the Studio-Bench benchmark support training and evaluation that reflect production requirements. Empirical results show state-of-the-art performance in both self-driven and cross-driven scenarios, with strong handling of multi-person interactions and occlusions. The work moves character animation closer to production readiness by enabling robust motion transfer across varied figures and domains, while acknowledging limitations and the ethical considerations surrounding realistic digital content.

Abstract

Achieving character animation that meets studio-grade production standards remains challenging despite recent progress. Existing approaches can transfer motion from a driving video to a reference image, but often fail to preserve structural fidelity and temporal consistency in in-the-wild scenarios involving complex motion and cross-identity animation. In this work, we present SCAIL (Studio-grade Character Animation via In-context Learning), a framework that addresses these challenges through two key innovations. First, we propose a novel 3D pose representation, providing a more robust and flexible motion signal. Second, we introduce a full-context pose injection mechanism within a diffusion-transformer architecture, enabling effective spatio-temporal reasoning over full motion sequences. To align with studio-level requirements, we develop a curated data pipeline ensuring both diversity and quality, and establish a comprehensive benchmark for systematic evaluation. Experiments show that SCAIL achieves state-of-the-art performance and advances character animation toward studio-grade reliability and realism.
