LayerAnimate: Layer-level Control for Animation

Yuxue Yang, Lue Fan, Zuzeng Lin, Feng Wang, Zhaoxiang Zhang

TL;DR

LayerAnimate introduces a layer-aware diffusion framework for animation, targeting fine-grained control over individual animation layers. To overcome scarce layer data, it couples Automated Element Segmentation with Motion-based Hierarchical Merging to build a usable layer data pipeline, then integrates layer-level controls (Motion Score, Trajectory, Sketch) via per-layer encoders and ControlNet with cross-attention. Empirical results across image-to-video (I2V) and interpolation tasks show superior quality, precise layer control, and strong usability, supported by ablations and a user study. The work enables versatile, composite layer-level animation applications and points to future high-resolution, longer-duration extensions.
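
For intuition, the idea of assigning each layer a single motion score and merging elements that move alike might be sketched as follows. This is not the authors' implementation: the score definition (mean optical-flow magnitude inside a layer's masks over the clip), the function names `motion_score` and `merge_by_motion`, the `threshold` parameter, and the simple one-dimensional single-linkage grouping are all assumptions made for illustration.

```python
from typing import Dict, List

Mask = List[List[bool]]   # H x W binary mask of one element in one frame
Flow = List[List[float]]  # H x W optical-flow magnitude of one frame


def motion_score(masks: List[Mask], flows: List[Flow]) -> float:
    """Hypothetical score: mean flow magnitude over all masked pixels in the clip.

    Returns one scalar per element, so the score is constant across frames.
    """
    total, count = 0.0, 0
    for mask, flow in zip(masks, flows):
        for mask_row, flow_row in zip(mask, flow):
            for inside, magnitude in zip(mask_row, flow_row):
                if inside:
                    total += magnitude
                    count += 1
    return total / count if count else 0.0


def merge_by_motion(scores: Dict[str, float], threshold: float) -> List[List[str]]:
    """Group elements whose sorted scores differ by at most `threshold`.

    This is plain 1-D single-linkage grouping, used here as a stand-in
    for a motion-based merging step.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1])
    groups: List[List[str]] = []
    previous = 0.0
    for name, score in ordered:
        if groups and score - previous <= threshold:
            groups[-1].append(name)   # similar motion: same layer
        else:
            groups.append([name])     # motion gap too large: new layer
        previous = score
    return groups


# Two frames of a 1 x 2 clip; the element covers 3 pixels in total.
masks = [[[True, False]], [[True, True]]]
flows = [[[2.0, 5.0]], [[4.0, 6.0]]]
score = motion_score(masks, flows)

# Made-up element names and scores, purely for illustration.
layers = merge_by_motion(
    {"hair": 0.9, "body": 1.0, "sky": 0.0, "cloud": 0.05}, threshold=0.2
)
```

With these made-up inputs, the slow background elements end up in one layer and the faster character elements in another. A real pipeline would work on segmentation masklets and estimated optical flow rather than hand-written lists.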

Abstract

Traditional animation production decomposes visual elements into discrete layers to enable independent processing for sketching, refining, coloring, and in-betweening. Existing anime video generation methods typically treat animation as a distinct data domain different from real-world videos, lacking fine-grained control at the layer level. To bridge this gap, we introduce LayerAnimate, a novel video diffusion framework with a layer-aware architecture that empowers the manipulation of layers through layer-level controls. The development of a layer-aware framework faces a significant data scarcity challenge due to the commercial sensitivity of professional animation assets. To address this limitation, we propose a data curation pipeline featuring Automated Element Segmentation and Motion-based Hierarchical Merging. Through quantitative and qualitative comparisons and a user study, we demonstrate that LayerAnimate outperforms current methods in terms of animation quality, control precision, and usability, making it an effective tool for both professional animators and amateur enthusiasts. This framework opens up new possibilities for layer-level animation applications and creative flexibility. Our code is available at https://layeranimate.github.io.

Paper Structure

This paper contains 35 sections, 3 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: LayerAnimate enables controllable video generation under multiple layer-level controls.
  • Figure 2: Layer Curation Pipeline. The bottom orange dashed box illustrates curated layer masks with different motion scores, where motion scores remain temporally constant throughout the animation clip. Yellow dashed boxes denote new elements absent in the first frame, demonstrating our pipeline's capability to segment dynamically appearing elements. Some frames of the masklets $\bigcup_{t=0}^{F-1}\mathcal{T}_t^i$ are shown with transparency to highlight the new elements in Key Frame $K_i$.
  • Figure 3: Overview of LayerAnimate. LayerAnimate establishes a layer-level control architecture for animation generation. It enables the flexible composition of control signals at the layer level, allowing for injecting distinct conditions (e.g., motion scores, trajectories, and sketches) for different layers. For simplicity, the text and image injection branches are omitted from the core architecture schematic.
  • Figure 4: Qualitative comparison with other competitors. We select several clips to exemplify the representative characteristics of animation, including particle effects in ①, a knife appearing off-screen in ③, and an unconventional fade-in visual style in ⑥. We provide the corresponding videos in the supplementary materials, offering clearer and more vivid comparisons.
  • Figure 5: Voting results of the user study. LayerAnimate exhibits superior performance across different tasks. Interp.: Interpolation. traj.: trajectory.
  • ...and 4 more figures