Collaborative Neural Rendering using Anime Character Sheets

Zuzeng Lin, Ailin Huang, Zhewei Huang

TL;DR

CoNR tackles the challenge of generating 2D anime character images in user-specified poses from character sheets by introducing Ultra-Dense Pose (UDP), a compact pose representation that bypasses UV texture mappings. The method employs a collaborative CINN-based renderer that fuses features across multiple reference images, together with an optional UDP Detector that estimates UDPs from images, and it produces pose-consistent renderings for both hand-drawn and synthesized data. Key contributions include formulating a new rendering task from character sheets, proposing UDP for detailed pose control, and releasing a large open dataset to support research in this area. For anime production, the approach lets artists quickly generate pose-conditioned previews and animations with better control and consistency.

Abstract

Drawing images of characters in desired poses is an essential but laborious task in anime production. Assisting artists with this work has become a research hotspot in recent years. In this paper, we present the Collaborative Neural Rendering (CoNR) method, which creates new images for specified poses from a few reference images (also known as character sheets). In general, the diverse hairstyles and garments of anime characters defy the use of universal body models like SMPL, which fit most unclothed human shapes. To overcome this, CoNR uses a compact and easy-to-obtain landmark encoding to avoid creating a unified UV mapping in the pipeline. In addition, the performance of CoNR improves significantly when multiple reference images are used, thanks to feature-space cross-view warping in a carefully designed neural network. Moreover, we have collected a character sheet dataset containing over 700,000 hand-drawn and synthesized images of diverse poses to facilitate research in this area. Our code and demo are available at https://github.com/megvii-research/IJCAI2023-CoNR.

Paper Structure (23 sections, 7 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: (a) The results of CoNR. Based on the desired poses $\{\mathbf{P}_{tar}\}$ and the character sheet $\mathbf{S}_{ref}$, CoNR renders new anime images $\hat{\mathbf{I}}_{tar}$. (b) The generation process of UDP. We use the $XYZ$ coordinates of each point on the surface of a 3D model as the $RGB$ value of that point, coloring the whole model this way. We then take a 2D view of the colored 3D model as the UDP. (c) Inference pipeline of CoNR. Reference images $\mathbf{I}_1 \cdots \mathbf{I}_n \in \mathbf{S}_{ref}$ from the input character sheet are fed into CoNR, which uses modified U-Nets (Ronneberger et al., 2015) as sub-networks. The UDP $\mathbf{P}_{tar}$ is resized and concatenated into each scale of the encoder outputs in all sub-networks. Blocks with the same color share weights. "D1 to D4" refers to the four blocks of the decoder. Each block receives the averaged message from the corresponding blocks in all other sub-networks.
  • Figure 2: Random characters with random backgrounds.
  • Figure 3: First row: Inference results on the validation dataset. Second row: Inference results with the same character sheet input ${S}_{ref}$ and different body structures ${P}_{tar}$.
  • Figure 4: Transferring appearance based on the character sheet. The UDP here comes from the UDP Detector rather than from 3D software.
  • Figure 5: Effectiveness of the collaboration. We perform a reconstruction experiment with Chika Dance, a high-quality rotoscoping animation (in which body and clothing motions are drawn according to real characters), which ensures that the ground truth is reasonable. The last row shows $8$ ground truth frames ${I}^{GT}_{i}\in {S}^{GT}_{vid}$ from a video. In this experiment, the UDPs are estimated from the ground truth frames by a trained UDP Detector. The first two rows show the input and output images of CoNR. The subsets of the character sheet that are used, ${S}_{ref} \subset {S}^{GT}_{vid}$, are marked with blue backgrounds. Generated images for novel poses are marked with red backgrounds.
  • ...and 3 more figures
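
The Figure 1 caption describes two mechanisms that a small example can make concrete: coloring a 3D model by its own XYZ coordinates to obtain a UDP (panel b), and passing each decoder block the averaged message from the other sub-networks (panel c). The NumPy sketch below is illustrative only and is not taken from the CoNR code base. The function names, the bounding-box normalization to [0, 1], the orthographic point-splat view, and the (n_views, C, H, W) feature layout are all assumptions made for this example.

```python
import numpy as np


def vertex_colors_from_xyz(vertices):
    """Map each vertex's XYZ position to an RGB value in [0, 1].

    Assumption: coordinates are normalized by the model's bounding box.
    """
    v = np.asarray(vertices, dtype=np.float64)
    lo = v.min(axis=0)
    span = v.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on flat axes
    return (v - lo) / span


def render_udp_points(vertices, size=64):
    """Hypothetical orthographic point-splat view of the XYZ-colored model.

    The camera looks down the -Z axis, so the vertex with the largest Z
    wins at each pixel. Returns an RGB image and an occupancy mask.
    """
    v = np.asarray(vertices, dtype=np.float64)
    colors = vertex_colors_from_xyz(v)
    # Normalized X gives the column; normalized Y gives the row (flipped so +Y is up).
    cols = np.clip(np.round(colors[:, 0] * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip(np.round((1.0 - colors[:, 1]) * (size - 1)).astype(int), 0, size - 1)
    image = np.zeros((size, size, 3))
    mask = np.zeros((size, size), dtype=bool)
    depth = np.full((size, size), -np.inf)
    for r, c, z, rgb in zip(rows, cols, v[:, 2], colors):
        if z > depth[r, c]:  # keep only the point nearest to the camera
            depth[r, c] = z
            image[r, c] = rgb
            mask[r, c] = True
    return image, mask


def average_messages_from_others(features):
    """For each view, average the features of all OTHER views.

    features: array of shape (n_views, C, H, W), an assumed layout.
    With a single view there are no other views, so zeros are returned.
    """
    f = np.asarray(features, dtype=np.float64)
    n = f.shape[0]
    if n < 2:
        return np.zeros_like(f)
    return (f.sum(axis=0, keepdims=True) - f) / (n - 1)
```

A real UDP would be rasterized from mesh triangles by a 3D renderer rather than splatted from vertices; the point-based view is used here only to keep the example dependency-free.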