ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings
Astitva Srivastava, Harrison Jesse Smith, Thu Nguyen-Phuoc, Yuting Ye
TL;DR
Childlike drawings present a semantic segmentation challenge due to their abstract, region-based representations. We propose CharSegNet, a hierarchical segmentation model built on a fine-tuned Segment Anything Model, and introduce the ChildlikeSHAPES dataset with 25 semantic parts across over 16k drawings. The work enables downstream animation tasks including facial expression generation, audio-driven lip-sync, figure shading, and improved body animation, and demonstrates strong cross-domain generalization to out-of-domain drawings. Together, CharSegNet and ChildlikeSHAPES offer a practical, style-preserving foundation for accessible hand-drawn character animation and advance understanding of semantic representations in abstract art.
Abstract
Childlike human figure drawings represent one of humanity's most accessible forms of character expression, yet automatically analyzing their contents remains a significant challenge. While semantic segmentation of realistic humans has recently advanced considerably, existing models often fail when confronted with the abstract, representational nature of childlike drawings. This semantic understanding is a crucial prerequisite for animation tools that seek to modify figures while preserving their unique style. To help achieve this, we propose a novel hierarchical segmentation model, built upon the architecture and pre-trained weights of the Segment Anything Model (SAM), to quickly and accurately obtain these semantic labels. Our model achieves higher accuracy than state-of-the-art segmentation models focused on realistic humans and cartoon figures, even after fine-tuning. We demonstrate the value of our model for semantic segmentation through multiple applications: a fully automatic facial animation pipeline, a figure relighting pipeline, improvements to an existing childlike human figure drawing animation method, and generalization to out-of-domain figures. Finally, to support future work in this area, we introduce a dataset of 16,000 childlike drawings with pixel-level annotations across 25 semantic categories. Our work can enable entirely new, easily accessible tools for hand-drawn character animation, and our dataset can enable new lines of inquiry in a variety of graphics and human-centric research fields.
