SPRITETOMESH: Automatic Mesh Generation for 2D Skeletal Animation Using Learned Segmentation and Contour-Aware Vertex Placement

Bastien Gimbert

TL;DR

SPRITETOMESH is a fully automatic pipeline that converts 2D game sprite images into triangle meshes compatible with skeletal animation frameworks such as Spine2D. It processes a sprite in under 3 seconds, a speedup of 300x-1200x over manual creation.

Abstract

We present SPRITETOMESH, a fully automatic pipeline for converting 2D game sprite images into triangle meshes compatible with skeletal animation frameworks such as Spine2D. Creating animation-ready meshes is traditionally a tedious manual process requiring artists to carefully place vertices along visual boundaries, a task that typically takes 15-60 minutes per sprite. Our method addresses this through a hybrid learned-algorithmic approach. A segmentation network (EfficientNet-B0 encoder with U-Net decoder) trained on over 100,000 sprite-mask pairs from 172 games achieves an IoU of 0.87, providing accurate binary masks from arbitrary input images. From these masks, we extract exterior contour vertices using Douglas-Peucker simplification with adaptive arc subdivision, and interior vertices along visual boundaries detected via bilateral-filtered multi-channel Canny edge detection with contour-following placement. Delaunay triangulation with mask-based centroid filtering produces the final mesh. Through controlled experiments, we demonstrate that direct vertex position prediction via neural network heatmap regression is fundamentally not viable for this task: the heatmap decoder consistently fails to converge (loss plateau at 0.061) while the segmentation decoder trains normally under identical conditions. We attribute this to the inherently artistic nature of vertex placement - the same sprite can be meshed validly in many different ways. This negative result validates our hybrid design: learned segmentation where ground truth is unambiguous, algorithmic placement where domain heuristics are appropriate. The complete pipeline processes a sprite in under 3 seconds, representing a speedup of 300x-1200x over manual creation. We release our trained model to the game development community.
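
As a quick check of the quoted speedup, the following takes the stated manual time of 15-60 minutes per sprite and treats the 3-second bound as the per-sprite runtime (an assumption, since the abstract only says "under 3 seconds"):

$$\frac{15 \times 60\ \text{s}}{3\ \text{s}} = 300, \qquad \frac{60 \times 60\ \text{s}}{3\ \text{s}} = 1200$$

Because 3 seconds is an upper bound on the runtime, these ratios are lower bounds on the speedup.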

Paper Structure (49 sections, 5 equations, 18 figures, 5 tables)

Figures (18)

  • Figure 1: Overview of the SpriteToMesh pipeline in six panels. (1) Input sprite image. (2) Binary segmentation mask obtained from the alpha channel or our EfficientNet-B0 network. (3) Exterior contour vertices (blue) placed via Douglas-Peucker simplification with adaptive arc subdivision. (4) Interior boundary vertices (yellow) detected via multi-channel Canny edge detection, combined with exterior vertices (blue). (5) Delaunay triangulation wireframe with centroid-based filtering. (6) Final mesh overlaid on the original sprite.
  • Figure 2: Exterior contour extraction process in four panels. (a) Binary segmentation mask. (b) Raw contour extracted with findContours, showing thousands of contour pixels. (c) Douglas-Peucker simplification ($\epsilon = 0.003 \times P$) identifies structural keypoints at corners and concavities (red dots). (d) Final hull vertices (red dots with white outlines) after adaptive arc subdivision adds uniformly-spaced vertices between keypoints along the actual contour path.
  • Figure 3: Interior vertex placement pipeline shown as a $2 \times 3$ grid. (a) Original sprite. (b) Bilateral-filtered image (texture noise reduced, structural edges preserved). (c) Multi-channel Canny edge map (union of per-channel detections). (d) Masked edges after erosion and morphological closing. (e) Interior vertices (yellow) placed along surviving contours using Douglas-Peucker simplification and uniform subdivision. (f) All vertices combined: exterior (blue) and interior (yellow), ready for Delaunay triangulation.
  • Figure 4: Example of binary-to-JSON conversion using SkelToJson. The binary .skel format encodes the complete skeleton as a dense byte stream (left). Our converter produces structured JSON (right) from which mesh vertex positions, UV coordinates, and triangle indices can be directly extracted.
  • Figure 5: Dataset preparation pipeline shown in four panels. (1) A full texture atlas from a game. (2) An individual sprite attachment cropped from the atlas using region definitions parsed from the .atlas file. (3) The binary mask extracted from the sprite's alpha channel ($\alpha > 128$). (4) The sprite composited onto a random gradient background, as used during training to force the network to learn foreground--background separation from visual features rather than trivial alpha thresholding.
  • ...and 13 more figures
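
The keypoint step described in the Figure 2 caption, Douglas-Peucker simplification with a tolerance of $\epsilon = 0.003 \times P$, can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: it assumes $P$ is the perimeter of the closed contour, it measures distance to the chord segment rather than the infinite line, it does not reproduce the adaptive arc subdivision, and the function names (`perimeter`, `douglas_peucker`, `simplify_closed_contour`) are hypothetical.

```python
import math


def perimeter(points):
    """Length of the closed polygon through `points` (last point joins the first)."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.dist(p, a)
    # Project p onto the segment and clamp to its endpoints.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an open polyline, always keeping its two endpoints."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    split_index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            split_index, max_dist = i, d
    if max_dist <= epsilon:
        # Every interior point is within tolerance of the chord: drop them all.
        return [first, last]
    left = douglas_peucker(points[: split_index + 1], epsilon)
    right = douglas_peucker(points[split_index:], epsilon)
    return left[:-1] + right  # the split point appears once


def simplify_closed_contour(points, ratio=0.003):
    """Simplify a closed contour with epsilon = ratio * perimeter (assumed meaning of P)."""
    points = list(points)
    epsilon = ratio * perimeter(points)
    # Split the loop at point 0 and at the point farthest from it,
    # then simplify the two open halves independently.
    far = max(range(len(points)), key=lambda i: math.dist(points[0], points[i]))
    first_half = douglas_peucker(points[: far + 1], epsilon)
    second_half = douglas_peucker(points[far:] + [points[0]], epsilon)
    return first_half[:-1] + second_half[:-1]


# Usage: a 100 x 100 square traced pixel by pixel (400 contour points).
square = (
    [(x, 0) for x in range(100)]
    + [(100, y) for y in range(100)]
    + [(x, 100) for x in range(100, 0, -1)]
    + [(0, y) for y in range(100, 0, -1)]
)
keypoints = simplify_closed_contour(square)
```

For this square the perimeter is 400, so $\epsilon = 1.2$ pixels, and the 400 contour points reduce to the four corners. Scaling the tolerance by the perimeter keeps the simplification roughly resolution-independent: a sprite drawn at twice the size gets twice the tolerance in pixels.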