Region-Wise Correspondence Prediction between Manga Line Art Images

Yingxuan Li, Jiafeng Mao, Qianru Qiu, Yusuke Matsui

TL;DR

This work tackles predicting region-wise correspondences directly from unannotated manga line art, addressing the lack of texture cues by learning patch-level similarities with a Vision Transformer and a cross-image Multiplex Transformer. A post-processing pipeline converts patch-level affinities into coherent region-level matches through edge-aware region merging and cross-image region voting, enabling robust region correspondence without manual segmentation. An automatic annotation strategy paired with manually refined evaluation datasets supports training and benchmarking, and the method achieves high patch-level accuracy and strong region-level alignment across real and synthetic datasets. By providing stable region-level correspondences across images, the approach holds potential for real-world manga and animation tasks such as colorization and frame interpolation.
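
To make the idea of cross-image region voting more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each patch already carries a region label from the merging step, and that a patch-level similarity matrix between the two images is available. The function name `vote_region_matches`, the best-match rule, and the `threshold` value are hypothetical choices made for illustration; the paper's actual voting rule is not given in this summary.

```python
from collections import Counter, defaultdict


def vote_region_matches(sim, regions_a, regions_b, threshold=0.5):
    """Match regions of image A to regions of image B by patch voting.

    sim[i][j]  : similarity between patch i of A and patch j of B (assumed in [0, 1])
    regions_a  : region label of each patch in A
    regions_b  : region label of each patch in B
    threshold  : hypothetical cutoff below which a patch casts no vote
    """
    votes = defaultdict(Counter)
    for i, row in enumerate(sim):
        # Each patch in A votes for the region of its most similar patch in B.
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            votes[regions_a[i]][regions_b[j]] += 1

    matches = {}
    for region in sorted(set(regions_a)):
        if votes[region]:
            # The region in B with the most votes wins.
            matches[region] = votes[region].most_common(1)[0][0]
        else:
            # No confident patch match: the region stays unmatched.
            matches[region] = None
    return matches


# Toy example: 4 patches in A (3 regions), 3 patches in B (2 regions).
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.3],
]
regions_a = [0, 0, 1, 2]
regions_b = [5, 5, 6]
matches = vote_region_matches(sim, regions_a, regions_b)
print(matches)
```

In this toy input, the last patch of A has no similarity above the cutoff, so its region is left unmatched, which mirrors the gray "no match" regions described in the Figure 4 caption.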

Abstract

Understanding region-wise correspondences between manga line art images is fundamental for high-level manga processing, supporting downstream tasks such as line art colorization and in-between frame generation. Unlike natural images that contain rich visual cues, manga line art consists only of sparse black-and-white strokes, making it challenging to determine which regions correspond across images. In this work, we introduce a new task: predicting region-wise correspondence between raw manga line art images without any annotations. To address this problem, we propose a Transformer-based framework trained on large-scale, automatically generated region correspondences. The model learns to suppress noisy matches and strengthen consistent structural relationships, resulting in robust patch-level feature alignment within and across images. During inference, our method segments each line art image and establishes coherent region-level correspondences through edge-aware clustering and region matching. We construct manually annotated benchmarks for evaluation, and experiments across multiple datasets demonstrate both high patch-level accuracy and strong region-level correspondence performance, achieving 78.4–84.4% region-level accuracy. These results highlight the potential of our method for real-world manga and animation applications.

Paper Structure

This paper contains 22 sections, 5 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Given a pair of raw manga line art images, our task is to identify meaningful structural regions and match the corresponding regions across the two images. Regions shown with the same color indicate predicted correspondences.
  • Figure 2: Overview of our proposed pipeline for predicting region-wise correspondence between manga line art images. The model extracts patch-level features using a Vision Transformer and predicts patch-level similarity via a Multiplex Transformer, followed by post-processing to obtain region-level correspondence.
  • Figure 3: Framework of intra-image patch merging. From the line-art edge map and patch clusters, our edge-aware merging with watershed refinement produces a set of pixel-level, edge-aligned region groups.
  • Figure 4: An example of automatic region matching results and manually corrected ground truth. Regions with the same color across the two images indicate a matched region pair, while gray denotes regions without a match.
  • Figure 5: Patch-level Precision–Recall (PR) curves on the GenAI dataset under different training set scales.
  • ...and 3 more figures
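
The patch-level Precision–Recall curves mentioned in the Figure 5 caption can be illustrated with a small sketch. This is a generic way to compute PR points by thresholding similarity scores, not the paper's evaluation code. The function name `precision_recall_points`, the toy scores, and the thresholds are hypothetical; it assumes each candidate patch pair has a predicted similarity score and a binary ground-truth label (1 for a true correspondence, 0 otherwise).

```python
def precision_recall_points(scores, labels, thresholds):
    """Return (threshold, precision, recall) for each threshold.

    A patch pair is predicted as a match when its score >= threshold.
    """
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        # Convention (an assumption): precision is 1.0 when nothing is predicted.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((t, precision, recall))
    return points


# Toy data: five candidate patch pairs.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
pr_points = precision_recall_points(scores, labels, [0.3, 0.5, 0.7])
for t, p, r in pr_points:
    print(f"threshold={t:.1f} precision={p:.3f} recall={r:.3f}")
```

Sweeping the threshold from low to high trades recall for precision; plotting the resulting points gives one PR curve, and repeating this for models trained on different training set scales would give a family of curves of the kind the caption describes.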