Region-Wise Correspondence Prediction between Manga Line Art Images
Yingxuan Li, Jiafeng Mao, Qianru Qiu, Yusuke Matsui
TL;DR
This work tackles predicting region-wise correspondences directly from unannotated manga line art, addressing the lack of texture cues by learning patch-level similarities with a Vision Transformer and a cross-image Multiplex Transformer. A post-processing pipeline converts patch-level affinities into coherent region-level matches through edge-aware region merging and cross-image region voting, enabling robust region correspondence without manual segmentation. An automatic annotation strategy supports training, and manually refined evaluation datasets support benchmarking; the method achieves high patch-level accuracy and strong region-level alignment across real and synthetic datasets. By providing stable region-level correspondences across images, the approach holds potential for real-world manga and animation tasks such as colorization and frame interpolation.
Abstract
Understanding region-wise correspondences between manga line art images is fundamental for high-level manga processing, supporting downstream tasks such as line art colorization and in-between frame generation. Unlike natural images, which contain rich visual cues, manga line art consists only of sparse black-and-white strokes, making it challenging to determine which regions correspond across images. In this work, we introduce a new task: predicting region-wise correspondence between raw manga line art images without any annotations. To address this problem, we propose a Transformer-based framework trained on large-scale, automatically generated region correspondences. The model learns to suppress noisy matches and strengthen consistent structural relationships, resulting in robust patch-level feature alignment within and across images. During inference, our method segments each line art image and establishes coherent region-level correspondences through edge-aware clustering and region matching. We construct manually annotated benchmarks for evaluation, and experiments across multiple datasets demonstrate both high patch-level accuracy and strong region-level correspondence performance, with region-level accuracy ranging from 78.4% to 84.4%. These results highlight the potential of our method for real-world manga and animation applications.
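To make the step from patch-level similarities to region-level matches concrete, the following is a minimal sketch of one plausible voting scheme. It is not the authors' implementation: the function name `vote_region_matches`, the argmax-then-majority rule, and the toy inputs are all assumptions made for illustration, and the region labels are assumed to have already been produced by some segmentation step.

```python
from collections import Counter, defaultdict


def vote_region_matches(labels_a, labels_b, similarity):
    """Map each region in image A to the region in image B that wins the patch vote.

    Hypothetical helper, not taken from the paper.
    labels_a[i]      : region id of patch i in image A (assumed given)
    labels_b[j]      : region id of patch j in image B (assumed given)
    similarity[i][j] : patch-level similarity between patch i of A and patch j of B
    """
    votes = defaultdict(Counter)
    for i, row in enumerate(similarity):
        # Each patch in A votes for the region of its most similar patch in B.
        best_j = max(range(len(row)), key=row.__getitem__)
        votes[labels_a[i]][labels_b[best_j]] += 1
    # A region's match is the B region that collected the most votes.
    return {region: counter.most_common(1)[0][0] for region, counter in votes.items()}


# Toy example: 4 patches in A (regions 0, 0, 1, 1), 3 patches in B (regions 10, 11, 11).
labels_a = [0, 0, 1, 1]
labels_b = [10, 11, 11]
similarity = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.7, 0.6],
    [0.1, 0.4, 0.9],
]
matches = vote_region_matches(labels_a, labels_b, similarity)
print(matches)  # {0: 10, 1: 11}
```

Majority voting of this kind tolerates a few noisy patch matches inside a region, since a single wrong argmax is outvoted by the remaining patches. How the actual method weights votes or resolves ties is not stated in the abstract.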
