Inference-time Trajectory Optimization for Manga Image Editing

Ryosuke Furuta

Abstract

We present an inference-time adaptation method that tailors a pretrained image editing model to each input manga image using only the input image itself. Despite recent progress in pretrained image editing, such models often underperform on manga because they are trained predominantly on natural-image data. Re-training or fine-tuning large-scale models on manga is, however, generally impractical due to both computational cost and copyright constraints. To address this issue, our method slightly corrects the generation trajectory at inference time so that the input image can be reconstructed more faithfully under an empty prompt. Experimental results show that our method consistently outperforms existing baselines while incurring only negligible computational overhead.

Paper Structure

This paper contains 27 sections, 37 equations, 5 figures, 8 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Editing results with an empty prompt. © Kato Masaki
  • Figure 2: Examples of manga image editing. In each pair, the left image is the input and the right image is the ground truth. © Kato Masaki © Yabuno Tenya, Watanabe Tatsuya
  • Figure 3: Qualitative comparison for text removal. © Nagano Noriko © Omi Ayuko
  • Figure 4: Effect of the prompt used for trajectory correction. © Okuda Momoko
  • Figure 5: Qualitative comparison for screentone synthesis. © Saki Kaori © Kurita Riku