Enabling Region-Specific Control via Lassos in Point-Based Colorization
Sanghyeon Lee, Jooyeol Yun, Jaegul Choo
TL;DR
This work tackles color collapse in point-based interactive colorization by introducing a lasso tool that bounds color propagation and a localization attention mask that gates cross-attention within user-defined regions. The method uses a Transformer-based pipeline in which grayscale queries attend to color hints through a masked cross-attention map $M_l$, so that colors spread only within the specified lassos and their vicinity. Key contributions include simulating user hints during training, a detailed hint-encoder and localized-attention architecture, and an objective based on the Huber loss in CIE $L^*a^*b^*$ space. Empirical results show that the lasso-enabled approach reduces the number of interactions and the time needed to reach target quality, mitigates color collapse on challenging datasets, and maintains competitive PSNR/LPIPS against point-only baselines, with practical benefits for interactive color editing.
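To illustrate how a localization mask could gate cross-attention between grayscale queries and color hints, the following is a minimal NumPy sketch. It is not the authors' implementation: the function name `masked_cross_attention`, the tensor shapes, the boolean form of the mask, and the choice to return zeros for queries with no permitted hint are all assumptions made for illustration.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, mask):
    """Cross-attention restricted by a boolean mask (illustrative sketch).

    queries: (N, d) grayscale patch features (hypothetical shape)
    keys, values: (H, d) color-hint features (hypothetical shape)
    mask: (N, H) boolean; True where patch n may attend to hint h,
          e.g. because the patch lies inside or near that hint's lasso.
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)
    # Disallowed query-hint pairs get -inf so they receive zero weight.
    logits = np.where(mask, logits, -np.inf)
    # Rows with no allowed hint would give NaN in a plain softmax,
    # so they are handled separately (assumption: they attend to nothing).
    has_hint = mask.any(axis=1, keepdims=True)
    safe_logits = np.where(has_hint, logits, 0.0)
    safe_logits = safe_logits - safe_logits.max(axis=1, keepdims=True)
    weights = np.where(mask, np.exp(safe_logits), 0.0)
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return weights @ values, weights

# Toy example: 4 grayscale patches, 3 color hints, feature size 8.
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
keys = rng.normal(size=(3, 8))
values = rng.normal(size=(3, 8))
mask = np.array([
    [True,  True,  False],   # patch 0 may use hints 0 and 1
    [True,  False, False],   # patch 1 may use hint 0 only
    [False, False, True],    # patch 2 may use hint 2 only
    [False, False, False],   # patch 3 lies outside every region
])
out, weights = masked_cross_attention(queries, keys, values, mask)
```

In this toy setup, a patch that is allowed to see only one hint copies that hint's value vector, and a patch outside every region receives no color information from the hints, which is the bounding behavior the lasso is meant to provide.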
Abstract
Point-based interactive colorization techniques allow users to effortlessly colorize grayscale images using user-provided color hints. However, point-based methods often face challenges when different colors are given to semantically similar areas, leading to color intermingling and unsatisfactory results, an issue we refer to as color collapse. The fundamental cause of color collapse is the inadequacy of points for defining the boundaries of each color. To mitigate color collapse, we introduce a lasso tool that can control the scope of each color hint. Additionally, we design a framework that leverages the user-provided lassos to localize the attention masks. The experimental results show that using a single lasso is as effective as applying 4.18 individual color hints and can achieve the desired outcomes in 30% less time than using points alone.
