HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations
Yiran Xu, Siqi Xie, Zhuofang Li, Harris Shadmany, Yinxiao Li, Luciano Sbaiz, Miaosen Wang, Junjie Ke, Jose Lezama, Hang Qi, Han Zhang, Jesse Berent, Ming-Hsuan Yang, Irfan Essa, Jia-Bin Huang, Feng Yang
TL;DR
HALO tackles the challenge of image retargeting by introducing layered transformations that treat salient and non-salient regions separately, mitigating artifacts and preserving content. The method employs a Multi-Flow Network with cross-attention between the original and target-size images to predict two warp fields, which are composited to form the output along with a warped saliency map. A key contribution is the Perceptual Structure Similarity Loss (PSSL), which uses DreamSim on a layout-augmented pseudo-ground-truth to supervise structure preservation without paired data. Empirical results on RetargetMe and extensive ablations show HALO achieving state-of-the-art content and structure preservation with strong user preferences, while offering faster inference due to end-to-end training.
Abstract
Image retargeting aims to change the aspect ratio of an image while maintaining its content and structure with fewer visual artifacts. Existing methods still generate many artifacts or fail to maintain the original content or structure. To address this, we introduce HALO, an end-to-end trainable solution for image retargeting. Since humans are more sensitive to distortions in salient areas than in non-salient areas of an image, HALO decomposes the input image into salient and non-salient layers and applies a different warping field to each layer. To further minimize structural distortion in the output images, we propose a perceptual structure similarity loss, which measures the structural similarity between input and output images and aligns with human perception. Both quantitative results and a user study on the RetargetMe dataset show that HALO achieves state-of-the-art performance. In particular, our method achieves an 18.4% higher user preference than the baselines on average.
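The layered idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: in HALO the warping fields are predicted by a network, whereas here they are hand-picked, and the function names (`sample`, `composite_layers`), the nearest-neighbor lookup, and the alpha-blend compositing rule are all assumptions made for illustration. The sketch warps the salient layer with a shape-preserving shift, warps the non-salient layer with a uniform squeeze, and blends the two using the saliency map warped by the salient field.

```python
import numpy as np

def sample(image, coords):
    """Nearest-neighbor backward warp (illustrative stand-in for a learned warp).

    image:  (H, W, C) source array.
    coords: (H_out, W_out, 2) source (x, y) position for each output pixel.
    """
    h, w = image.shape[:2]
    x = np.clip(np.rint(coords[..., 0]), 0, w - 1).astype(int)
    y = np.clip(np.rint(coords[..., 1]), 0, h - 1).astype(int)
    return image[y, x]

def composite_layers(image, saliency, coords_salient, coords_background):
    """Warp each layer with its own field, then alpha-blend (assumed rule)."""
    salient_layer = sample(image, coords_salient)
    background_layer = sample(image, coords_background)
    # The saliency map is warped with the salient field and used as alpha.
    alpha = sample(saliency[..., None], coords_salient)  # (H_out, W_out, 1)
    output = alpha * salient_layer + (1.0 - alpha) * background_layer
    return output, alpha[..., 0]

# Toy example: retarget a 4x8 image to 4x4.
image = np.arange(4 * 8 * 3, dtype=float).reshape(4, 8, 3)
saliency = np.zeros((4, 8))
saliency[:, 2:5] = 1.0  # salient object occupies columns 2..4

ys, xs = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
# Non-salient layer: uniform horizontal squeeze of the full width.
coords_background = np.stack([xs * (7 / 3), ys], axis=-1).astype(float)
# Salient layer: pure shift, so the object keeps its original proportions.
coords_salient = np.stack([xs + 2, ys], axis=-1).astype(float)

output, warped_saliency = composite_layers(
    image, saliency, coords_salient, coords_background
)
```

In this toy setup the salient columns are copied without distortion while the remaining output pixels come from the squeezed background. A trainable version would replace the nearest-neighbor lookup with a differentiable sampler so that gradients can reach the predicted fields.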
