Object-IR: Leveraging Object Consistency and Mesh Deformation for Self-Supervised Image Retargeting
Tianli Liao, Ran Wang, Siqing Zhang, Lei Li, Guangen Liu, Chenyang Zhao, Heling Cao, Peng Li
TL;DR
Object-IR reframes image retargeting as a self-supervised mesh warping problem: a network predicts how a predefined uniform mesh at the output resolution should deform so that distortion in semantically important regions is minimized. Training uses a three-term loss: an object loss that preserves object appearance, a geometric loss that enforces scale-like consistency within objects, and a boundary loss that keeps the output rectangular. Because supervision comes from the input itself, no labeled retargeting data is required. Across RetargetMe and COCO-derived benchmarks, Object-IR achieves state-of-the-art distortion metrics and perceptual quality, with real-time inference on consumer GPUs (≈0.009 s for a 1024×683 image). The work also includes a retargeting-quality metric aligned with human judgments and extensive ablations, which demonstrate robustness across arbitrary aspect-ratio changes and point to avenues for further architectural and evaluation work.
Abstract
Eliminating geometric distortion in semantically important regions remains a persistent challenge in image retargeting. This paper presents Object-IR, a self-supervised architecture that reformulates image retargeting as a learning-based mesh warping optimization problem in which the mesh deformation is guided by object appearance consistency and geometry-preserving constraints. Given an input image and a target aspect ratio, we initialize a uniform rigid mesh at the output resolution and use a convolutional neural network to predict the motion of each mesh grid point, which yields the deformed mesh. The retargeted result is generated by warping the input image according to the rigid mesh in the input image and the deformed mesh at the output resolution. To mitigate geometric distortion, we design a comprehensive objective function incorporating (a) an object-consistent loss that ensures important semantic objects retain their appearance, (b) a geometric-preserving loss that constrains the important mesh cells to undergo a simple scale transform, and (c) a boundary loss that enforces a clean rectangular output. Notably, our self-supervised paradigm eliminates the need for manually annotated retargeting datasets by deriving supervision directly from the input's geometric and semantic properties. Extensive evaluations on the RetargetMe benchmark demonstrate that Object-IR achieves state-of-the-art performance, outperforming existing methods in both quantitative metrics and subjective visual quality assessments. The framework handles arbitrary input resolutions and runs in real time on consumer-grade GPUs (average inference time: 0.009 s at 1024×683 resolution). The source code will soon be available at https://github.com/tlliao/Object-IR.
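To make the mesh formulation concrete, the following is a minimal NumPy sketch of two ingredients named in the abstract: a uniform rigid mesh at the output resolution, and a boundary term that penalizes a deformed mesh whose border leaves the output rectangle. It is an illustration under stated assumptions, not the released implementation. The function names `uniform_mesh` and `boundary_loss`, the 6×8 cell layout, the 512×683 target size, the L1 form of the penalty, and the hand-written offsets (which stand in for the network's prediction) are all hypothetical.

```python
import numpy as np


def uniform_mesh(height, width, rows, cols):
    """Return a (rows + 1, cols + 1, 2) array of (x, y) vertex coordinates
    for a uniform mesh covering a width x height rectangle.
    Hypothetical helper; the mesh resolution is an assumption."""
    xs = np.linspace(0.0, width, cols + 1)
    ys = np.linspace(0.0, height, rows + 1)
    grid_x, grid_y = np.meshgrid(xs, ys)  # both have shape (rows + 1, cols + 1)
    return np.stack([grid_x, grid_y], axis=-1)


def boundary_loss(mesh, height, width):
    """Mean absolute distance of border vertices from the matching edge of
    the output rectangle. An L1 penalty is assumed here for illustration."""
    left = np.abs(mesh[:, 0, 0])             # x of left column should be 0
    right = np.abs(mesh[:, -1, 0] - width)   # x of right column should be width
    top = np.abs(mesh[0, :, 1])              # y of top row should be 0
    bottom = np.abs(mesh[-1, :, 1] - height) # y of bottom row should be height
    return float(np.mean(np.concatenate([left, right, top, bottom])))


# Rigid mesh at an assumed 512 x 683 output resolution with 6 x 8 cells.
rigid = uniform_mesh(height=683, width=512, rows=6, cols=8)

# Stand-in for the predicted motion: shift every vertex 2 px to the right.
offsets = np.zeros_like(rigid)
offsets[..., 0] = 2.0
deformed = rigid + offsets

print(rigid.shape)
print(boundary_loss(rigid, 683, 512))
print(boundary_loss(deformed, 683, 512))
```

The undeformed mesh has zero boundary penalty, while the shifted mesh is penalized because its left and right columns no longer lie on the rectangle's edges. In a trainable version, the offsets would come from the network and the penalty would be written in a differentiable framework alongside the object and geometric terms.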
