MOWA: Multiple-in-One Image Warping Model
Kang Liao, Zongsheng Yue, Zhonghua Wu, Chen Change Loy
TL;DR
MOWA introduces a unified framework for six practical image warping tasks. It disentangles motion estimation into a region-level thin-plate spline (TPS) transformation with progressively refined control points and a pixel-level residual flow, and adds a lightweight point-based task classifier and a prompt-learning module for dynamic, task-aware warping. This multi-task approach achieves results competitive with or superior to task-specific methods while using fewer parameters, and it demonstrates cross-domain and zero-shot generalization to unseen scenes and tasks. Combining hierarchical motion modeling, efficient task discrimination, and adaptable prompts enables robust, scalable warping across diverse inputs without explicit camera-model knowledge, with potential extension to multi-view applications. Overall, MOWA offers a practical, generalizable foundation for universal image warping in computational photography and related domains.
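The two-level motion decomposition can be illustrated with a small NumPy sketch: a thin-plate spline fitted to control-point pairs gives a smooth region-level mapping, and a per-pixel residual flow is added on top. This is a generic TPS formulation, not the authors' implementation; the function names (`fit_tps`, `apply_tps`, `dense_motion`), the kernel U(r) = r^2 log(r^2), the (x, y) point convention, and the mapping direction are all assumptions made for illustration.

```python
import numpy as np

def tps_kernel(r2):
    # Radial basis U(r) = r^2 * log(r^2), defined as 0 at r = 0.
    return np.where(r2 > 0, r2 * np.log(np.maximum(r2, 1e-12)), 0.0)

def fit_tps(src, dst):
    """Solve for TPS parameters mapping src control points (N, 2) onto dst (N, 2)."""
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((n, 1)), src])      # affine basis [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_kernel(d2)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    # Rows 0..n-1 hold the kernel weights, the last 3 rows the affine part.
    return np.linalg.solve(A, b)

def apply_tps(params, src, pts):
    """Evaluate the fitted TPS at query points pts (M, 2)."""
    n = src.shape[0]
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return tps_kernel(d2) @ params[:n] + P @ params[n:]

def dense_motion(src, dst, residual_flow):
    """Region-level TPS mapping plus pixel-level residual flow of shape (H, W, 2)."""
    h, w, _ = residual_flow.shape
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    coarse = apply_tps(fit_tps(src, dst), src, grid).reshape(h, w, 2)
    return coarse + residual_flow
```

In this sketch the control points must be distinct and not all collinear, otherwise the linear system is singular. The returned array holds, for each pixel, the coordinates it maps to; resampling the image at those coordinates is left out.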
Abstract
While recent image warping approaches have achieved remarkable success on existing benchmarks, they still require training a separate model for each task and do not generalize well to different camera models or customized manipulations. To address the diverse types of warping encountered in practice, we propose a Multiple-in-One image WArping model (named MOWA). Specifically, we mitigate the difficulty of multi-task learning by disentangling motion estimation at both the region level and the pixel level. To further enable dynamic task-aware image warping, we introduce a lightweight point-based classifier that predicts the task type; its prediction serves as a prompt to modulate the feature maps for more accurate estimation. To our knowledge, this is the first work that solves multiple practical warping tasks in a single model. Extensive experiments demonstrate that MOWA, trained on six tasks for multiple-in-one image warping, outperforms state-of-the-art task-specific models on most tasks. Moreover, MOWA shows promising potential to generalize to unseen scenes, as evidenced by cross-domain and zero-shot evaluations. The code and more visual results can be found on the project page: https://kangliao929.github.io/projects/mowa/.
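The idea of using a predicted task type as a prompt that modulates feature maps can be sketched as follows. The abstract does not specify the modulation form, so everything here is an assumption for illustration: the names `modulate_features` and `prompt_bank`, the softmax weighting over per-task prompt vectors, and the channel-wise scaling are hypothetical choices, not the authors' design.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array of logits.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def modulate_features(features, task_logits, prompt_bank):
    """Scale feature channels with a task-dependent prompt (illustrative only).

    features:    (C, H, W) feature map
    task_logits: (T,) classifier scores, one per task
    prompt_bank: (T, C) one prompt vector per task (learned in a real model)
    """
    weights = softmax(task_logits)        # soft task assignment, shape (T,)
    prompt = weights @ prompt_bank        # blended prompt, shape (C,)
    # Channel-wise modulation: a zero prompt leaves the features unchanged.
    return features * (1.0 + prompt[:, None, None])
```

Using soft weights rather than a hard argmax lets an ambiguous input blend several task prompts, which is one plausible way such a module could behave on tasks it was not trained on.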
