MOWA: Multiple-in-One Image Warping Model

Kang Liao, Zongsheng Yue, Zhonghua Wu, Chen Change Loy

TL;DR

MOWA introduces a unified framework for six practical image warping tasks. It disentangles motion estimation into a region-level thin-plate spline (TPS) with progressively refined control points and a pixel-level residual flow, and adds a lightweight point-based task classifier and a prompt-learning module for dynamic, task-aware warping. This multi-task approach achieves results competitive with or superior to those of task-specific methods while using fewer parameters, and it generalizes to unseen scenes and tasks in cross-domain and zero-shot evaluations. The combination of hierarchical motion modeling, efficient task discrimination, and adaptable prompts enables robust, scalable warping across diverse inputs without explicit camera-model knowledge, and the approach could potentially be extended to multi-view applications. Overall, MOWA offers a practical, generalizable foundation for universal image warping in computational photography and related domains.
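The split of motion into a region-level TPS and a pixel-level residual flow can be illustrated with a minimal NumPy sketch. It assumes control points and pixel coordinates normalized to [-1, 1], fits a standard thin-plate spline from output-grid control points to their predicted input positions, and simply adds a residual flow to the resulting sampling coordinates. All function names are hypothetical, and this is not the authors' implementation: in MOWA both the progressively refined control points and the residual flow are predicted by the network, which the sketch does not model.

```python
import numpy as np


def tps_kernel(r2):
    """Thin-plate radial basis U = r^2 * log(r^2), with U(0) defined as 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.nan_to_num(u)  # 0 * log(0) gives nan; replace it with 0


def fit_tps(src, dst):
    """Solve for TPS parameters that map control points src (n, 2) onto dst (n, 2)."""
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((n, 1)), src])  # affine basis [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = tps_kernel(d2)
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    params = np.linalg.solve(L, rhs)
    return params[:n], params[n:]  # radial weights (n, 2), affine part (3, 2)


def eval_tps(points, src, weights, affine):
    """Map arbitrary points (m, 2) through the fitted spline."""
    d2 = np.sum((points[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return tps_kernel(d2) @ weights + P @ affine


def normalized_grid(h, w):
    """Pixel positions of an h x w image in [-1, 1], as (h*w, 2) in (x, y) order."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=1)


def control_grid(k):
    """A regular k x k grid of TPS control points in [-1, 1]."""
    return normalized_grid(k, k)


def warp_field(h, w, ctrl_out, ctrl_in, residual):
    """Sampling coordinates per output pixel: region-level TPS + pixel-level residual."""
    weights, affine = fit_tps(ctrl_out, ctrl_in)
    coarse = eval_tps(normalized_grid(h, w), ctrl_out, weights, affine)
    return coarse.reshape(h, w, 2) + residual


# Toy usage: a 10 x 10 control grid (the coarsest size shown in Figure 5)
# whose predicted input positions are a pure shift.
h, w = 32, 48
ctrl = control_grid(10)
shifted = ctrl + np.array([0.1, -0.05])
field = warp_field(h, w, ctrl, shifted, np.zeros((h, w, 2)))
```

The resulting field holds, for each output pixel, the input location to sample (for example with bilinear interpolation). A pure shift of the control points yields a constant displacement because the TPS affine term absorbs it and the radial weights vanish; non-affine control-point motion is what the radial terms capture.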

Abstract

While recent image warping approaches have achieved remarkable success on existing benchmarks, they still require training a separate model for each specific task and cannot generalize well to different camera models or customized manipulations. To address the diverse types of warping encountered in practice, we propose a Multiple-in-One image WArping model (named MOWA). Specifically, we mitigate the difficulty of multi-task learning by disentangling the motion estimation at both the region level and the pixel level. To further enable dynamic task-aware image warping, we introduce a lightweight point-based classifier that predicts the task type; its predictions serve as prompts that modulate the feature maps for more accurate estimation. To our knowledge, this is the first work that solves multiple practical warping tasks in a single model. Extensive experiments demonstrate that MOWA, trained on six tasks for multiple-in-one image warping, outperforms state-of-the-art task-specific models on most tasks. Moreover, MOWA shows promising potential to generalize to unseen scenes, as evidenced by cross-domain and zero-shot evaluations. The code and more visual results can be found on the project page: https://kangliao929.github.io/projects/mowa/.
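The abstract describes a point-based classifier whose task prediction serves as a prompt that modulates the feature maps. The NumPy sketch below shows one way such a pathway could look, assuming the classifier is a small MLP over predicted control-point offsets and that per-task prompts are blended by the predicted task probabilities and applied as a FiLM-style channel-wise scale and shift. The class and function names, layer sizes, and modulation scheme are all hypothetical; the paper's actual classifier and prompt learning module may differ.

```python
import numpy as np

NUM_TASKS = 6  # MOWA is trained on six warping tasks


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


class PointTaskClassifier:
    """Hypothetical two-layer MLP over flattened 2D control-point offsets."""

    def __init__(self, num_points, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = 0.1 * rng.standard_normal((2 * num_points, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = 0.1 * rng.standard_normal((hidden, NUM_TASKS))
        self.b2 = np.zeros(NUM_TASKS)

    def __call__(self, offsets):
        hidden = np.maximum(offsets.reshape(-1) @ self.w1 + self.b1, 0.0)  # ReLU
        return softmax(hidden @ self.w2 + self.b2)  # task probabilities, sum to 1


def modulate(features, task_probs, prompt_scale, prompt_shift):
    """Blend per-task prompts (NUM_TASKS, C) by the predicted probabilities and
    apply them as a channel-wise scale and shift to a (C, H, W) feature map."""
    scale = task_probs @ prompt_scale
    shift = task_probs @ prompt_shift
    return features * (1.0 + scale[:, None, None]) + shift[:, None, None]


# Toy usage with random (untrained) weights.
rng = np.random.default_rng(1)
C, H, W = 16, 8, 8
offsets = 0.05 * rng.standard_normal((10 * 10, 2))  # e.g. a 10 x 10 control grid
probs = PointTaskClassifier(num_points=100)(offsets)
feats = rng.standard_normal((C, H, W))
prompt_scale = 0.1 * rng.standard_normal((NUM_TASKS, C))
prompt_shift = 0.1 * rng.standard_normal((NUM_TASKS, C))
out = modulate(feats, probs, prompt_scale, prompt_shift)
```

Blending prompts by soft probabilities, rather than selecting only the arg-max task, keeps this pathway differentiable and lets an ambiguous input draw on several task prompts; whether MOWA blends or selects is not stated in this summary.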

Paper Structure

This paper contains 17 sections, 7 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: MOWA is designed to address a variety of practical image warping tasks within a single framework, particularly in computational photography; six distinct types of distortion are considered in this study. It also generalizes to novel scenarios, as evidenced by both cross-domain (unfamiliar domains) and zero-shot (unseen tasks) evaluations. The approach estimates and combines region-level and pixel-level motion fields (highlighted by red boxes) to warp input images accurately.
  • Figure 2: Overview of the proposed multiple-in-one image warping model (MOWA). It takes an image and a mask as input and estimates TPS control points with progressively refined precision. During this region-level motion estimation, the feature maps are incrementally warped and rectified. The warped features are then passed to the decoder to predict the residual pixel-level motion. To ensure task awareness and expandability, a lightweight point-based classifier and a prompt learning module are introduced. During inference, MOWA supports image warping at any resolution by scaling the predicted TPS control points and residual flow.
  • Figure 3: The motion structures of different tasks follow task-specific distributions, which can potentially be represented in a 2D point space. Treating the discrimination of these motion structures as a classification task also improves image warping performance, as shown in the visual comparisons.
  • Figure 4: Qualitative comparison of our multiple-in-one framework MOWA with state-of-the-art (SotA) image warping models. The red dotted lines mark the horizon. The arrows highlight poorly warped regions, such as irregular boundaries and distorted semantics.
  • Figure 5: Ablation study on the proposed motion estimation module. The predicted TPS control points are shown at grid sizes of $10\times10$, $12\times12$, $14\times14$, and $16\times16$, from left to right. The coarse and final results are obtained by warping the input with the first set of control points and with the final flow (the last TPS control points combined with the residual flow), respectively.
  • Figures 6 to 10 are not listed in this summary.
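Figure 2's caption states that MOWA warps images at any resolution by scaling the predicted TPS control points and residual flow. One minimal way to do this is sketched below in NumPy, assuming both quantities are expressed in pixel units at a fixed training resolution and using an align-corners convention (corner pixels map to corner pixels). The function names, the 256 x 256 training size, and the convention are assumptions, not the authors' code.

```python
import numpy as np


def resize_bilinear(field, out_h, out_w):
    """Bilinearly resize an (H, W, 2) field, aligning corner pixels."""
    in_h, in_w = field.shape[:2]
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = field[y0][:, x0] * (1 - wx) + field[y0][:, x1] * wx
    bottom = field[y1][:, x0] * (1 - wx) + field[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def rescale_motion(ctrl_px, residual_px, train_hw, out_hw):
    """Rescale pixel-unit control points (n, 2) and residual flow (h, w, 2), both in
    (x, y) order, from the training resolution to an arbitrary output resolution."""
    (th, tw), (oh, ow) = train_hw, out_hw
    factors = np.array([(ow - 1) / (tw - 1), (oh - 1) / (th - 1)])  # (x, y)
    ctrl_out = ctrl_px * factors  # positions move with the image grid
    # Upsample the flow spatially, then stretch its vectors to the new pixel scale.
    residual_out = resize_bilinear(residual_px, oh, ow) * factors
    return ctrl_out, residual_out


# Toy usage: motion predicted at a hypothetical 256 x 256 training resolution.
ctrl = np.array([[0.0, 0.0], [255.0, 255.0], [128.0, 64.0]])
residual = np.ones((256, 256, 2))  # constant 1-pixel shift in x and y
ctrl_big, residual_big = rescale_motion(ctrl, residual, (256, 256), (511, 1021))
```

Both the flow's spatial layout and its vector lengths must change with resolution, which is why the sketch resamples the field and then multiplies it by the same per-axis factors applied to the control points. If the control points were stored in normalized [-1, 1] coordinates instead, they would need no rescaling.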