Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts

Linwei Qiu, Gongzhe Li, Xiaozhe Zhang, Qinlin Sun, Fengying Xie

TL;DR

UniRect reframes image correction and rectangling as a single distortion-rectification problem, introducing a general distortion model that unifies portrait, wide-angle, stitched, and rotation distortions. It provides a two-module architecture: a Deformation Module based on Residual Progressive Thin-Plate Spline (RP-TPS) and a Restoration Module built with Residual Mamba Blocks (RMBs), augmented by a Sparse Mixture-of-Experts (SMoEs) to enable four tasks in one model. The framework uses task-guiding prompts and specialized losses to constrain deformation and restoration, achieving state-of-the-art performance across four tasks and demonstrating strong cross-task generalization and real-world applicability. While computationally intensive, UniRect offers a scalable path toward unified, edge-friendly rectification pipelines on mobile devices.

Abstract

Image correction and rectangling are valuable tasks in practical photography systems such as smartphones. Recent remarkable advancements in deep learning have undeniably brought about substantial performance improvements in these fields. Nevertheless, existing methods mainly rely on task-specific architectures. This significantly restricts their generalization ability and effective application across a wide range of different tasks. In this paper, we introduce the Unified Rectification Framework (UniRect), a comprehensive approach that addresses these practical tasks from a consistent distortion rectification perspective. Our approach incorporates various task-specific inverse problems into a general distortion model by simulating different types of lenses. To handle diverse distortions, UniRect adopts one task-agnostic rectification framework with a dual-component structure: a Deformation Module, which utilizes a novel Residual Progressive Thin-Plate Spline (RP-TPS) model to address complex geometric deformations, and a subsequent Restoration Module, which employs Residual Mamba Blocks (RMBs) to counteract the degradation caused by the deformation process and enhance the fidelity of the output image. Moreover, a Sparse Mixture-of-Experts (SMoEs) structure is designed to circumvent heavy task competition in multi-task learning due to varying distortions. Extensive experiments demonstrate that our models have achieved state-of-the-art performance compared with other up-to-date methods.
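The abstract names a Sparse Mixture-of-Experts structure but does not give its routing rule here. As a rough illustration of the general idea only, the sketch below shows generic top-k sparse routing: only the k highest-scoring experts run, and their outputs are mixed with softmax weights. The names `sparse_moe` and `gate_logits`, the toy experts, and the choice of k = 2 are assumptions made for illustration, not the paper's design.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sparse_moe(x, gate_logits, experts, k=2):
    """Run only the k highest-scoring experts on x and mix their outputs.

    gate_logits is assumed to come from some gating network (hypothetical
    here); experts are plain callables mapping a list of floats to a list
    of floats of the same length.
    """
    # Indices of the k largest gate scores (ties keep their original order).
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Renormalise the gate scores over the selected experts only.
    weights = softmax([gate_logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

# Toy usage: four "experts" that simply scale their input.
toy_experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
mixed, chosen = sparse_moe([1.0, 2.0], [0.0, 5.0, -1.0, 5.0], toy_experts, k=2)
```

Because unselected experts are never evaluated, inputs that receive different gate scores exercise different parameters, which is one common way to reduce interference between tasks in multi-task training.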

Paper Structure

This paper contains 36 sections, 16 equations, 16 figures, 10 tables.

Figures (16)

  • Figure 1: (a) Pragmatic tasks for a smartphone. We consider four detailed tasks for image correction and rectangling, which are closely related to two types of common cameras on a mainstream mobile phone. Portrait correction ($\mathcal{T}_1$) and rectified wide-angle image rectangling ($\mathcal{T}_2$) often need to process pictures taken with a wide-angle lens. Stitched image rectangling ($\mathcal{T}_3$) and rotation correction ($\mathcal{T}_4$) are two practical tasks for daily life. (b) Camera distortion models and flow visualization for different tasks. On a normalized plane, the image of the point $P$ is $p_d$, whereas it would be $p$ without distortion under a pinhole camera model [kannala2006generic]. The optical flows of the backward distortions are generated with RAFT [teed2020raft], one of the most powerful tools for optical flow estimation. (c) Multi-task types. (1) Four-to-four: previous studies often use four disparate models to accomplish the four tasks separately. (2) Four-by-one: the four tasks are handled by one model structure, but with different, even counteractive, network weights. (3) Four-in-one: one model handles all four tasks at the same time. Both four-by-one and four-in-one are task-agnostic models. We obtain both the four-by-one and four-in-one models in this paper.
  • Figure 2: (a) Framework of our Unified Rectification. It mainly consists of (b) a deformation module (DM) and (d) a restoration module (RM), which are trained simultaneously. An image $X^i_0$ from the image set of task $\mathcal{T}_i$ and a corresponding visual prompt $M^i_0$ indicating which task to perform are the inputs to UniRect, which yields the final result $X^i_R$. In the deformation module (ignoring our residual progressive setting for simplicity), (c) the control point predictor $\mathcal{C}$ predicts the locations of a set of control points, i.e., $c$, with which the grid generator $\mathcal{G}$ produces a sampling grid $\mathcal{P}$ by Eq. (5). The sampler $\mathcal{S}$ then samples from $X^i_0$ and $M^i_0$ according to $\mathcal{P}$, resulting in the rectified image $X^i_D$ and its new prompt $M^i_D$. For tasks without boundary changes, such as $\mathcal{T}_1$ and $\mathcal{T}_4$, $M^i_D$ is set to the all-ones matrix. (e) A Mamba block is applied in the control point predictor to capture geometric information such as borders through its scanning mechanism [gu2021combining, gu2023mamba]. The RM is composed of residual Mamba blocks (RMBs), which effectively capture global connections for restoration. (f) Rectification for all tasks. Our model treats all tasks as rectification tasks and subsequently incorporates them into a unified framework.
  • Figure 3: Qualitative comparison for our UniRect on four tasks. Zoom in for best view.
  • Figure 4: Cross-Task Degradation. Models trained on four tasks (different weights $w_i$) are evaluated on one task $\mathcal{T}_j$. For $\mathcal{T}_1$, we use LineACC to quantify the performance, and FID for the remaining tasks.
  • Figure 5: Results of the same input under $\mathcal{T}_3$ and $\mathcal{T}_4$ prompts.
  • ...and 11 more figures
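
The Figure 2 caption describes a deformation pipeline in which predicted control points drive a grid generator that produces a sampling grid. The paper's Eq. (5) and its residual progressive variant are not reproduced on this page, so the sketch below shows only a plain, textbook thin-plate spline (TPS) fit in pure Python, under the assumption that control points are given in normalized [0, 1] coordinates. All function names (`fit_tps`, `warp_point`, `make_grid`) are hypothetical and chosen for illustration.

```python
import math

def tps_kernel(r2):
    # U(r) = r^2 * log(r), written in terms of r^2; U(0) = 0 by convention.
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular TPS system (degenerate control points)")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_tps(ctrl_dst, ctrl_src):
    """Fit a TPS mapping output-space control points to source-space points."""
    n = len(ctrl_dst)
    A = [[0.0] * (n + 3) for _ in range(n + 3)]
    B = [[0.0, 0.0] for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(ctrl_dst):
        for j, (xj, yj) in enumerate(ctrl_dst):
            A[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        A[i][n:] = [1.0, xi, yi]                      # affine part
        A[n][i], A[n + 1][i], A[n + 2][i] = 1.0, xi, yi  # side conditions
        B[i] = list(ctrl_src[i])
    return solve(A, B)  # n kernel weights followed by 3 affine rows

def warp_point(params, ctrl_dst, x, y):
    """Map one output-space point (x, y) to its source-space location."""
    n = len(ctrl_dst)
    out = []
    for k in range(2):
        v = params[n][k] + params[n + 1][k] * x + params[n + 2][k] * y
        for i, (xi, yi) in enumerate(ctrl_dst):
            v += params[i][k] * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
        out.append(v)
    return tuple(out)

def make_grid(params, ctrl_dst, h, w):
    """Sampling grid: source coordinates for every output pixel centre."""
    return [[warp_point(params, ctrl_dst, (j + 0.5) / w, (i + 0.5) / h)
             for j in range(w)] for i in range(h)]

# Toy usage: four fixed corners, with the centre pulled 0.1 to the right.
dst = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.6, 0.5)]
params = fit_tps(dst, src)
grid = make_grid(params, dst, 4, 4)
```

A sampler would then read the input image (and the prompt mask) at each grid location, typically with bilinear interpolation so that the warp stays differentiable. That step is omitted here.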