Unwarping Screen Content Images via Structure-texture Enhancement Network and Transformation Self-estimation

Zhenzhen Xiao, Heng Liu, Bingwen Hu

TL;DR

The paper tackles unwarping screen-content images with large distortions by introducing STEN, a dual-branch network that separately enhances structure and texture. The texture branch employs an implicit B-spline representation with Jacobian-based modulation, while the structure branch leverages global-local transformers to preserve geometry, and a structure-texture fusion module integrates both signals. A transformation self-estimation module trains a CNN to predict and iteratively refine an unknown transformation matrix, improving robustness to real-world distortions. Across SCI datasets and natural-image benchmarks, STEN achieves state-of-the-art or competitive results for arbitrary-scale SR and homography unwarping, demonstrating practical impact for SCI correction and downstream applications.
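To make the "Jacobian-based modulation" above more concrete, the sketch below computes the analytic Jacobian of a homography at a single point, along with its determinant, which measures how much the warp locally stretches or shrinks area. The function names (`warp_point`, `homography_jacobian`, `local_area_scale`) are hypothetical, and how STEN actually feeds the Jacobian into its texture branch is not specified in this summary, so this shows only the underlying quantity.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # 3x3 homography, row-major


def warp_point(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through the homography m using homogeneous coordinates."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    u = (m[0][0] * x + m[0][1] * y + m[0][2]) / w
    v = (m[1][0] * x + m[1][1] * y + m[1][2]) / w
    return u, v


def homography_jacobian(m: Matrix, x: float, y: float) -> List[List[float]]:
    """Return the 2x2 Jacobian d(u, v)/d(x, y) of the warp at (x, y).

    With u = (a x + b y + c) / w and w = g x + h y + i, the quotient rule
    gives du/dx = (a - g u) / w, and similarly for the other entries.
    """
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    u, v = warp_point(m, x, y)
    return [
        [(m[0][0] - m[2][0] * u) / w, (m[0][1] - m[2][1] * u) / w],
        [(m[1][0] - m[2][0] * v) / w, (m[1][1] - m[2][1] * v) / w],
    ]


def local_area_scale(m: Matrix, x: float, y: float) -> float:
    """Absolute Jacobian determinant: local area magnification of the warp."""
    j = homography_jacobian(m, x, y)
    return abs(j[0][0] * j[1][1] - j[0][1] * j[1][0])


if __name__ == "__main__":
    # A pure 2x scaling stretches every local patch by a factor of 4 in area.
    scale2 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
    print(local_area_scale(scale2, 3.0, 4.0))  # 4.0
```

For a perspective warp the value changes from pixel to pixel, which is why a per-location signal of this kind is a plausible input for modulating texture synthesis.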

Abstract

While existing implicit neural network-based image unwarping methods perform well on natural images, they struggle to handle screen content images (SCIs), which often contain large geometric distortions, text, symbols, and sharp edges. To address this, we propose a structure-texture enhancement network (STEN) with transformation self-estimation for SCI warping. STEN integrates a B-spline implicit neural representation module and a transformation error estimation and self-correction algorithm. It comprises two branches: the structure estimation branch (SEB), which enhances local aggregation and global dependency modeling, and the texture estimation branch (TEB), which improves texture detail synthesis using B-spline implicit neural representation. Additionally, the transformation self-estimation module autonomously estimates the transformation error and corrects the coordinate transformation matrix, effectively handling real-world image distortions. Extensive experiments on public SCI datasets demonstrate that our approach significantly outperforms state-of-the-art methods. Comparisons on well-known natural image datasets also show the potential of our approach for natural image distortion.
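The "coordinate transformation matrix" in the abstract can be read as a 3x3 homography acting on pixel coordinates. The following is a minimal sketch, assuming that reading, of how a point is mapped through such a matrix and back through its inverse. The helper names `invert_3x3` and `apply_homography` and the example matrix are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # 3x3 matrix, row-major


def invert_3x3(m: Matrix) -> Matrix:
    """Invert a 3x3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("transformation matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_homography(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through m: lift to homogeneous coordinates, then divide."""
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return xh / w, yh / w


if __name__ == "__main__":
    # Hypothetical distortion: mild scale, shear, shift, and perspective.
    M = [[1.2, 0.1, 5.0], [0.05, 0.9, -3.0], [0.001, 0.002, 1.0]]
    warped = apply_homography(M, 10.0, 20.0)
    restored = apply_homography(invert_3x3(M), *warped)
    print(warped, restored)  # restored is (10.0, 20.0) up to rounding
```

If the estimated matrix is wrong, the round trip no longer lands on the original pixel; that residual is the kind of transformation error a self-estimation step would have to detect and correct.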

Paper Structure

This paper contains 20 sections, 16 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Real-world screen content image correction with the proposed STEN method. The image is captured by a camera; SRWarp [son2021srwarp], LTEW [lee2022learning], and our proposed STEN all use the predicted transformation matrix to perform super-resolution (SR).
  • Figure 2: Overall structure of the proposed STEN model. The low-resolution sample $I_{LR}$ is first inverse-transformed with the inverse transformation matrix $M^{-1}$ and then cropped to obtain $I_{LR-\text{Crop}}$. The model has two main branches: the structure estimation branch, which extracts richer structural features, and the texture estimation branch, which improves the estimation of texture information and local details. Features from both branches are passed to the Structure-Texture Fusion (STF) module for feature enhancement and then decoded. Combining the two branches lets STEN better handle deformation in screen content images.
  • Figure 3: Qualitative comparison with other arbitrary-scale super-resolution (SR) methods, i.e., MetaSR [hu2019meta], LIIF [chen2021learning], LTE [lee2022local], and BTC [pak2023b], at in-scale factors $\times2$ and $\times3.5$.
  • Figure 4: Qualitative comparison with other arbitrary-scale super-resolution (SR) methods, i.e., MetaSR [hu2019meta], LIIF [chen2021learning], LTE [lee2022local], and BTC [pak2023b], at out-of-scale factors $\times5$ and $\times6.4$.
  • Figure 5: Qualitative comparison with other homography transform methods, i.e., RDN [zhang2018residual], SRWarp [son2021srwarp], and LTEW [lee2022learning], at in-scale factors.
  • ...and 2 more figures
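
The non-integer factors in the captions above (for example $\times3.5$ and $\times6.4$) are possible because implicit-representation SR methods query the image at continuous coordinates rather than on a fixed upsampling lattice. A minimal sketch of one common convention, pixel-center coordinates normalized to $[-1, 1]$, is given below. Whether STEN uses exactly this convention is an assumption, and `make_query_coords` and `make_query_grid` are hypothetical names.

```python
from typing import List, Tuple


def make_query_coords(size: int, scale: float) -> List[float]:
    """Pixel-center coordinates in [-1, 1] for one axis upsampled by `scale`.

    The output length is round(size * scale), so non-integer scales are
    handled by choosing the nearest integer resolution.
    """
    out = max(1, round(size * scale))
    return [-1.0 + (2.0 * k + 1.0) / out for k in range(out)]


def make_query_grid(h: int, w: int, scale: float) -> List[List[Tuple[float, float]]]:
    """2D grid of (y, x) query coordinates for an h x w input at `scale`."""
    ys = make_query_coords(h, scale)
    xs = make_query_coords(w, scale)
    return [[(y, x) for x in xs] for y in ys]


if __name__ == "__main__":
    grid = make_query_grid(4, 6, 3.5)
    print(len(grid), len(grid[0]))  # 14 21
```

Each coordinate pair would then be passed, together with local features, to the decoder that predicts the RGB value at that location.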