Unwarping Screen Content Images via Structure-texture Enhancement Network and Transformation Self-estimation
Zhenzhen Xiao, Heng Liu, Bingwen Hu
TL;DR
The paper tackles unwarping of screen content images (SCIs) with large distortions by introducing STEN, a dual-branch network that enhances structure and texture separately. The texture branch employs an implicit B-spline representation with Jacobian-based modulation, the structure branch uses global-local transformers to preserve geometry, and a structure-texture fusion module integrates the two signals. A transformation self-estimation module trains a CNN to predict and iteratively refine an unknown transformation matrix, improving robustness to real-world distortions. Across SCI datasets and natural-image benchmarks, STEN achieves state-of-the-art or competitive results for arbitrary-scale super-resolution (SR) and homography unwarping, which supports its use for SCI correction and downstream applications.
Abstract
While existing image unwarping methods based on implicit neural networks perform well on natural images, they struggle to handle screen content images (SCIs), which often contain large geometric distortions, text, symbols, and sharp edges. To address this, we propose a structure-texture enhancement network (STEN) with transformation self-estimation for SCI unwarping. STEN integrates a B-spline implicit neural representation module and a transformation error estimation and self-correction algorithm. It comprises two branches: the structure estimation branch (SEB), which enhances local aggregation and global dependency modeling, and the texture estimation branch (TEB), which improves texture detail synthesis using a B-spline implicit neural representation. Additionally, the transformation self-estimation module autonomously estimates the transformation error and corrects the coordinate transformation matrix, allowing the network to handle real-world image distortions whose transformation is not known in advance. Extensive experiments on public SCI datasets demonstrate that our approach significantly outperforms state-of-the-art methods. Comparisons on well-known natural image datasets also show the potential of our approach for correcting distortion in natural images.
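
The coordinate transformation matrix mentioned above can be illustrated for the homography case. The following is a minimal NumPy sketch, not the STEN implementation: it maps pixel coordinates through a 3x3 homography and evaluates the local 2x2 Jacobian of that map, which is the kind of quantity a Jacobian-based modulation could consume. The function names `apply_homography` and `homography_jacobian` and the example matrix are assumptions made for illustration only.

```python
import numpy as np


def apply_homography(H, xy):
    """Map an (N, 2) array of points through a 3x3 homography H."""
    xy = np.asarray(xy, dtype=np.float64)
    ones = np.ones((xy.shape[0], 1))
    homog = np.hstack([xy, ones]) @ H.T        # homogeneous coords, shape (N, 3)
    return homog[:, :2] / homog[:, 2:3]        # perspective divide


def homography_jacobian(H, xy):
    """Analytic Jacobian of the projective map at each point, shape (N, 2, 2).

    J[n, i, j] is the derivative of output coordinate i with respect to
    input coordinate j, evaluated at point n.
    """
    xy = np.asarray(xy, dtype=np.float64)
    ones = np.ones((xy.shape[0], 1))
    u, v, w = (np.hstack([xy, ones]) @ H.T).T  # each of shape (N,)
    J = np.empty((xy.shape[0], 2, 2))
    # Quotient rule applied to x' = u / w and y' = v / w.
    J[:, 0, 0] = (H[0, 0] * w - u * H[2, 0]) / w**2
    J[:, 0, 1] = (H[0, 1] * w - u * H[2, 1]) / w**2
    J[:, 1, 0] = (H[1, 0] * w - v * H[2, 0]) / w**2
    J[:, 1, 1] = (H[1, 1] * w - v * H[2, 1]) / w**2
    return J


if __name__ == "__main__":
    # Hypothetical homography with mild perspective terms (illustrative values).
    H = np.array([[1.10, 0.05, 3.0],
                  [0.02, 0.95, -2.0],
                  [1e-3, 2e-3, 1.0]])
    pts = np.array([[10.0, 20.0], [0.0, 0.0]])
    print(apply_homography(H, pts))
    print(homography_jacobian(H, pts))
```

The determinant of each Jacobian describes how much a small pixel neighbourhood is stretched or compressed by the transformation, so it varies across the image whenever the perspective terms in the last row of the matrix are non-zero.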
