Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis

Zeqin Yu; Haotao Xie; Jian Zhang; Jiangqun Ni; Wenkan Su; Jiwu Huang

TL;DR

The paper addresses the poor real-world generalization of text image forgery localization (T-IFL) models by modeling the invisible, high-dimensional tampering parameters that underlie real-world forgeries. It introduces Fourier Series-based Tampering Synthesis (FSTS), a hierarchical, interpretable framework that collects 16,750 real-world tampering traces from 67 experts, identifies recurring patterns at the individual and population levels, and represents tampering distributions as weighted combinations of basis operation-parameter configurations. By sampling weights and configurations from this modeled distribution, FSTS synthesizes diverse tampered images that reflect real forgery traces more closely than conventional synthetic data. Across four evaluation protocols, models trained on FSTS data consistently outperform baselines trained on conventional synthetic data when tested on real-world datasets, which indicates that grounding synthesis in measured tampering distributions improves cross-domain generalization, including on tampering scenarios not seen during training.

Abstract

Existing Text Image Forgery Localization (T-IFL) methods often suffer from poor generalization due to the limited scale of real-world datasets and the distribution gap caused by synthetic data that fails to capture the complexity of real-world tampering. To tackle this issue, we propose Fourier Series-based Tampering Synthesis (FSTS), a structured and interpretable framework for synthesizing tampered text images. FSTS first collects 16,750 real-world tampering instances from five representative tampering types, using a structured pipeline that records human-performed editing traces via multi-format logs (e.g., video, PSD, and editing logs). By analyzing these collected parameters and identifying recurring behavioral patterns at both individual and population levels, we formulate a hierarchical modeling framework. Specifically, each individual tampering parameter is represented as a compact combination of basis operation-parameter configurations, while the population-level distribution is constructed by aggregating these behaviors. Since this formulation draws inspiration from the Fourier series, it enables an interpretable approximation using basis functions and their learned weights. By sampling from this modeled distribution, FSTS synthesizes diverse and realistic training data that better reflect real-world forgery traces. Extensive experiments across four evaluation protocols demonstrate that models trained with FSTS data achieve significantly improved generalization on real-world datasets. The dataset is available at the project page: https://github.com/ZeqinYu/FSTS.
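The hierarchical idea in the abstract (individual behavior as a weighted combination of basis configurations, aggregated into a population-level distribution that is then sampled) can be illustrated with a minimal sketch. This is not the authors' implementation: the basis configurations, the operation names, the parameter fields (`blur_radius`, `jpeg_quality`), the per-individual weights, and the choice of simple averaging as the aggregation step are all hypothetical placeholders chosen for illustration.

```python
import random

# Hypothetical basis operation-parameter configurations (illustrative only;
# the operations and parameter names are assumptions, not taken from the paper).
BASIS_CONFIGS = [
    {"operation": "copy_move", "blur_radius": 0.0, "jpeg_quality": 95},
    {"operation": "splicing", "blur_radius": 1.0, "jpeg_quality": 85},
    {"operation": "inpainting", "blur_radius": 0.5, "jpeg_quality": 75},
]

# Hypothetical individual-level weights: how strongly each person's recorded
# behavior loads on each basis configuration (each vector sums to 1).
INDIVIDUAL_WEIGHTS = {
    "expert_a": [0.7, 0.2, 0.1],
    "expert_b": [0.1, 0.6, 0.3],
}


def population_weights(individual_weights):
    """Aggregate individual weight vectors into one population-level vector.

    Averaging is an assumption made for this sketch; the paper may aggregate
    individual behaviors differently.
    """
    vectors = list(individual_weights.values())
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    total = sum(mean)
    return [m / total for m in mean]  # renormalize so the weights sum to 1


def sample_configs(weights, num_samples, seed=0):
    """Draw basis configurations in proportion to the given weights."""
    rng = random.Random(seed)
    return rng.choices(BASIS_CONFIGS, weights=weights, k=num_samples)


if __name__ == "__main__":
    weights = population_weights(INDIVIDUAL_WEIGHTS)
    for config in sample_configs(weights, num_samples=5):
        print(config)
```

Each sampled configuration would then drive one synthetic tampering operation on a clean text image. The interpretability claimed for this kind of model comes from the weights themselves: each one states how much a given basis configuration contributes to the modeled behavior.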
