
TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark

Hannes Mareen, Dimitrios Karageorgiou, Paschalis Giakoumoglou, Peter Lambert, Symeon Papadopoulos, Glenn Van Wallendael

Abstract

Generative AI has made text-guided inpainting a powerful image editing tool, but at the same time a growing challenge for media forensics. Existing benchmarks, including our text-guided inpainting forgery (TGIF) dataset, show that image forgery localization (IFL) methods can localize manipulations in spliced images but struggle in fully regenerated (FR) images, while synthetic image detection (SID) methods can detect fully regenerated images but cannot perform localization. With new generative inpainting models emerging and the open problem of localization in FR images remaining, updated datasets and benchmarks are needed. We introduce TGIF2, an extended version of TGIF that captures recent advances in text-guided inpainting and enables a deeper analysis of forensic robustness. TGIF2 augments the original dataset with edits generated by FLUX.1 models, as well as with random non-semantic masks. Using the TGIF2 dataset, we conduct a forensic evaluation spanning IFL and SID, including fine-tuning IFL methods on FR images and generative super-resolution attacks. Our experiments show that both IFL and SID methods degrade on FLUX.1 manipulations, highlighting limited generalization. Additionally, while fine-tuning improves localization on FR images, evaluation with random non-semantic masks reveals object bias. Furthermore, generative super-resolution significantly weakens forensic traces, demonstrating that common image enhancement operations can undermine current forensic pipelines. In summary, TGIF2 provides an updated dataset and benchmark, which enables new insights into the challenges posed by modern inpainting and AI-based image enhancements. TGIF2 is available at https://github.com/IDLabMedia/tgif-dataset.

Paper Structure

This paper contains 24 sections, 4 figures, and 13 tables.

Figures (4)

  • Figure 1: Examples of the two ways of inpainting. An authentic image (a) can be inpainted within a selected region given as mask (b), with "skis" as prompt. In the GenAI-based inpainting process (here using SDXL), (c) the full image is regenerated during editing. To minimize artifacts, (d) only the region corresponding to the mask can be spliced into the authentic image. (e) and (f) provide zoomed-in versions to more clearly show the differences between the original or spliced pixels, on the one hand, and the regenerated pixels outside of the masked area, on the other. Subtle differences due to the regenerative process can be observed in the teeth, sunglasses, beard, etc.
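The splicing operation in (d) amounts to a mask-guided composite: pixels inside the mask come from the regenerated image, and all other pixels are kept from the authentic one. A minimal sketch with NumPy, assuming aligned same-size images and a binary mask (not the dataset's exact pipeline):

```python
import numpy as np

def splice(original: np.ndarray, regenerated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Copy only the masked region from the regenerated image into the original.

    original, regenerated: (H, W, 3) uint8 arrays of the same size.
    mask: (H, W) binary array; 1 marks pixels taken from the regenerated image.
    """
    m = mask.astype(bool)[..., None]  # broadcast the mask over color channels
    return np.where(m, regenerated, original)

# Toy example: replace only the top-left 2x2 block of a 4x4 black image.
orig = np.zeros((4, 4, 3), dtype=np.uint8)
regen = np.full((4, 4, 3), 255, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 1
spliced = splice(orig, regen, mask)
print(spliced[0, 0, 0], spliced[3, 3, 0])  # 255 0
```

In a fully regenerated (FR) image, by contrast, this compositing step is skipped and every pixel, inside and outside the mask, is re-synthesized by the generator, which is what leaves the subtle global traces discussed in the caption.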
  • Figure 2: Side-by-side comparison of (a) a real image of a cat with (b) the mask used for inpainting, and (c)-(h) six inpainted versions, one for each of the six inpainting models used in TGIF2. Note that other masks can be used as well (see Fig. 3).
  • Figure 3: The three types of masks used (a)-(c) and three corresponding inpainted examples (d)-(f). All inpainted images use the image of Fig. 2(a) as input.
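Unlike the semantic masks, a random non-semantic mask can be drawn without any reference to image content, e.g. as a rectangle placed uniformly at random. A hedged sketch of such a sampler (the dataset's actual mask-generation procedure may differ in box sizing and shape):

```python
import numpy as np

def random_box_mask(height: int, width: int, box_h: int, box_w: int,
                    rng: np.random.Generator) -> np.ndarray:
    """Return an (H, W) binary mask with one randomly placed box_h x box_w rectangle."""
    mask = np.zeros((height, width), dtype=np.uint8)
    top = int(rng.integers(0, height - box_h + 1))
    left = int(rng.integers(0, width - box_w + 1))
    mask[top:top + box_h, left:left + box_w] = 1
    return mask

rng = np.random.default_rng(0)
m = random_box_mask(256, 256, 64, 64, rng)
print(m.shape, int(m.sum()))  # (256, 256) 4096
```

Because such a box is independent of objects in the scene, it provides a probe for object bias: a localization model that only fires on salient objects will miss these regions.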
  • Figure 4: An example of a fully regenerated inpainted image with SD2, using a bounding box of the tennis racket as mask (a, SD2 Sem), or a random mask in the image (d, SD2 Rand), along with the corresponding ground-truth masks (b, e) and the fine-tuned TruFor detection results (c, f). The fine-tuned TruFor was trained on the SD2-FR-Sem training set. The inpainted tennis racket in the SD2-FR-Sem test image is detected relatively well by the fine-tuned model. However, the inpainted random box of the SD2-FR-Rand subset is not detected; instead, the tennis racket is wrongly detected -- suggesting that the fine-tuned model is biased towards semantics or salient objects. Note that the inpainted random box, in fact, made notable changes to the sign in the background, and hence should be detectable by a better detection model.
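Localization outputs such as (c) and (f) are commonly scored against the ground-truth mask with pixel-level metrics such as F1 and IoU. A minimal sketch of these two metrics on binary masks (the paper's exact evaluation protocol, e.g. thresholding of soft predictions, may differ):

```python
import numpy as np

def pixel_f1_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Pixel-level F1 and IoU between a predicted and a ground-truth binary mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # correctly flagged forged pixels
    fp = np.logical_and(pred, ~gt).sum()  # authentic pixels wrongly flagged
    fn = np.logical_and(~pred, gt).sum()  # forged pixels missed
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    return float(f1), float(iou)

gt = np.zeros((8, 8), dtype=np.uint8); gt[:4, :4] = 1      # 16 forged pixels
pred = np.zeros((8, 8), dtype=np.uint8); pred[:4, :2] = 1  # half of them found
f1, iou = pixel_f1_iou(pred, gt)
print(round(f1, 3), round(iou, 3))  # 0.667 0.5
```

Under such metrics, the SD2-FR-Rand failure case in the caption would score near zero, since the predicted region (the tennis racket) barely overlaps the ground-truth random box.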