Rethinking Image Forgery Detection via Soft Contrastive Learning and Unsupervised Clustering

Haiwei Wu, Yiming Chen, Jiantao Zhou, Yuanman Li

TL;DR

This work starts from the observation that "forged" and "pristine" are only defined relative to a single image, so standard pixel-level classification, which pools these labels across images, can learn conflicting categories. It introduces FOCAL, a framework that trains with soft contrastive learning (SCL) image by image and detects via on-the-fly unsupervised clustering, avoiding both cross-image label conflicts and trainable clustering components. The SCL uses an optimizable weight matrix $W$ and category centers $M_0, M_1$ to form an image-level soft contrastive loss $\mathcal{L}_{\text{SCL}}$; at test time, HDBSCAN clustering assigns forged/pristine labels with no trained clustering parameters, and feature-level fusion boosts performance further without retraining. Empirically, FOCAL reports IoU gains of +10.5% to +24.8% over state-of-the-art competitors on six cross-domain datasets, strong AUC on pixel- and image-level metrics, and robustness to common post-processing and OSN transmissions, positioning it as a competitive new benchmark for forgery detection.
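
To make the image-by-image idea concrete, here is a minimal sketch. It is not the authors' $\mathcal{L}_{\text{SCL}}$: it swaps in a plain supervised InfoNCE-style loss and omits the optimizable weight matrix $W$ and the category centers $M_0, M_1$. The function names, the temperature `tau`, and the toy features are illustrative assumptions. The only point it shows is that positives and negatives are drawn from one image at a time, so a forged region in one image is never contrasted with regions of another image.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # L2-normalize so that dot products are cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]

def per_image_contrastive_loss(features, mask, tau=0.1):
    """InfoNCE-style loss computed inside ONE image (stand-in, not L_SCL).

    features: list of per-region feature vectors of a single image
    mask:     list of 0/1 labels for that image (0 = pristine, 1 = forged)
    tau:      temperature (hypothetical default)
    """
    feats = [normalize(f) for f in features]
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        # Positives share the label of region i within the same image.
        positives = [j for j in range(n) if j != i and mask[j] == mask[i]]
        if not positives:
            continue
        logits = [dot(feats[i], feats[j]) / tau for j in range(n) if j != i]
        # Numerically stable log-sum-exp over all other regions.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        for j in positives:
            total += log_denom - dot(feats[i], feats[j]) / tau
            count += 1
    return total / count if count else 0.0

def batch_loss(images, tau=0.1):
    # Average the per-image losses; features from different images
    # are never compared with each other.
    losses = [per_image_contrastive_loss(f, m, tau) for f, m in images]
    return sum(losses) / len(losses)

# Toy image: regions 0-1 point one way, regions 2-3 another.
feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
good = per_image_contrastive_loss(feats, [0, 0, 1, 1])  # labels match geometry
bad = per_image_contrastive_loss(feats, [0, 1, 0, 1])   # labels contradict it
print(good < bad)
```

In this toy case the loss is small when same-label regions are close in feature space and large when they are not, which is the behaviour a per-image contrastive objective is meant to reward.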

Abstract

Image forgery detection aims to detect and locate forged regions in an image. Most existing forgery detection algorithms formulate the task as classifying pixels as forged or pristine. However, the definition of forged and pristine pixels is only relative within a single image; e.g., a forged region in image A is actually a pristine one in its source image B (splicing forgery). This relative definition has been severely overlooked by existing methods, which unnecessarily mix forged (pristine) regions across different images into the same category. To resolve this dilemma, we propose the FOrensic ContrAstive cLustering (FOCAL) method, a novel, simple yet very effective paradigm based on soft contrastive learning and unsupervised clustering for image forgery detection. Specifically, FOCAL 1) designs a soft contrastive learning (SCL) scheme to supervise the high-level forensic feature extraction in an image-by-image manner, explicitly reflecting the above relative definition; 2) employs an on-the-fly unsupervised clustering algorithm (instead of a trained one) to cluster the learned features into forged/pristine categories, further suppressing the cross-image influence from training data; and 3) allows the detection performance to be further boosted via simple feature-level concatenation without retraining. Extensive experimental results over six public testing datasets demonstrate that our proposed FOCAL significantly outperforms the state-of-the-art competitors by large margins: +24.8% on Coverage, +18.9% on Columbia, +17.3% on FF++, +15.3% on MISD, +15.0% on CASIA and +10.5% on NIST in terms of IoU (see also Fig. 1). The paradigm of FOCAL could bring fresh insights and serve as a novel benchmark for the image forgery detection task. The code is available at https://github.com/HighwayWu/FOCAL.
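
The testing step and the IoU metric can be illustrated with a small, dependency-free sketch. Two things here are assumptions rather than the paper's procedure: a tiny 2-means routine stands in for the density-based clustering the authors use, and the rule "the smaller cluster is the forged one" is a hypothetical labeling convention chosen for the example. All function names are illustrative. What the sketch does share with the described method is that the clustering is fitted from scratch on each test image and has no trained parameters.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def two_means(features, iters=20):
    """Plain 2-means on the features of ONE image (stand-in clustering)."""
    c0 = features[0]
    # Deterministic init: second center is the point farthest from the first.
    c1 = max(features, key=lambda f: sq_dist(f, c0))
    labels = [0] * len(features)
    for _ in range(iters):
        labels = [0 if sq_dist(f, c0) <= sq_dist(f, c1) else 1
                  for f in features]
        g0 = [f for f, l in zip(features, labels) if l == 0]
        g1 = [f for f, l in zip(features, labels) if l == 1]
        if not g0 or not g1:
            break
        new0, new1 = mean(g0), mean(g1)
        if new0 == c0 and new1 == c1:
            break
        c0, c1 = new0, new1
    return labels

def predict_forgery_mask(features):
    # Assumption for this sketch: the smaller cluster is labeled forged (1).
    labels = two_means(features)
    ones = sum(labels)
    forged_label = 1 if ones <= len(labels) - ones else 0
    return [1 if l == forged_label else 0 for l in labels]

def iou(pred, truth):
    # Intersection over union of the forged (1) regions.
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    return inter / union if union else 1.0

# Toy image: four "pristine" regions near the origin, two "forged" ones far away.
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]]
truth = [0, 0, 0, 0, 1, 1]
pred = predict_forgery_mask(features)
print(pred, iou(pred, truth))
```

Because nothing is learned at this stage, the grouping depends only on how well the extracted features of the current image separate, which is why the abstract describes the clustering as further suppressing cross-image influence from the training data.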

Paper Structure

This paper contains 21 sections, 13 equations, 16 figures, and 8 tables.

Figures (16)

  • Figure 1: Our method significantly outperforms several state-of-the-art competing algorithms (MVSS-Net, CAT-Net, PSCC-Net, IF-OSN, TruFor, WSCL) over six cross-testing datasets (Columbia, Coverage, CASIA, NIST, MISD, FF++).
  • Figure 2: First row: pristine and forged images. Second row: forgery masks, where pristine ($\alpha_1$, $\alpha_2$ and $\alpha_3$) and forged ($\beta_1$ and $\beta_2$) regions are labeled black and white, respectively.
  • Figure 3: (a) Traditional classification-based forgery detection framework; (b) Our proposed FOCAL framework, which utilizes soft contrastive learning to supervise the training phase, while employing an unsupervised clustering in the testing phase.
  • Figure 4: Comparison of initial (a) and optimized (b) $w_{ij}$. The blue and red markers respectively indicate the pristine and forged regions.
  • Figure 5: Training loss curves of the NCE baseline (blue) and our proposed $\mathcal{L}_{\mathrm{SCL}}$ in batch-based (orange) and image-by-image (green) modes.
  • ...and 11 more figures