DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces

Tianshuo Zhang, Li Gao, Siran Peng, Xiangyu Zhu, Zhen Lei

TL;DR

The authors posit that genuine facial samples are abundant and relatively stable in how they are acquired, while forged faces continuously evolve as manipulation techniques iterate.

Abstract

The rise of realistic digital face generation and manipulation poses significant social risks. The primary challenge lies in the rapid and diverse evolution of generation techniques, which often outstrips the detection capabilities of existing models. To defend against ever-evolving forgery types, we need to enable our model to quickly adapt to new domains with limited computation and data while avoiding forgetting previously learned forgery types. In this work, we posit that genuine facial samples are abundant and relatively stable in acquisition methods, while forged faces continuously evolve with the iteration of manipulation techniques. Given the practical infeasibility of exhaustively collecting all forgery variants, we frame face forgery detection as a continual learning problem and allow the model to develop as new forgery types emerge. Specifically, we employ a Developmental Mixture of Experts (MoE) architecture that uses LoRA models as its individual experts. These experts are organized into two groups: a Real-LoRA to learn and refine knowledge of real faces, and multiple Fake-LoRAs to capture incremental information from different forgery types. To prevent catastrophic forgetting, we ensure that the learning direction of Fake-LoRAs is orthogonal to the established subspace. Moreover, we integrate orthogonal gradients into the orthogonal loss of Fake-LoRAs, preventing gradient interference throughout the training process of each task. Experimental results under both the dataset-incremental and manipulation-type-incremental protocols demonstrate the effectiveness of our method.
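
The orthogonality constraint on Fake-LoRAs can be illustrated with a small sketch. The abstract does not give the exact loss, so the code below assumes a generic penalty: the squared Frobenius norm of the product between each earlier LoRA down-projection and the new one. It does not include the orthogonal-gradient term the authors describe. The function names and the toy rank-1 matrices are hypothetical, and plain Python lists stand in for tensors.

```python
def matmul_t(a, b):
    """Return a @ b^T for nested lists a (m x d) and b (n x d)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def orthogonality_penalty(prev_experts, new_expert):
    """Sum of squared inner products between the rows of the new LoRA
    matrix and the rows of every previously learned LoRA matrix.
    The value is zero exactly when the new rows are orthogonal to all
    earlier rows (assumed form, not the paper's exact loss)."""
    total = 0.0
    for prev in prev_experts:
        overlap = matmul_t(prev, new_expert)
        total += sum(v * v for row in overlap for v in row)
    return total


# Hypothetical rank-1 down-projections in a 3-dimensional feature space.
fake_lora_1 = [[1.0, 0.0, 0.0]]            # subspace learned on an earlier task
orthogonal_candidate = [[0.0, 1.0, 0.0]]   # shares no direction with fake_lora_1
overlapping_candidate = [[1.0, 1.0, 0.0]]  # partly reuses the earlier direction

penalty_orth = orthogonality_penalty([fake_lora_1], orthogonal_candidate)
penalty_overlap = orthogonality_penalty([fake_lora_1], overlapping_candidate)
```

Under this assumed penalty, a new expert that stays out of the earlier subspace incurs no cost, while one that reuses an earlier direction is penalized in proportion to the overlap.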

Paper Structure

This paper contains 27 sections, 18 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: (a) The t-SNE visualization of features extracted from the baseline on FF++ and CDF2 shows that real faces exhibit a close distribution, while fake faces form five distinct clusters. This observation inspires us to (b) propose a developmental mixture of experts to model the continuous emergence of unknown fake faces using a set of orthogonal LoRA subspaces, while concurrently employing a dedicated LoRA to preserve the commonalities of authentic faces. A label-guided localized balancing strategy uses the LoRA sequence to model separately the common real-face information and the incremental fake-type information.
  • Figure 2: The proposed DevFD framework employs a Developmental MoE architecture to fine-tune the FFN layer in each transformer block. The architecture establishes a developmental LoRA sequence, which adds new branches as the number of tasks increases, enabling the model to handle newly emerging types of forgeries. A new label-guided localized balancing strategy allocates the LoRA sequence to two purposes: the Real-LoRA fine-tunes and refines knowledge about real faces, while the Fake-LoRAs compose an orthogonal sequence to model the unique cues of fake faces. We integrate orthogonal gradients into the orthogonal loss to alleviate gradient interference with previously learned tasks during the training phase, when orthogonality is not yet achieved, thereby achieving a lower rate of forgetting.
  • Figure 3: In the Task10 long-sequence continual learning experiment based on DF40, the proposed method achieves the highest average accuracy and the lowest forgetting rate.
  • Figure 4: Motivation Validation. We track the variation range of the orthogonal gradient loss during the training process for two orthogonal sequences, one modeling real faces and the other modeling forged faces. For each sequence, the two lines represent the upper and lower bounds of the loss, and the shaded area between them indicates its range of variation. We observe that real faces exhibit an obviously smaller orthogonal gradient loss.
  • Figure 5: Expanded parameter analysis. The parameter expansion of the proposed developmental MoE is fully controllable, and the trainable parameters will remain constant throughout the task sequence.
  • ...and 1 more figure
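
The growth behavior described in the captions for Figures 2 and 5 (one new LoRA branch per task, with a constant trainable-parameter count) can be sketched with simple parameter accounting. This is a hedged illustration, not the authors' implementation: the class name, the layer dimensions (768 to 3072), the rank of 8, and the assumption that only the Real-LoRA and the newest Fake-LoRA are trainable while earlier Fake-LoRAs are frozen are all hypothetical choices made for this example.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA pair: a (rank x d_in) down-projection
    plus a (d_out x rank) up-projection."""
    return rank * (d_in + d_out)


class DevelopmentalLoRASequence:
    """Bookkeeping for a LoRA sequence that grows by one Fake-LoRA per task."""

    def __init__(self, d_in, d_out, rank):
        self.per_expert = lora_param_count(d_in, d_out, rank)
        self.num_fake_experts = 0

    def add_task(self):
        # A new forgery type arrives: append one Fake-LoRA branch.
        self.num_fake_experts += 1

    def total_params(self):
        # One Real-LoRA plus every Fake-LoRA added so far.
        return self.per_expert * (1 + self.num_fake_experts)

    def trainable_params(self):
        # Assumption: the Real-LoRA and the newest Fake-LoRA are trained,
        # and all earlier Fake-LoRAs are frozen.
        return self.per_expert * 2


# Hypothetical FFN dimensions and rank, chosen only for illustration.
sequence = DevelopmentalLoRASequence(d_in=768, d_out=3072, rank=8)
history = []
for _ in range(4):
    sequence.add_task()
    history.append((sequence.total_params(), sequence.trainable_params()))
```

In this sketch the total LoRA parameter count grows linearly with the number of tasks, while the trainable count stays fixed at two experts' worth of parameters, which is one way a constant training budget across the task sequence could arise.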