ChatGPT Encounters Morphing Attack Detection: Zero-Shot MAD with Multi-Modal Large Language Models and General Vision Models

Haoyu Zhang, Raghavendra Ramachandra, Kiran Raja, Christoph Busch

TL;DR

This work tackles generalization and explainability gaps in single-image Morphing Attack Detection (S-MAD) by introducing zero-shot MAD methods that do not rely on morphing training data. It develops two pathways: a multimodal large language model (LLM)–based MAD that uses carefully designed prompts with Chain-of-Thought reasoning, and a general vision-model–based MAD that relies on an anchor embedding built from bona fide images. Experiments on a print-scanned morph dataset across three morphing algorithms show that LLM-based ZS-MAD can generalize to unseen morphs and provide human-readable explanations, while the vision-model baseline offers competitive zero-shot performance depending on configuration. The findings highlight the potential of LLMs to generalize to MAD tasks without extensive labeled data and to improve transparency at enrolment stations and border controls, guiding future work in prompt design, few-shot adaptation, and evaluation on real, non-synthetic data.

Abstract

Face Recognition Systems (FRS) are increasingly vulnerable to face-morphing attacks, prompting the development of Morphing Attack Detection (MAD) algorithms. However, a key challenge in MAD lies in its limited generalizability to unseen data and its lack of explainability, both of which are critical for practical application environments such as enrolment stations and automated border control systems. Recognizing that most existing MAD algorithms rely on supervised learning paradigms, this work explores a novel approach to MAD using zero-shot learning that leverages Large Language Models (LLMs). We propose two types of zero-shot MAD algorithms: one leveraging general vision models and the other utilizing multimodal LLMs. For general vision models, we address the MAD task by computing the mean support embedding of an independent support set without using morphed images. For the LLM-based approach, we employ the state-of-the-art GPT-4 Turbo API with carefully crafted prompts. To evaluate the feasibility of zero-shot MAD and the effectiveness of the proposed methods, we constructed a print-scan morph dataset featuring various unseen morphing algorithms, simulating challenging real-world application scenarios. Experimental results demonstrated notable detection accuracy, validating the applicability of zero-shot learning for MAD tasks. Additionally, our investigation into LLM-based MAD revealed that multimodal LLMs, such as ChatGPT, exhibit remarkable generalizability to untrained MAD tasks. Furthermore, they possess a unique ability to provide explanations and guidance, which can enhance transparency and usability for end-users in practical applications.
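
The vision-model pathway described above can be illustrated with a minimal sketch. It assumes that embeddings have already been extracted by some pretrained general vision model, and it scores a probe image by its cosine distance to the mean embedding of a bona fide support set. The cosine distance, the L2 normalization, the threshold, and all function names (`mean_support_embedding`, `morph_score`, `is_morph`) are assumptions made for illustration; the abstract does not specify the scoring function the authors use.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def mean_support_embedding(support: Sequence[Vector]) -> List[float]:
    """Build the anchor: the normalized mean of bona fide embeddings only.

    No morphed images are involved, which is what makes the setup zero-shot
    with respect to morphing attacks.
    """
    if not support:
        raise ValueError("support set is empty")
    normalized = [l2_normalize(v) for v in support]
    dim = len(normalized[0])
    mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def morph_score(probe: Vector, anchor: Vector) -> float:
    """Cosine distance to the anchor (assumed score; higher = more suspicious)."""
    p = l2_normalize(probe)
    cosine = sum(a * b for a, b in zip(p, anchor))
    return 1.0 - cosine


def is_morph(probe: Vector, anchor: Vector, threshold: float) -> bool:
    """Flag the probe as a morph when its score exceeds a chosen threshold."""
    return morph_score(probe, anchor) > threshold


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real model embeddings.
    anchor = mean_support_embedding([[1.0, 0.0], [0.0, 1.0]])
    print(morph_score([2.0, 2.0], anchor))   # probe aligned with the anchor
    print(morph_score([1.0, -1.0], anchor))  # probe orthogonal to the anchor
```

In practice the threshold would be chosen on bona fide data alone, for example at a fixed false-rejection rate, so that no morph samples are needed at any stage.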

Paper Structure

This paper contains 14 sections, 1 equation, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of the proposed method using large language model for zero-shot MAD.
  • Figure 2: Example of extreme cases in Prompt 3 with incorrect and correct classification.
  • Figure 3: DET plot of ZS-MAD using LLM with different prompts.
  • Figure 4: DET plot of ZS-MAD using vision models with different configurations.
  • Figure 5: Examples (of correctly classified morphs) using different proposed prompts for ZS-MAD and their answers.
  • ...and 3 more figures