How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability

Ivan DeAndres-Tame, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia

TL;DR

This paper investigates how well a GPT-4 multimodal chatbot (ChatGPT) can perform face biometrics tasks, including face verification, soft biometrics estimation, and explainability. It employs a structured experimental framework with single-image and matrix prompts, benchmarking against ArcFace, AdaFace, FairFace, and MAAD-Face baselines across multiple datasets. Key findings show that while ChatGPT provides meaningful explainability and zero-shot utility, its verification accuracy lags behind specialized models and its results reveal demographic biases; a matrix-prompt configuration reduces cost at the expense of accuracy. The work highlights both the potential and limitations of LLM-driven biometric reasoning and provides code to support reproducibility and further research.

Abstract

Large Language Models (LLMs), such as GPT developed by OpenAI, have already shown astonishing results, introducing rapid changes in our society. This has been intensified by the release of ChatGPT, which allows anyone to interact with LLMs in a simple conversational way, without needing any experience in the field. As a result, ChatGPT has been rapidly applied to many different tasks such as code and song writing, education, and virtual assistants, showing impressive results for tasks for which it was not trained (zero-shot learning). The present study aims to explore the ability of ChatGPT, based on the recent GPT-4 multimodal LLM, for the task of face biometrics. In particular, we analyze the ability of ChatGPT to perform tasks such as face verification, soft-biometrics estimation, and explainability of the results. ChatGPT could be very valuable to further increase the explainability and transparency of automatic decisions in human scenarios. Experiments are carried out to evaluate the performance and robustness of ChatGPT, using popular public benchmarks and comparing the results with state-of-the-art methods in the field. The results achieved in this study show the potential of LLMs such as ChatGPT for face biometrics, especially to enhance explainability. For reproducibility, we release all the code on GitHub.

Paper Structure (12 sections, 5 figures, 5 tables)

This paper contains 12 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Graphical representation of the analysis carried out in this study, focused on the ability of ChatGPT to perform tasks such as face verification, soft-biometrics estimation, and explainability of the results. Different configurations of ChatGPT are explored in the present study.
  • Figure 2: Graphical representations of the images input to ChatGPT: a comparison of two faces merged in a single image (left), and a matrix of 4x3 face comparisons in a single image (right). In the latter, each cell is separated from the rest by a blue border and identified by a red number (from 0 to 11) that is used to reference the cell in the output of the model.
  • Figure 3: Prompt given to ChatGPT, together with the outputs it provides for different face images. We highlight the most important soft-biometrics attributes in green if they are correct and in red if they are incorrect.
  • Figure 4: Explainability: Proposed prompt along with the outputs provided by ChatGPT for some examples of the different face verification databases. Left column: examples where ChatGPT answers are correct; right column: incorrect answers. We highlight the most important parts of the text in green/red color if they are correct/incorrect, respectively.
  • Figure 5: Soft Biometrics: Proposed prompt along with the outputs provided by ChatGPT for some examples of the MAAD-Face database (Terhörst et al., 2021). We highlight the most important parts of the text in green/red color if they are correct/incorrect, respectively.
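
The matrix input described in Figure 2 (a 4x3 grid of face comparisons, each cell bordered and numbered from 0 to 11) can be illustrated with a small layout sketch. This is not the authors' code: the function names, the reading of "4x3" as 4 rows by 3 columns, the row-by-row numbering, and the cell and border sizes in pixels are all assumptions made for illustration. It only computes where each numbered cell would sit on the canvas; drawing the faces, blue borders, and red labels is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Cell:
    index: int   # label shown in the cell (0..11 for a 4x3 grid)
    left: int    # pixel box of the cell content, excluding the border
    top: int
    right: int
    bottom: int


def grid_layout(rows: int, cols: int, cell_w: int, cell_h: int, border: int) -> List[Cell]:
    """Return one box per cell, numbered row by row starting at 0 (assumed order)."""
    cells = []
    for index in range(rows * cols):
        row, col = divmod(index, cols)
        # Each cell is preceded by one border strip on its left and top side.
        left = border + col * (cell_w + border)
        top = border + row * (cell_h + border)
        cells.append(Cell(index, left, top, left + cell_w, top + cell_h))
    return cells


def canvas_size(rows: int, cols: int, cell_w: int, cell_h: int, border: int) -> Tuple[int, int]:
    """Width and height of the full matrix image, including outer borders."""
    width = cols * cell_w + (cols + 1) * border
    height = rows * cell_h + (rows + 1) * border
    return width, height


if __name__ == "__main__":
    # Hypothetical sizes: each cell holds two 112x112 faces side by side.
    for cell in grid_layout(rows=4, cols=3, cell_w=224, cell_h=112, border=4):
        print(cell)
    print(canvas_size(rows=4, cols=3, cell_w=224, cell_h=112, border=4))
```

Numbering the cells explicitly is what lets a single response refer back to individual comparisons, for example by returning one decision per cell index.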