LLMmap: Fingerprinting For Large Language Models

Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese

TL;DR

LLMmap introduces an active fingerprinting framework to identify the exact LLM version powering an application by issuing carefully crafted prompts and learning from the responses. It combines a robust query strategy with a lightweight, contrastive/open-set inference model to achieve over 95% accuracy across 42 models with as few as eight interactions, and it remains effective across diverse deployment conditions and prompt configurations. The paper also analyzes defenses, showing that masking fingerprint signals is difficult and often degrades functionality, and discusses extensions to detect unseen models and potential future capabilities. Overall, LLMmap provides a practical, scalable tool for security evaluators to profile LLM deployments as part of red-teaming and risk assessment.

Abstract

We introduce LLMmap, a first-generation fingerprinting technique targeted at LLM-integrated applications. LLMmap employs an active fingerprinting approach, sending carefully crafted queries to the application and analyzing the responses to identify the specific LLM version in use. Our query selection is informed by domain expertise on how LLMs generate uniquely identifiable responses to thematically varied prompts. With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM versions--whether open-source or proprietary--from various vendors, operating under various unknown system prompts, stochastic sampling hyperparameters, and even complex generation frameworks such as RAG or Chain-of-Thought. We discuss potential mitigations and demonstrate that, against resourceful adversaries, effective countermeasures may be challenging or even unrealizable.
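
The active fingerprinting loop described in the abstract (send probe queries, collect the responses, match the resulting trace against known models) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the probe queries are hypothetical, the bag-of-words `embed` function stands in for a learned text encoder, and the nearest-template matcher with a similarity threshold is a simplified substitute for the paper's trained inference model, not a reproduction of it.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List, Optional

# Hypothetical probe prompts; the paper's actual query set is not reproduced here.
QUERIES: List[str] = [
    "What model are you?",
    "Who created you?",
    "What is your knowledge cutoff?",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def collect_trace(app: Callable[[str], str], queries: List[str]) -> Counter:
    """Send each probe query to the application and pool the responses."""
    trace: Counter = Counter()
    for query in queries:
        trace.update(embed(app(query)))
    return trace


def fingerprint(
    app: Callable[[str], str],
    templates: Dict[str, Counter],
    queries: List[str] = QUERIES,
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the best-matching known model, or None if nothing is similar enough."""
    trace = collect_trace(app, queries)
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        score = cosine(trace, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Stub "applications" standing in for deployed LLMs (hypothetical).
def model_a(prompt: str) -> str:
    return "I am Model A, built by Example Labs."


def model_b(prompt: str) -> str:
    return "Sorry, I cannot share details about myself."


def unknown_model(prompt: str) -> str:
    return "zzz qqq"


# Build one reference trace per known model, then fingerprint each application.
TEMPLATES = {
    "model-a": collect_trace(model_a, QUERIES),
    "model-b": collect_trace(model_b, QUERIES),
}

if __name__ == "__main__":
    print(fingerprint(model_a, TEMPLATES))
    print(fingerprint(unknown_model, TEMPLATES))
```

The threshold gives a crude open-set behaviour: a response trace that resembles no stored template is reported as unknown rather than forced onto the closest known model.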

Paper Structure

This paper contains 39 sections, 4 equations, 13 figures, 8 tables, 5 algorithms.

Figures (13)

  • Figure 1: Active fingerprinting via LLMmap.
  • Figure 2: Difference in the responses of two LLMs to a malicious prompt. The model Mixtral-8x7B, in contrast to gpt-4o-2024, tends to restate the harmful task in its answer.
  • Figure 3: The architecture of the inference model. We depict in blue the pre-trained modules that are not tuned in training.
  • Figure 4: Visualization of contrastive learning on LLMs' traces. Positive and negative case.
  • Figure 5: Closed-set accuracy of the inference model as the number of queries to the LLM-integrated application increases, for LLMmap's default query strategy and two baseline strategies.
  • ...and 8 more figures