Invisible Traces: Using Hybrid Fingerprinting to identify underlying LLMs in GenAI Apps

Devansh Bhardwaj, Naman Mishra

TL;DR

This work tackles the challenge of identifying underlying LLMs in GenAI apps under real-world constraints where single-method fingerprinting fails. It introduces a hybrid framework that combines static probing (LLMMap and manual fingerprinting) with dynamic observation (a ModernBERT classifier) in a two-phase pipeline whose outputs are fused via a weight $\alpha$ to produce $P_{final}$. Empirical evaluations across 1000+ apps and 14 LLMs show the hybrid approach significantly outperforms individual methods, achieving around 86% accuracy at $n=10$ and demonstrating strong class-wise separability in embeddings. The results have important implications for AI security, governance, and red-teaming, offering a practical, adaptable method for monitoring and verifying deployed LLMs while acknowledging ethical considerations and potential misuse.

Abstract

Fingerprinting refers to the process of identifying the underlying Machine Learning (ML) models of AI systems, such as Large Language Models (LLMs), by analyzing their unique characteristics or patterns, much like a human fingerprint. The fingerprinting of LLMs has become essential for ensuring the security and transparency of AI-integrated applications. While existing methods primarily rely on direct interaction with the application to infer model identity, they often fail in real-world scenarios involving multi-agent systems, frequent model updates, and restricted access to model internals. In this paper, we introduce a novel fingerprinting framework designed to address these challenges by integrating static and dynamic fingerprinting techniques. Our approach identifies architectural features and behavioral traits, enabling accurate and robust fingerprinting of LLMs in dynamic environments. We also highlight new threat scenarios where traditional fingerprinting methods are ineffective, bridging the gap between theoretical techniques and practical application. To validate our framework, we present an extensive evaluation setup that simulates real-world conditions and demonstrate the effectiveness of our methods in identifying and monitoring LLMs in GenAI applications. Our results highlight the framework's adaptability to diverse and evolving deployment contexts.

Paper Structure

This paper contains 36 sections, 9 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Pipeline for Combined Fingerprinting Framework. The framework integrates static and dynamic fingerprinting approaches to identify underlying Large Language Models (LLMs). (I) Static Fingerprinting using LLMMap actively probes the target application with strategic queries, generating query-response pairs that are analyzed by a classifier to produce a model distribution. (II) Manual Fingerprinting employs an iterative probing process, guided by a judge model, to refine outputs and improve identification accuracy. (III) Dynamic Fingerprinting passively observes generic model responses and leverages a dynamic classifier to infer the model distribution. The outputs from static and dynamic fingerprinting are combined using a weighted sum to robustly determine the underlying model.
  • Figure 2: Accuracy vs. $n$ for Different Methods (Scaled 0 to 1). This figure compares the performance of various fingerprinting methodologies, including dynamic, static, and combined approaches, as the number of iterations ($n$) increases. The combined approach (Dynamic + LLMMap) achieves the highest accuracy, demonstrating the benefit of integrating multiple techniques.
  • Figure 3: t-SNE visualization of embeddings for different LLM families. Each cluster represents the distinct linguistic and stylistic features captured for a model family, such as Claude, GPT, Gemini, and others. Minimal overlap between clusters demonstrates the effectiveness of the fingerprinting pipeline in distinguishing between LLM outputs.
  • Figure 4: Accuracy per Class vs. Number of Samples ($n$). This figure shows the identification accuracy for each Large Language Model (LLM) as the number of samples increases. Models such as GPT-4 and Claude-3.5-sonnet achieve near-perfect accuracy, while others like Mixtral-8x22B show variability. The results demonstrate improved performance with additional samples.
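
The Figure 1 caption says the static and dynamic model distributions are combined using a weighted sum. The snippet below is a minimal sketch of that idea, not the authors' implementation. It assumes the fusion is a convex combination, $P_{final} = \alpha \, P_{static} + (1 - \alpha) \, P_{dynamic}$, which the text here does not state explicitly. The function names, the default $\alpha = 0.5$, and the model labels are hypothetical and chosen only for illustration.

```python
from typing import Dict


def fuse_distributions(
    p_static: Dict[str, float],
    p_dynamic: Dict[str, float],
    alpha: float = 0.5,  # assumed default; the source does not give a value
) -> Dict[str, float]:
    """Fuse two model distributions with an assumed convex weighted sum."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # A model missing from one distribution is treated as probability 0 there.
    models = set(p_static) | set(p_dynamic)
    fused = {
        m: alpha * p_static.get(m, 0.0) + (1.0 - alpha) * p_dynamic.get(m, 0.0)
        for m in models
    }

    # Renormalise so the result is a proper distribution even if the
    # inputs do not sum exactly to 1.
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("both distributions are empty or all-zero")
    return {m: v / total for m, v in fused.items()}


def predict_model(p_final: Dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    return max(p_final, key=p_final.get)


# Illustrative inputs only; these are not results from the paper.
p_static = {"gpt-4": 0.6, "claude-3.5-sonnet": 0.4}
p_dynamic = {"gpt-4": 0.2, "claude-3.5-sonnet": 0.8}
p_final = fuse_distributions(p_static, p_dynamic, alpha=0.5)
print(predict_model(p_final))
```

Under this assumed form, $\alpha = 1$ relies entirely on the static probes and $\alpha = 0$ entirely on the dynamic classifier, so $\alpha$ sets how much each signal is trusted.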