From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models

Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang

TL;DR

This work addresses IP protection for large language models deployed in web applications with an end-to-end fingerprinting pipeline. It introduces MNLF, a rule-based scheme for natural, multilingual triggers; RFEdit, which injects fingerprints through targeted knowledge editing; and FSFT, which preserves fingerprints during fine-tuning by constraining updates to a fingerprint subspace. The approach reports a 100% fingerprint success rate, robustness against adversarial manipulation, and persistence under pruning, quantization, and fine-tuning. When RFEdit is combined with FSFT, fingerprint effectiveness generally improves by more than 10% on math and Alpaca downstream tasks. The work demonstrates a practical path to traceability and IP protection for LLMs in real-world deployment.

Abstract

Fingerprinting is critical for maintaining traceability and protecting the intellectual property (IP) of developers, as LLMs deployed in web applications are susceptible to unauthorized redistribution and misuse via fine-tuning or black-box deployment. However, current backdoor-based fingerprinting methods face a fundamental trade-off: fingerprints embedded as garbled text are easily detected and filtered, whereas those crafted as coherent natural language are prone to being triggered unintentionally. To overcome these limitations, we propose RFEdit, a knowledge-editing framework that embeds a rule-based multilingual natural language fingerprint (MNLF) by modifying a sparse subset of model weights. This approach enables efficient and robust fingerprint injection with minimal impact on unrelated knowledge in LLMs. Our RFEdit framework is further safeguarded by Fingerprint Subspace-aware Fine-Tuning (FSFT), which mitigates fingerprint degradation during legitimate fine-tuning by restricting parameter updates to the fingerprint subspace. FSFT preserves fingerprint integrity while enhancing the downstream task performance of LLMs. Together, these advances establish a comprehensive pipeline from fingerprint injection to defense, achieving high detection effectiveness, robustness against adversarial manipulations, harmlessness to model utility, and persistence under fine-tuning. Extensive experiments demonstrate that RFEdit remains robust under quantization and pruning. Additionally, fingerprint effectiveness generally improves by more than 10% when RFEdit is combined with FSFT on math and Alpaca downstream tasks.
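The abstract does not say how the fingerprint subspace is built or how FSFT enforces the restriction, so the following is only an illustrative sketch of the general idea of constraining a parameter update to a subspace. It takes the abstract's wording literally and assumes the subspace is given as a list of spanning vectors; the function names, the three-parameter toy model, and the numbers are all hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def project_onto_subspace(update, basis):
    """Keep only the component of `update` that lies in span(basis)."""
    projected = [0.0] * len(update)
    for b in basis:
        c = dot(update, b)
        projected = [p + c * bi for p, bi in zip(projected, b)]
    return projected


# Hypothetical toy case: 3 parameters, subspace = the plane of the first two.
basis = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
raw_update = [0.5, -0.2, 0.9]  # e.g. a raw gradient step (made-up values)
constrained = project_onto_subspace(raw_update, basis)
```

In this toy case the third coordinate of the update is removed while the first two pass through unchanged. A real method would have to derive the basis from the edited weights, which this sketch does not attempt.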

Paper Structure

This paper contains 38 sections, 20 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Our pipeline of edit-based fingerprints comprises three steps: (a) MNLF, a rule-based multilingual natural-language fingerprint whose tokens resemble normal inputs and can bypass abnormal-input filters, ensuring reliable verification; (b) RFEdit, a knowledge-editing method that injects fingerprints efficiently while improving robustness to adversarial examples, producing reliable fingerprinted models; (c) FSFT, fingerprint subspace-aware fine-tuning that constrains updates to the fingerprint subspace, preserving fingerprint persistence during downstream fine-tuning.
  • Figure 2: Changes in $\text{FSR}^*$ (red; lower values indicate lower fingerprint retention) and in the sum of the Frobenius norm values (green; lower values indicate higher fingerprint retention) during fine-tuning.
  • Figure 3: Performance of different fingerprinting schemes under PPL-based filtering.
  • Figure 4: Comparison of effectiveness under learning-rate variations. The learning rate on the x-axis is given in units of 1e-5.
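For readers unfamiliar with the PPL-based filtering mentioned in Figure 3, the sketch below shows the standard definition of perplexity (the exponential of the mean negative log-likelihood per token) and a simple threshold filter built on it. The per-token log-probabilities, the threshold of 50, and the function names are hypothetical placeholders; the paper's actual filter configuration is not given in this section.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def passes_ppl_filter(token_logprobs, threshold):
    """Accept an input only if its perplexity does not exceed `threshold`."""
    return perplexity(token_logprobs) <= threshold


# Made-up log-probabilities that a language model might assign to each token.
natural_trigger = [math.log(0.25)] * 4   # fluent text: fairly likely tokens
garbled_trigger = [math.log(0.001)] * 4  # garbled text: very unlikely tokens

THRESHOLD = 50.0  # hypothetical cut-off chosen for illustration only
natural_ok = passes_ppl_filter(natural_trigger, THRESHOLD)
garbled_ok = passes_ppl_filter(garbled_trigger, THRESHOLD)
```

Under these made-up numbers the fluent input passes and the garbled one is rejected, which matches the abstract's point that garbled-text fingerprints are easy to detect and filter.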