Origin Tracer: A Method for Detecting LoRA Fine-Tuning Origins in LLMs

Hongyu Liang, Yuting Zheng, Yihan Li, Yiran Zhang, Shiyu Liang

TL;DR

This work tackles the problem of verifying whether a fine-tuned model originates from a specified base model, particularly under obfuscation strategies that obscure parameter mappings. It introduces Origin-Tracer, a framework that (i) extracts the LoRA rank from intermediate states via singular-value analysis across transformer layers and (ii) reconstructs obfuscated intermediate representations through an iterative gradient-based procedure, so that provenance can still be verified after permutations or scaling have been applied. The approach is formalized through a problem statement that admits looser relationships between base and derivative models than simple parameter closeness. It is validated on 31 open-source models fine-tuned with LoRA, where it remains resilient to obfuscation and points toward new benchmarks for model verification. The findings indicate that middle transformer layers are especially informative for rank recovery and that Origin-Tracer can provide practical, obfuscation-tolerant provenance checks, with potential impact on transparency and trust in open-source AI deployments. The paper also outlines specific limitations and avenues for broader applicability.

Abstract

As large language models (LLMs) continue to advance, their deployment often involves fine-tuning to enhance performance on specific downstream tasks. However, this customization is sometimes accompanied by misleading claims about a model's origins, raising significant concerns about transparency and trust within the open-source community. Existing model verification techniques typically assess functional, representational, and weight similarities, but these approaches often struggle against obfuscation techniques such as permutations and scaling transformations. To address this limitation, we propose a novel detection method, Origin-Tracer, that rigorously determines whether a model has been fine-tuned from a specified base model. The method can also extract the LoRA rank used during fine-tuning, providing a more robust verification framework. This framework is the first to provide a formalized approach specifically aimed at pinpointing the sources of model fine-tuning. We validate our method empirically on thirty-one diverse open-source models under conditions that simulate real-world obfuscation scenarios, analyze its effectiveness, and discuss its limitations. The results demonstrate the effectiveness of our approach and indicate its potential to establish new benchmarks for model verification.
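
To make the obfuscation problem concrete, the following is a minimal NumPy sketch of why permutation and scaling defeat weight-similarity checks while leaving model behavior unchanged. It assumes a LLaMA-style gated MLP of the form (SiLU(xG) ⊙ (xU))D; the function names, toy dimensions, and the specific scaling range are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def silu(z):
    # SiLU activation: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))

def gated_mlp(x, gate, up, down):
    # Assumed LLaMA-style gated MLP: (silu(x G) * (x U)) D
    return (silu(x @ gate) * (x @ up)) @ down

def obfuscate(gate, up, down, rng):
    # Hypothetical obfuscation: permute hidden units and rescale the
    # linear "up" path, undoing the scale in the "down" projection.
    hidden = gate.shape[1]
    perm = rng.permutation(hidden)
    scale = rng.uniform(0.5, 2.0, size=hidden)
    gate_obf = gate[:, perm]
    up_obf = up[:, perm] * scale
    down_obf = down[perm, :] / scale[:, None]
    return gate_obf, up_obf, down_obf

def cosine(a, b):
    # Cosine similarity between two flattened weight matrices
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d_model, d_hidden = 16, 64  # toy sizes, chosen for illustration only
gate = rng.standard_normal((d_model, d_hidden))
up = rng.standard_normal((d_model, d_hidden))
down = rng.standard_normal((d_hidden, d_model))
x = rng.standard_normal((4, d_model))

gate_obf, up_obf, down_obf = obfuscate(gate, up, down, rng)
y_plain = gated_mlp(x, gate, up, down)
y_obf = gated_mlp(x, gate_obf, up_obf, down_obf)
weight_cosine = cosine(up, up_obf)

print("outputs match:", np.allclose(y_plain, y_obf))
print("cosine(up, up_obf):", round(weight_cosine, 3))
```

The two models compute the same function, yet a direct comparison of the `up` matrices shows little similarity. This is the gap that a verification method based on intermediate states, rather than raw parameters, aims to close.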

Paper Structure

This paper contains 14 sections, 3 theorems, 51 equations, 5 figures, 1 table.

Key Result

Lemma 1

For a given $X \in \mathbb{R}^{n \times d}$, consider the MLP $:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$ built from the gate matrix $G$ and the up matrix $U$, where $h(X;\gamma')$ is the normalization and $\odot$ denotes the element-wise matrix product. The MLP function is injective for non-parallel vectors.

Figures (5)

  • Figure 1: Detection of Reflection-70B with and without obfuscation. Comparison of Origin-Tracer and parameter-similarity performance: without obfuscation (a, b) vs. with obfuscation (c, d). Our method is resilient to obfuscation, while parameter similarity is more susceptible to its effects.
  • Figure 2: Decoder-only Architecture.
  • Figure 3: Norm of Layer Outputs Across Model Architectures. This figure presents the L2 norms of outputs across layers in models of varying sizes.
  • Figure 4: Origin-Tracer determines the LoRA rank by pinpointing a sharp decline in singular values, which manifests as a peak in the disparity between consecutive singular values. In the model, this peak occurs at a position adjacent to the rank. Subfigures (a)–(f) cover LLaMA3.1-8B, LLaMA3-8B, LLaMA2-13B, LLaMA2-7B, Mistral-7B-v0.1, and 70B-scale models.
  • Figure 5: Layer-wise extracted ranks across different model scales. This figure presents the extracted LoRA ranks for each transformer layer across various models. Subfigures (a)–(d) correspond to 7B, 8B, 13B, and 70B model families, respectively. Middle layers consistently exhibit higher extracted ranks, indicating more expressive transformations and suggesting their greater importance in model reconstruction.
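
The rank criterion described in the Figure 4 caption can be illustrated with a small NumPy sketch. A LoRA update W' = W + BA, with B of shape d × r and A of shape r × d, has rank at most r, so the difference between tuned and base outputs of a linear layer has at most r singular values that are not near zero. The sketch below locates the sharp drop as the largest gap between consecutive singular values. Several choices here are assumptions rather than the paper's procedure: a single linear layer stands in for the transformer's intermediate states, the gap is measured on a log scale for robustness, and the names `estimate_rank`, `lora_a`, and `lora_b` are hypothetical.

```python
import numpy as np

def estimate_rank(delta):
    # Singular values are returned in descending order.
    s = np.linalg.svd(delta, compute_uv=False)
    s = np.maximum(s, 1e-12)  # guard against log(0)
    # Gap between consecutive singular values, on a log scale (assumption).
    log_gaps = np.log(s[:-1]) - np.log(s[1:])
    # The largest gap sits just after the last significant singular value.
    return int(np.argmax(log_gaps)) + 1

rng = np.random.default_rng(0)
n_tokens, d_model, true_rank = 128, 64, 8  # toy sizes for illustration

x = rng.standard_normal((n_tokens, d_model))           # input states
w_base = rng.standard_normal((d_model, d_model))       # base weight
lora_b = rng.standard_normal((d_model, true_rank))     # LoRA factor B
lora_a = rng.standard_normal((true_rank, d_model))     # LoRA factor A
w_tuned = w_base + lora_b @ lora_a                     # fine-tuned weight

# Difference of layer outputs, with small noise standing in for
# numerical error and other mismatches between the two models.
noise = 1e-3 * rng.standard_normal((n_tokens, d_model))
delta = x @ w_tuned - x @ w_base + noise

estimated_rank = estimate_rank(delta)
print("estimated LoRA rank:", estimated_rank)
```

In a real model the intermediate states pass through nonlinearities and normalization, so the drop is less clean than in this toy setting; the layer-wise variation reported in Figure 5 reflects that.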

Theorems & Definitions (6)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 3
  • proof