Independence Tests for Language Models

Sally Zhu; Ahmed Ahmed; Rohith Kuditipudi; Percy Liang

TL;DR

This work formalizes independence testing between the weights of two language models. In a constrained setting, it derives exact p-values by simulating exchangeable copies of each model under a permutation-equivariant training assumption; in an unconstrained setting, it gives a robust test that aligns hidden activations and supports localized testing across architectures. The constrained framework uses a family of per-layer test statistics (e.g., $\phi_{U^{(\ell)}}$, $\phi_{H^{(\ell)}}$) whose p-values are aggregated with Fisher's method, while the unconstrained approach introduces $\phi_{\text{MATCH}}$ to tolerate architectural changes and adversarial transformations. Empirically, the constrained tests identify the dependent pairs among 21 open-weight Llama models, while the unconstrained test is near-uniform under independence and detects shared components even after pruning or after individual layers are retrained. The result is provable (constrained) or robust (unconstrained) evidence of non-independence, obtained without retraining either model, which supports model provenance and IP protection and provides a basis for localizing shared substructures across architectures.
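The aggregation step mentioned above can be illustrated with a minimal sketch of Fisher's method in general, not the authors' code. The function name `fisher_combine` is hypothetical, and the sketch assumes the per-layer p-values are independent under the null hypothesis, which Fisher's method requires. It uses the closed-form chi-square tail for an even number of degrees of freedom, so only the standard library is needed.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(log p_i) follows a chi-square
    distribution with 2k degrees of freedom (k = number of p-values).
    For even degrees of freedom the upper tail has the closed form
        P(chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2

    # Accumulate sum_{i=0}^{k-1} half_stat^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term

    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Hypothetical per-layer p-values, for illustration only.
    print(fisher_combine([0.04, 0.20, 0.03, 0.50]))
```

With a single p-value the combined value equals the input, and several moderately small per-layer p-values combine into a smaller overall one, which is the reason to aggregate across layers.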

Abstract

We consider the following problem: given the weights of two models, can we test whether they were trained independently -- i.e., from independent random initializations? We consider two settings: constrained and unconstrained. In the constrained setting, we make assumptions about model architecture and training and propose a family of statistical tests that yield exact p-values with respect to the null hypothesis that the models are trained from independent random initializations. These p-values are valid regardless of the composition of either model's training data; we compute them by simulating exchangeable copies of each model under our assumptions and comparing various similarity measures of weights and activations between the original two models versus these copies. We report the p-values from these tests on pairs of 21 open-weight models (210 total pairs) and correctly identify all pairs of non-independent models. Our tests remain effective even if one model was fine-tuned for many tokens. In the unconstrained setting, where we make no assumptions about training procedures, allow changes to model architecture, and allow for adversarial evasion attacks, the previous tests no longer work. Instead, we propose a new test which matches hidden activations between two models, and which is robust to adversarial transformations and to changes in model architecture. The test can also do localized testing: identifying specific non-independent components of models. Though this test no longer yields exact p-values, empirically we find that its output behaves like one and reliably identifies non-independent models. Notably, we can use the test to identify specific parts of one model that are derived from another (e.g., how Llama 3.1-8B was pruned to initialize Llama 3.2-3B, or shared layers between Mistral-7B and StripedHyena-7B), and it is even robust to retraining individual layers of either model from scratch.
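The constrained-setting idea of comparing the original pair against simulated exchangeable copies can be sketched as a toy permutation test. This is an illustration under stated assumptions, not the authors' implementation: each "model" is reduced to a single weight matrix stored as a list of rows (one row per hidden unit), a copy is produced by randomly permuting those rows, and the similarity measure is plain cosine similarity. The names `similarity`, `permuted_copy`, and `permutation_p_value` are hypothetical.

```python
import random


def similarity(a, b):
    """Cosine similarity between two weight matrices given as lists of rows."""
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = sum(x * x for row in a for x in row) ** 0.5
    norm_b = sum(y * y for row in b for y in row) ** 0.5
    return dot / (norm_a * norm_b)


def permuted_copy(weights, rng):
    """Return a copy of the matrix with its hidden units (rows) shuffled.

    Assumption: under the null hypothesis the ordering of hidden units
    is exchangeable, so every permuted copy is as likely as the original.
    """
    rows = list(weights)
    rng.shuffle(rows)
    return rows


def permutation_p_value(w1, w2, num_copies=99, seed=0):
    """p-value for the null that w1 and w2 were trained independently.

    The observed similarity is ranked among the similarities between
    permuted copies of w1 and the fixed w2. Adding one to numerator and
    denominator counts the observed value itself, which keeps the
    p-value valid and strictly positive.
    """
    rng = random.Random(seed)
    observed = similarity(w1, w2)
    exceed = sum(
        similarity(permuted_copy(w1, rng), w2) >= observed
        for _ in range(num_copies)
    )
    return (1 + exceed) / (1 + num_copies)


if __name__ == "__main__":
    rng = random.Random(1)
    base = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(32)]
    # A "fine-tuned" model: the same weights plus small noise.
    tuned = [[x + 0.01 * rng.gauss(0, 1) for x in row] for row in base]
    # An unrelated model drawn from scratch.
    fresh = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(32)]
    print(permutation_p_value(base, tuned))
    print(permutation_p_value(base, fresh))
```

In this toy setup the smallest achievable p-value is 1 / (num_copies + 1), so more copies are needed to report stronger evidence. Nothing in the procedure depends on the training data, which mirrors the abstract's point that validity holds regardless of data composition.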
