
Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models

Muhammad Haris Khan

TL;DR

The paper introduces K-OTG, a secret-key-gated, PEFT-friendly mechanism for restricting deployment-time use of instruction-tuned LLMs. It combines dual-path training (authorized vs. unauthorized) with an orthonormal, right-multiplying hidden-state transform conditioned on role and a per-request nonce. Authorized inputs get exact inverse cancellation, while unauthorized inputs get scrambled hidden states and blocked outputs. Evaluation on 1.5–3B-class models shows authorized utility closely tracking the baselines with modest perplexity increases, while unauthorized requests yield near-zero utility and stable blocking across selectivity, nonce-invariance, and throughput benchmarks. The approach emphasizes practical usability and LoRA integration, offering a scalable, model-agnostic way to prevent unauthorized use, and it states its limitations and operational considerations explicitly.
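The exact-cancellation property can be illustrated with a toy, standard-library sketch. It builds an orthonormal map from the components the abstract names (a permutation, sign flips, and Householder reflections), seeds it from (key, role, nonce) with SHA-256, and applies it to a hidden-state row vector. The seeding scheme, the helper names `KeyedOrthonormal` and `gated_hidden`, and the "scramble, then invert with the presented key" hook shape are all hypothetical. This illustrates the algebra only and is not the paper's implementation. Because each component is self-inverse, the inverse applies the same pieces in reverse order.

```python
import hashlib
import math
import random
from typing import List


def _rng(key: str, role: str, nonce: str) -> random.Random:
    # Hypothetical seeding: SHA-256 over (key, role, nonce) -> deterministic RNG.
    digest = hashlib.sha256(f"{key}|{role}|{nonce}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def _reflect(x: List[float], v: List[float]) -> List[float]:
    # Householder reflection x -> x - 2 (x . v) v for a unit vector v.
    # The matrix I - 2 v v^T is symmetric and its own inverse.
    d = sum(a * b for a, b in zip(x, v))
    return [a - 2.0 * d * b for a, b in zip(x, v)]


class KeyedOrthonormal:
    """Hypothetical orthonormal map h -> h P S H_1 ... H_m seeded by (key, role, nonce)."""

    def __init__(self, dim: int, key: str, role: str, nonce: str, n_reflections: int = 2):
        rng = _rng(key, role, nonce)
        self.perm = list(range(dim))
        rng.shuffle(self.perm)
        self.signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        self.vs = []
        for _ in range(n_reflections):
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            norm = math.sqrt(sum(a * a for a in v))
            self.vs.append([a / norm for a in v])

    def forward(self, h: List[float]) -> List[float]:
        x = [h[p] for p in self.perm]               # permutation
        x = [s * a for s, a in zip(self.signs, x)]  # sign flips
        for v in self.vs:                           # Householder reflections
            x = _reflect(x, v)
        return x

    def inverse(self, y: List[float]) -> List[float]:
        x = list(y)
        for v in reversed(self.vs):                 # undo reflections in reverse order
            x = _reflect(x, v)
        x = [s * a for s, a in zip(self.signs, x)]  # sign flips are self-inverse
        h = [0.0] * len(x)
        for i, p in enumerate(self.perm):           # scatter back: inverse permutation
            h[p] = x[i]
        return h


def gated_hidden(h: List[float], presented_key: str, role: str, nonce: str,
                 secret_key: str) -> List[float]:
    # Hypothetical pre-lm_head hook: scramble with the secret (key, role, nonce)
    # transform, then apply the inverse derived from the key the request presented.
    scrambler = KeyedOrthonormal(len(h), secret_key, role, nonce)
    unscrambler = KeyedOrthonormal(len(h), presented_key, role, nonce)
    return unscrambler.inverse(scrambler.forward(h))
```

Because the same (key, role, nonce) seeds both maps, the authorized result equals the input up to floating-point rounding for every nonce. That is why greedy decoding can be nonce-invariant. A wrong key still preserves the vector's norm, since every piece is orthonormal, but it leaves the state in a scrambled basis.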

Abstract

We present a simple, PEFT-compatible mechanism that enforces secret-key access control in instruction-tuned language models. K-OTG trains on a dual-path corpus: authorized examples (prefixed with a role key) learn the task output, while unauthorized examples learn a visible block token. At inference, a pre-lm_head hook applies an orthonormal transform to the hidden state. With the correct key and role, the inverse map restores the model's native basis. Otherwise, a session-ephemeral scrambler (permutation, sign flips, Householder reflections) makes the logits uninformative and the system short-circuits to BLOCK. Keys are not added as special tokens, and the method composes cleanly with LoRA on 4-bit bases. We evaluate an hour-scale protocol on 1–3B-class instruction models (Llama 3.2, Qwen2.5 1.5B) covering utility (XSum ROUGE/BLEU, GSM8K accuracy, WikiText-2 perplexity), selectivity (3×3 role–key unlock matrices), nonce invariance, block suppression, and throughput. Authorized utility stays close to the base model on summarization, with the modest perplexity increase expected from instruction tuning. Unauthorized utility collapses (near-zero sequence metrics and exploding perplexity), indicating that the model is practically unusable without the key. Unlock matrices are diagonally dominant (high on-target unlock, low cross-unlock). Authorized block emission is 0/N, enforced via robust bad-word lists, and greedy outputs match exactly across nonces, confirming correct inverse cancellation. The Python-level hook's runtime overhead is 40% in tokens/s relative to the base model. K-OTG therefore provides a pragmatic, model-agnostic way to prevent unauthorized use while preserving authorized utility.
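The dual-path corpus described above can be sketched as a small data-preparation step. The key is prepended as plain text, which is consistent with keys not being added as special tokens. Several details are assumptions made for illustration: the block-marker string `<BLOCK>`, the space separator between key and prompt, the function name `build_dual_path_corpus`, and the choice to map decoy keys and other roles' keys to the block marker. That last choice is meant to discourage cross-unlock.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical visible block marker; the paper's exact string is not given here.
BLOCK = "<BLOCK>"


@dataclass(frozen=True)
class Example:
    prompt: str
    target: str


def build_dual_path_corpus(
    tasks: Dict[str, Sequence[Tuple[str, str]]],
    role_keys: Dict[str, str],
    decoy_keys: Sequence[str] = (),
) -> List[Example]:
    """Hypothetical sketch: authorized prompts learn the answer, all others learn BLOCK."""
    corpus: List[Example] = []
    for role, pairs in tasks.items():
        key = role_keys[role]
        # Assumption: decoys and keys belonging to other roles are treated as unauthorized.
        wrong_keys = list(decoy_keys) + [k for r, k in role_keys.items() if r != role]
        for prompt, answer in pairs:
            # Authorized path: plain-text key prefix, no new vocabulary entries.
            corpus.append(Example(f"{key} {prompt}", answer))
            # Unauthorized path: the same prompt without a key maps to the block marker.
            corpus.append(Example(prompt, BLOCK))
            # Unauthorized path: the same prompt with a wrong key also maps to BLOCK.
            for other in wrong_keys:
                corpus.append(Example(f"{other} {prompt}", BLOCK))
    return corpus
```

Pairing every authorized example with unauthorized variants of the same prompt means the only signal separating the two targets is the key prefix, which is what the gating is meant to learn.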

Paper Structure

This paper contains 15 sections, 5 equations, 2 figures, 4 tables, 5 algorithms.

Figures (2)

  • Figure 1: Selectivity: 3×3 role–key unlock matrices. Entries are the fraction of prompts (majority over nonces) for which outputs are nontrivial and role-appropriate under each key. Strong diagonal dominance (≥ 0.91) with low off-diagonal entries (≤ 0.10) indicates that keys unlock their intended roles with minimal cross-unlock.
  • Figure 2: Qualitative examples (authorized vs. unauthorized). Each card shows the same prompt with and without the correct key. With the key, the model produces coherent, role-appropriate content; without it, the model emits the block marker, illustrating unusability by design. All three panels are set to equal height for visual consistency.
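Figure 1's caption defines each matrix entry as a majority vote over nonces, followed by a fraction over prompts. The following is a minimal sketch of that aggregation. It assumes per-nonce booleans have already been produced by some judge of "nontrivial and role-appropriate", which is not shown. It also assumes rows are indexed by the key used and columns by the prompt's role; that orientation is a guess. The function names and role names are hypothetical, and the caption's reported 0.91 and 0.10 values are reused as illustrative pass thresholds.

```python
from typing import Dict, List, Sequence, Tuple

# outcomes[(key_role, prompt_role)] -> one list per prompt of per-nonce booleans
Outcomes = Dict[Tuple[str, str], List[Sequence[bool]]]


def majority(votes: Sequence[bool]) -> bool:
    # Strict majority over nonces; ties count as "not unlocked".
    return 2 * sum(votes) > len(votes)


def unlock_matrix(outcomes: Outcomes, roles: Sequence[str]) -> List[List[float]]:
    """Entry [i][j] = fraction of role-j prompts unlocked (by nonce majority) under key i."""
    matrix = []
    for key_role in roles:
        row = []
        for prompt_role in roles:
            per_prompt = outcomes[(key_role, prompt_role)]
            unlocked = sum(majority(votes) for votes in per_prompt)
            row.append(unlocked / len(per_prompt))
        matrix.append(row)
    return matrix


def is_selective(matrix: List[List[float]], diag_min: float = 0.91,
                 off_max: float = 0.10) -> bool:
    # Diagonal dominance check: high on-target unlock, low cross-unlock.
    n = len(matrix)
    diag_ok = all(matrix[i][i] >= diag_min for i in range(n))
    off_ok = all(matrix[i][j] <= off_max
                 for i in range(n) for j in range(n) if i != j)
    return diag_ok and off_ok
```

Taking the majority per prompt before averaging means a single lucky nonce cannot count a prompt as unlocked, which matches the caption's "majority over nonces" wording.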