Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models
Muhammad Haris Khan
TL;DR
The paper introduces K-OTG, a secret-key-gated, PEFT-friendly mechanism that restricts deployment-time use of instruction-tuned LLMs. It combines dual-path training (authorized vs. unauthorized examples) with an orthonormal, right-multiplying hidden-state transform conditioned on role and per-request nonce. For authorized inputs the inverse transform cancels it exactly; unauthorized inputs receive scrambled hidden states and blocked outputs. Evaluation on 1.5–3B-class models shows authorized utility closely tracking the baselines with modest perplexity increases, while unauthorized use yields near-zero utility and stable blocking across selectivity, nonce-invariance, and throughput benchmarks. The approach emphasizes practical usability and integration with LoRA, offering a scalable, model-agnostic way to prevent unauthorized use, with clearly stated limitations and operational considerations.
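The exact-cancellation property can be illustrated with a small pure-Python sketch. It assumes, hypothetically, that the role key and nonce are hashed with SHA-256 to seed the factors, and that the transform Q is the product of a permutation, per-coordinate sign flips and two Householder reflections acting on a row vector (h ↦ hQ). The names `make_transform`, `apply` and `apply_inverse` are illustrative, not the paper's code; a real model would apply a d×d matrix to the last dimension of the hidden-state tensor.

```python
import hashlib
import random


def make_transform(role_key, nonce, dim, n_householder=2):
    """Derive deterministic orthonormal factors from (role_key, nonce).

    Hypothetical construction: SHA-256 of "role|nonce" seeds a PRNG that draws a
    permutation, sign flips, and Householder reflection vectors.
    """
    digest = hashlib.sha256(f"{role_key}|{nonce}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    perm = list(range(dim))
    rng.shuffle(perm)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    reflectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_householder)]
    return perm, signs, reflectors


def _reflect(h, v):
    # Householder reflection H = I - 2 v v^T / (v^T v): symmetric and self-inverse.
    scale = 2.0 * sum(a * b for a, b in zip(h, v)) / sum(b * b for b in v)
    return [a - scale * b for a, b in zip(h, v)]


def apply(h, transform):
    """Right-multiply the row vector h by Q = P * S * H_1 * ... * H_k."""
    perm, signs, reflectors = transform
    out = [h[p] for p in perm]                 # permutation P
    out = [s * x for s, x in zip(signs, out)]  # sign flips S
    for v in reflectors:                       # Householder reflections H_i
        out = _reflect(out, v)
    return out


def apply_inverse(y, transform):
    """Right-multiply by Q^T = H_k * ... * H_1 * S * P^T (factors undone in reverse)."""
    perm, signs, reflectors = transform
    out = list(y)
    for v in reversed(reflectors):
        out = _reflect(out, v)
    out = [s * x for s, x in zip(signs, out)]
    h = [0.0] * len(out)
    for i, p in enumerate(perm):               # inverse permutation P^T
        h[p] = out[i]
    return h
```

Because `apply_inverse` uses the same `(role_key, nonce)` pair as `apply`, the round trip is the identity for every nonce, which is the property behind nonce-invariant authorized outputs. Factors derived from a different key leave the vector scrambled while still preserving its norm.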
Abstract
We present a simple, PEFT-compatible mechanism that enforces secret-key access control in instruction-tuned language models. K-OTG trains on a dual-path corpus: authorized examples (prefixed with a role key) learn the task output, while unauthorized examples learn a visible block token. At inference, a pre-lm_head hook applies an orthonormal transform to the hidden state. With the correct key/role, the inverse map restores the model's native basis; otherwise, a session-ephemeral scrambler (permutation, sign flips, Householder reflections) makes the logits uninformative and the system short-circuits to BLOCK. Keys are not added as special tokens, and the method composes cleanly with LoRA on 4-bit bases. We evaluate with an hour-scale protocol on 1–3B-class instruction models (Llama 3.2, Qwen2.5 1.5B), covering utility (XSum ROUGE/BLEU, GSM8K accuracy, WikiText-2 perplexity), selectivity (3×3 role-key unlock matrices), nonce invariance, block suppression, and throughput. Authorized utility remains close to the base model on summarization, with the modest PPL increase expected from instruction tuning; unauthorized utility collapses (near-zero sequence metrics and exploding PPL), indicating that the model is practically unusable without the key. Unlock matrices are diagonally dominant (high on-target unlock, low cross-unlock), authorized block emission is 0/N thanks to robust bad-word lists, and greedy outputs match exactly across nonces, confirming correct inverse cancellation. The Python-level hook incurs a 40% throughput overhead (tokens per second) relative to the base model. K-OTG therefore provides a pragmatic, model-agnostic way to prevent unauthorized use while preserving authorized utility.
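A dual-path corpus of the kind described above might be assembled as follows. This is a hedged sketch: the `[BLOCK]` string, the key format, the 50/50 split between a missing key and a random-hex decoy key, and the names `build_dual_path_corpus` and `Example` are hypothetical choices, not details given in the abstract. Keys are prepended as ordinary text, so they are tokenized like the rest of the prompt rather than registered as special tokens.

```python
import random
from dataclasses import dataclass

# Hypothetical visible block string; the paper's actual block token may differ.
BLOCK_TOKEN = "[BLOCK]"


@dataclass
class Example:
    prompt: str
    target: str
    authorized: bool


def _decoy_key(rng, length):
    """Return either no key at all or a random hex string of the given length."""
    if rng.random() < 0.5:
        return ""
    return "".join(rng.choice("0123456789abcdef") for _ in range(length))


def build_dual_path_corpus(pairs, role_keys, unauth_ratio=1.0, seed=0):
    """Build authorized and unauthorized examples from (instruction, answer) pairs.

    role_keys maps a role name to its secret key string (hypothetical format).
    """
    rng = random.Random(seed)
    roles = sorted(role_keys)
    corpus = []
    for instruction, answer in pairs:
        key = role_keys[rng.choice(roles)]
        # Authorized path: correct role key as plain prompt text -> task output.
        corpus.append(Example(f"{key}\n{instruction}", answer, authorized=True))
        if rng.random() < unauth_ratio:
            # Unauthorized path: missing or wrong key -> visible block token.
            decoy = _decoy_key(rng, len(key))
            prompt = f"{decoy}\n{instruction}" if decoy else instruction
            corpus.append(Example(prompt, BLOCK_TOKEN, authorized=False))
    rng.shuffle(corpus)  # interleave both paths for training
    return corpus


keys = {"admin": "sk-admin-91c2", "analyst": "sk-analyst-7f3a", "viewer": "sk-viewer-04be"}
pairs = [("Summarize: The cat sat on the mat.", "A cat sat on a mat."), ("What is 2 + 3?", "5")]
corpus = build_dual_path_corpus(pairs, keys)
```

With `unauth_ratio=1.0` every instruction yields one authorized and one unauthorized example; lowering it trades block-path coverage for a smaller corpus.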
