Non-Interfering Weight Fields: Treating Model Parameters as a Continuously Extensible Function

Sarim Chaudhry

TL;DR

We propose Non-Interfering Weight Fields (NIWF), a framework that replaces the fixed-weight paradigm with a learned function that generates weight configurations on demand from a continuous capability coordinate space. The framework also introduces software-like versioning for neural network intelligence.

Abstract

Large language models store all learned knowledge in a single, fixed weight vector. Teaching a model new capabilities requires modifying those same weights, inevitably degrading previously acquired knowledge. This fundamental limitation, known as catastrophic forgetting, has resisted principled solutions for decades. Existing approaches treat weights as immutable artifacts that must be protected through techniques such as regularization heuristics, replay buffers, or isolated adapter modules. None of these provides a structural guarantee against forgetting. In this work, we propose Non-Interfering Weight Fields (NIWF), a framework that replaces the fixed-weight paradigm with a learned function that generates weight configurations on demand from a continuous capability coordinate space. After training on a task, we commit the occupied coordinate region by snapshotting the field's outputs at anchor points, and we enforce those snapshots as a functional lock during all future training. We validate NIWF on sequential instruction-following and code generation tasks using Mistral-7B, demonstrating zero forgetting on committed tasks with competitive perplexity on new tasks. The framework introduces the notion of software-like versioning for neural network intelligence, where capabilities can be committed, extended, composed, and rolled back without retraining.
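To make the commit-and-lock idea in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation. The toy field (a linear map followed by a softmax over a handful of adapter bases), the squared-error form of the lock penalty, and all function names (`field_output`, `commit_region`, `lock_loss`) are hypothetical. The abstract only specifies that field outputs are snapshotted at anchor points and then held fixed during later training.

```python
import math

def field_output(weights, z):
    """Toy weight field (assumed form): linear map of coordinate z, then softmax gates."""
    logits = [sum(w * x for w, x in zip(row, z)) for row in weights]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def commit_region(weights, anchors):
    """Snapshot the field's outputs at each anchor coordinate."""
    return [field_output(weights, z) for z in anchors]

def lock_loss(weights, anchors, snapshot):
    """Mean squared deviation of current outputs from the committed snapshot.

    The squared-error form is an assumption made for this sketch.
    """
    total, count = 0.0, 0
    for z, target in zip(anchors, snapshot):
        current = field_output(weights, z)
        total += sum((c - t) ** 2 for c, t in zip(current, target))
        count += len(target)
    return total / count

# Hypothetical field with 3 adapter bases and a 2-D coordinate space.
field = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.3]]
anchors = [[0.1, 0.2], [0.3, -0.1]]
snapshot = commit_region(field, anchors)

# Simulate later training that drifts one row of the field.
drifted = [row[:] for row in field]
drifted[0][0] += 1.0

loss_at_commit = lock_loss(field, anchors, snapshot)
loss_after_drift = lock_loss(drifted, anchors, snapshot)
```

The penalty is zero at the moment of commitment and becomes positive as soon as later updates change the field's outputs at the anchors. Adding a term like this to the objective for a new task is one way to read the "functional lock" described above.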

Paper Structure (34 sections, 13 equations, 13 figures, 3 tables)

Introduction
Related Work
Catastrophic Forgetting and Continual Learning
Parameter-Efficient Fine-Tuning
Mixture-of-Experts
Hypernetworks
Method
Frozen Backbone with Conditional Adapter Banks
Weight Field
Coordinate Dynamics
Training Objective
Region Commitment and Functional Locking
Sequential Task Learning Protocol
Model Architecture
Adapter Bank Parameterization
...and 19 more sections

Figures (13)

Figure 1: The NIWF architecture. A frozen backbone processes tokens through a two-pass forward. The coordinate dynamics module produces a capability coordinate $z$ from mean-pooled hidden states. The weight field maps $z$ to sparse gating over low-rank adapter bases. After training, coordinate regions are committed and functionally locked via anchor-based snapshot constraints.
Figure 2: Learning rate schedule for both training stages. Each task uses independent linear warmup over 5 percent of steps followed by cosine decay to zero. The peak learning rate is 2e-4.
Figure 3: Memory and scaling analysis. Left: GPU memory footprint comparison showing NIWF fits within 24 GB. Center: stored versus active parameter scaling as the number of adapter bases grows. Right: sequence-level gating provides 500x memory savings over token-level gating at sequence length 2048.
Figure 4: Training loss across sequential tasks. Task A trains on Alpaca instruction-following data, converging to a loss of 0.77. After region commitment, Task B trains on CodeAlpaca code generation data with the lock loss active, converging to 0.95. The lock constraint does not impede Task B learning.
Figure 5: Training perplexity for both tasks. Task A achieves a validation perplexity of 2.49 at convergence. Task B converges to a perplexity of approximately 2.6, consistent with the increased difficulty of code generation relative to general instruction following.
...and 8 more figures
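The schedule described in the Figure 2 caption (linear warmup over 5 percent of steps, then cosine decay to zero, with a peak of 2e-4) can be sketched as follows. This is a minimal illustration, not the authors' code. The function name `lr_at_step` is hypothetical, and the sketch assumes that warmup ramps from `peak_lr / warmup_steps` up to the peak and that the schedule restarts independently for each task.

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_frac=0.05):
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step (assumed convention).
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Example: a 1000-step task has 50 warmup steps.
schedule = [lr_at_step(s, 1000) for s in range(1001)]
```

Calling the function with a fresh step counter for each task reproduces the "independent warmup per task" behaviour mentioned in the caption.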
