
Non-Interfering Weight Fields: Treating Model Parameters as a Continuously Extensible Function

Sarim Chaudhry

TL;DR

We propose Non-Interfering Weight Fields (NIWF), a framework that replaces the fixed-weight paradigm with a learned function that generates weight configurations on demand from a continuous capability coordinate space, and that introduces software-like versioning for neural network intelligence.

Abstract

Large language models store all learned knowledge in a single, fixed weight vector. Teaching a model new capabilities requires modifying those same weights, inevitably degrading previously acquired knowledge. This fundamental limitation, known as catastrophic forgetting, has resisted principled solutions for decades. Existing approaches treat weights as immutable artifacts that must be protected through techniques like regularization heuristics, replay buffers, or isolated adapter modules. None of these, however, provides a structural guarantee against forgetting. In this work, we propose Non-Interfering Weight Fields (NIWF), a framework that replaces the fixed-weight paradigm with a learned function that generates weight configurations on demand from a continuous capability coordinate space. After training on a task, we commit the occupied coordinate region by snapshotting the field's outputs on anchor points, which enforces a functional lock during all future training. We validate NIWF on sequential instruction-following and code generation tasks using Mistral-7B, demonstrating zero forgetting on committed tasks with competitive perplexity on new tasks. The framework introduces the notion of software-like versioning for neural network intelligence, where capabilities can be committed, extended, composed, and rolled back without retraining.
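
The commit step described in the abstract can be illustrated with a minimal sketch. It assumes the lock is enforced as a mean-squared penalty between the field's current outputs at the anchor points and the values snapshotted at commit time; the abstract does not state the exact form of the penalty. `ToyField`, `snapshot`, and `lock_loss` are hypothetical names used only for illustration, and the toy field stands in for the real weight field.

```python
class ToyField:
    """Hypothetical stand-in for a weight field: maps a coordinate z to outputs."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, z):
        return [self.scale * zi for zi in z]


def snapshot(field, anchors):
    """Record the field's outputs at each anchor point at commit time."""
    return [list(field(z)) for z in anchors]


def lock_loss(field, anchors, snapshots):
    """Assumed lock penalty: mean squared drift from the committed snapshots."""
    total, count = 0.0, 0
    for z, target in zip(anchors, snapshots):
        for out, ref in zip(field(z), target):
            total += (out - ref) ** 2
            count += 1
    return total / count


# Anchor points inside the committed coordinate region (illustrative values).
anchors = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

field = ToyField(scale=2.0)
committed = snapshot(field, anchors)

# Immediately after the commit the field matches its snapshot exactly.
loss_before = lock_loss(field, anchors, committed)

# Any later change to the field inside the region makes the penalty positive.
field.scale = 2.5
loss_after = lock_loss(field, anchors, committed)
```

In a training loop, a penalty of this kind would be added to the new task's loss so that updates which move the field inside a committed region are discouraged. How strictly that constraint holds depends on the anchor density and the loss weighting, neither of which is specified here.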

Paper Structure (34 sections, 13 equations, 13 figures, 3 tables)


Figures (13)

  • Figure 1: The NIWF architecture. A frozen backbone processes tokens through a two-pass forward. The coordinate dynamics module produces a capability coordinate $z$ from mean-pooled hidden states. The weight field maps $z$ to sparse gating over low-rank adapter bases. After training, coordinate regions are committed and functionally locked via anchor-based snapshot constraints.
  • Figure 2: Learning rate schedule for both training stages. Each task uses independent linear warmup over 5 percent of steps followed by cosine decay to zero. The peak learning rate is 2e-4.
  • Figure 3: Memory and scaling analysis. Left: GPU memory footprint comparison showing NIWF fits within 24 GB. Center: stored versus active parameter scaling as base count grows. Right: sequence-level gating provides 500x memory savings over token-level gating at sequence length 2048.
  • Figure 4: Training loss across sequential tasks. Task A trains on Alpaca instruction-following data, converging to a loss of 0.77. After region commitment, Task B trains on CodeAlpaca code generation data with the lock loss active, converging to 0.95. The lock constraint does not impede Task B learning.
  • Figure 5: Training perplexity for both tasks. Task A achieves a validation perplexity of 2.49 at convergence. Task B converges to a perplexity of approximately 2.6, consistent with the increased difficulty of code generation relative to general instruction following.
  • ...and 8 more figures
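
The learning rate schedule described in the Figure 2 caption (linear warmup over 5 percent of steps, then cosine decay to zero, peak 2e-4) can be sketched as follows. The function name `lr_at_step`, the choice to start warmup at `peak_lr / warmup_steps` rather than at zero, and the step-indexing convention are assumptions, not details taken from the paper.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_frac=0.05):
    """Linear warmup followed by cosine decay to zero (assumed conventions)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Each task uses its own schedule, so the step counter restarts per task.
task_a_schedule = [lr_at_step(s, 1000) for s in range(1000)]
task_b_schedule = [lr_at_step(s, 1000) for s in range(1000)]
```

Because the caption states that each task has an independent schedule, the sketch builds a separate schedule per task instead of continuing the decay across the task boundary.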