Superposition in Transformers: A Novel Way of Building Mixture of Experts

Ayoub Ben Chaliah; Hela Dellagi

TL;DR

Superposition in Transformers mitigates catastrophic forgetting during fine-tuning of large language models by merging a base model $M_{\text{base}}$ and a fine-tuned model $M_{\text{fine}}$ into a single model, using layer-wise B-spline blending with coefficients $\alpha(l)$ and autoencoders that reconstruct hidden states. Both experts are frozen; only the blending coefficients, autoencoders, and related biases are trained, so the method preserves existing capabilities while adding compact domain-specific representations that can be selectively activated. The key contributions are (i) autoencoder-based reconstruction enabling in-model superposition, (ii) jointly learned B-spline blending, and (iii) parameter efficiency with minimal overhead, plus an optional 2D-$\alpha$ extension and dynamic switching. The authors report promising results in reducing forgetting, improving cross-domain perplexity, and aligning internal representations, and they point to multilingual, symbolic-reasoning, and multi-domain integration with limited parameter growth as future directions.

Abstract

Catastrophic forgetting remains a major challenge when adapting large language models (LLMs) to new tasks or domains. Conventional fine-tuning often overwrites existing knowledge, causing performance degradation on original tasks. We introduce Superposition in Transformers, a novel architecture that leverages autoencoders to superimpose the hidden representations of a base model and a fine-tuned model within a shared parameter space. By using B-spline-based blending coefficients and autoencoders that adaptively reconstruct hidden states based on the input data distribution, our method effectively mitigates catastrophic forgetting and enables a new paradigm of "in-model" superposition. This approach preserves original model capabilities while allowing compact domain-specific expertise to be added, and it supports dynamic switching between model states during inference.
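To make the layer-wise blending concrete, the following is a minimal sketch, not the authors' implementation. It assumes that $\alpha(l)$ is a clamped B-spline evaluated at the normalized layer index and that hidden states are mixed as a convex combination $(1 - \alpha(l))\,h_{\text{base}} + \alpha(l)\,h_{\text{fine}}$; the text above does not state the exact formula. The function names (`bspline_basis`, `alpha_per_layer`, `blend`) and the control-point values are hypothetical, and the control points are fixed here, whereas the method described above learns them during training.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th degree-k B-spline basis at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    # Terms with a zero denominator (repeated knots) are defined as zero.
    left = 0.0 if left_den == 0 else (
        (t - knots[i]) / left_den * bspline_basis(i, k - 1, t, knots))
    right = 0.0 if right_den == 0 else (
        (knots[i + k + 1] - t) / right_den * bspline_basis(i + 1, k - 1, t, knots))
    return left + right


def alpha_per_layer(control_points, num_layers, degree=3):
    """Evaluate one blending coefficient per layer (assumed parameterization)."""
    n = len(control_points)
    if n <= degree:
        raise ValueError("need more control points than the spline degree")
    # Clamped uniform knot vector on [0, 1]: n + degree + 1 knots in total.
    inner = [j / (n - degree) for j in range(1, n - degree)]
    knots = [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)
    alphas = []
    for layer in range(num_layers):
        t = layer / (num_layers - 1) if num_layers > 1 else 0.0
        t = min(t, 1.0 - 1e-9)  # stay inside the half-open last knot interval
        a = sum(c * bspline_basis(i, degree, t, knots)
                for i, c in enumerate(control_points))
        alphas.append(min(1.0, max(0.0, a)))  # keep the mix convex
    return alphas


def blend(h_base, h_fine, alpha):
    """Convex combination of two hidden-state vectors (assumed blending rule)."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(h_base, h_fine)]


# Hypothetical control points: early layers lean on the base model,
# later layers lean on the fine-tuned model.
alphas = alpha_per_layer([0.0, 0.2, 0.8, 1.0], num_layers=12)
mixed = blend([1.0, 2.0], [3.0, 4.0], alphas[6])
```

With only a handful of control points, the number of trainable blending parameters stays independent of the layer count, which is one plausible reading of the parameter-efficiency claim above.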

Paper Structure (45 sections, 9 equations, 8 figures, 2 tables, 2 algorithms)

Introduction
Background and Related Work
Proposed Method
Overview
Blending Model Weights Using B-Splines
Motivation
Formulation
Merged Model Architecture
Merging Models Post-Training
Forward Pass with Merged Model
Autoencoders for State Reconstruction
Architecture
Minimizing Information Loss
Role of Autoencoders in Encouraging Polysemanticity
Extending to a 2D-alpha Model (Optional)
...and 30 more sections

Figures (8)

Figure 1: Overview of a GPT-2 Merged Model Architecture.
Figure 2: Perplexity evolution across epochs for different merging methods.
Figure 3: t-SNE visualization of layer 4 hidden states from the merged model and expert models for English and French inputs (2D-alpha).
Figure 4: Comparison of average neuron diversity across layers for the base, fine-tuned, and merged models.
Figure 5: Comparison of mean neuron activation across layers for the base, fine-tuned, and merged models.
...and 3 more figures
