Superposition in Transformers: A Novel Way of Building Mixture of Experts
Ayoub Ben Chaliah, Hela Dellagi
TL;DR
Superposition in Transformers mitigates catastrophic forgetting during fine-tuning of large language models by merging a base model $M_{\text{base}}$ and a fine-tuned model $M_{\text{fine}}$ into a single model. The merge relies on layer-wise B-spline blending coefficients $\alpha(l)$ and on autoencoders that reconstruct hidden states. Both experts are frozen, and only the blending coefficients, the autoencoders, and their associated biases are trained. As a result, the method preserves existing capabilities while adding compact domain-specific representations that can be selectively activated. The key contributions are (i) autoencoder-based reconstruction that enables in-model superposition, (ii) jointly learned B-spline blending, and (iii) parameter efficiency with minimal overhead, along with an optional 2D-$\alpha$ extension and dynamic switching. The approach shows promising results in reducing forgetting, improving cross-domain perplexity, and aligning internal representations. It also opens a path toward multilingual, symbolic-reasoning, and multi-domain integration with limited parameter growth.
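Layer-wise blending can be illustrated with a minimal sketch. The sketch makes three assumptions that the text does not confirm. First, $\alpha(l)$ is a cubic B-spline over normalized layer depth on a clamped uniform knot vector. Second, that spline is squashed into $(0, 1)$ by a sigmoid. Third, the blend is the convex combination $\alpha(l)\,h_{\text{base}} + (1-\alpha(l))\,h_{\text{fine}}$. The function names (`alpha`, `blend`, `bspline_basis`) and the control values are hypothetical.

```python
import math

# Hypothetical sketch: alpha(l) is modeled as a sigmoid of a clamped cubic
# B-spline over normalized layer depth. This parameterization is an assumption.

def clamped_uniform_knots(n_ctrl: int, degree: int) -> list[float]:
    """Clamped (open) uniform knot vector on [0, 1] for n_ctrl control points."""
    if n_ctrl < degree + 1:
        raise ValueError("need at least degree + 1 control points")
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)

def bspline_basis(i: int, p: int, t: float, knots: list[float]) -> float:
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty knot span so that t = 1.0 is covered.
        if t == knots[-1] and knots[i] < knots[i + 1] and knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] > knots[i]:
        left = (t - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, t, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - t) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, t, knots))
    return left + right

def alpha(layer: int, n_layers: int, coeffs: list[float], degree: int = 3) -> float:
    """Blending coefficient for one layer, squashed into (0, 1) by a sigmoid (assumed)."""
    knots = clamped_uniform_knots(len(coeffs), degree)
    t = layer / (n_layers - 1) if n_layers > 1 else 0.0
    s = sum(c * bspline_basis(i, degree, t, knots) for i, c in enumerate(coeffs))
    return 1.0 / (1.0 + math.exp(-s))

def blend(h_base: list[float], h_fine: list[float], a: float) -> list[float]:
    """Assumed convex combination of the two frozen experts' hidden states."""
    return [a * b + (1.0 - a) * f for b, f in zip(h_base, h_fine)]

# Usage: 5 hypothetical control values spread over a 12-layer model.
coeffs = [2.0, 1.0, 0.0, -1.0, -2.0]
alphas = [alpha(l, 12, coeffs) for l in range(12)]
h = blend([1.0, 0.0], [0.0, 1.0], alphas[0])
print([round(a, 3) for a in alphas])
```

A clamped B-spline starts at its first control value and ends at its last. Its basis functions also sum to one. In this sketch, therefore, the first and last layers get exactly sigmoid(2) and sigmoid(-2), and a decreasing control sequence yields a non-increasing $\alpha(l)$. Five scalars determine the coefficient of every layer, which illustrates how a spline parameterization keeps the number of trainable blending parameters small.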
Abstract
Catastrophic forgetting remains a major challenge when adapting large language models (LLMs) to new tasks or domains. Conventional fine-tuning often overwrites existing knowledge, causing performance degradation on original tasks. We introduce Superposition in Transformers, a novel architecture that leverages autoencoders to superimpose the hidden representations of a base model and a fine-tuned model within a shared parameter space. By using B-spline-based blending coefficients and autoencoders that adaptively reconstruct hidden states based on the input data distribution, our method effectively mitigates catastrophic forgetting and enables a new paradigm of "in-model" superposition. This approach preserves original model capabilities while allowing compact domain-specific expertise to be added, and it supports dynamic switching between model states during inference.
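The autoencoder component can be sketched in the same spirit. This section does not specify the paper's autoencoder architecture, training objective, or inputs, so the following choices are illustrative assumptions. The sketch uses a single linear encoder/decoder pair with a rank-2 bottleneck, trains it with plain SGD on mean-squared reconstruction error, and omits biases. Synthetic vectors stand in for hidden states. All names (`forward`, `sgd_step`, `sample_hidden`) are hypothetical.

```python
import random

# Hypothetical sketch: a bias-free linear autoencoder trained to reconstruct
# synthetic "hidden states". The paper's actual autoencoders may differ.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(We, Wd, h):
    z = matvec(We, h)            # encode: d -> k (bottleneck)
    return matvec(Wd, z), z      # decode: k -> d

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def sgd_step(We, Wd, h, lr):
    """One gradient step on mse(h_hat, h), updating We and Wd in place."""
    h_hat, z = forward(We, Wd, h)
    d, k = len(h), len(z)
    g_out = [2.0 * (hh - hv) / d for hh, hv in zip(h_hat, h)]            # dL/dh_hat
    g_z = [sum(g_out[i] * Wd[i][j] for i in range(d)) for j in range(k)]  # dL/dz, old Wd
    for i in range(d):
        for j in range(k):
            Wd[i][j] -= lr * g_out[i] * z[j]
    for j in range(k):
        for m in range(d):
            We[j][m] -= lr * g_z[j] * h[m]

rng = random.Random(0)
d, k = 6, 2
# Synthetic stand-ins for hidden states: random mixtures of two fixed
# directions, so a rank-2 bottleneck can represent them exactly.
u = [rng.uniform(-1, 1) for _ in range(d)]
v = [rng.uniform(-1, 1) for _ in range(d)]

def sample_hidden():
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    return [a * ui + b * vi for ui, vi in zip(u, v)]

We = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]
Wd = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(d)]
eval_set = [sample_hidden() for _ in range(50)]

def eval_loss():
    return sum(mse(forward(We, Wd, h)[0], h) for h in eval_set) / len(eval_set)

loss_before = eval_loss()
for _ in range(3000):
    sgd_step(We, Wd, sample_hidden(), lr=0.05)
loss_after = eval_loss()
print(f"reconstruction MSE: {loss_before:.4f} -> {loss_after:.4f}")
```

The bottleneck is what keeps such a module compact. Reconstructing a $d$-dimensional state through $k \ll d$ units costs $2dk$ weights here, rather than the $d^2$ of a full projection.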
