SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass

Yewei Liu, Xiyuan Wang, Yansheng Mao, Yoav Gelbery, Haggai Maron, Muhan Zhang

TL;DR

SHINE introduces a scalable in-context hypernetwork that maps meaningful contexts to LoRA adapters for LLMs in a single forward pass, avoiding gradient-based fine-tuning. It reuses the LLM backbone to encode the context via memory extraction and uses an M2P Transformer to generate layer-wise LoRAs, enabling rapid adaptation. The training pipeline combines self-supervised pretraining on reconstruction and completion tasks with instruction fine-tuning on QA data, and scales to large datasets. Experiments show that SHINE achieves QA performance competitive with in-context prompting while dramatically reducing overhead relative to SFT-based adaptation and scaling favorably across backbone and hypernetwork sizes.

Abstract

We propose SHINE (Scalable Hyper In-context NEtwork), a scalable hypernetwork that can map diverse meaningful contexts into high-quality LoRA adapters for large language models (LLMs). By reusing the frozen LLM's own parameters in an in-context hypernetwork design and introducing architectural innovations, SHINE overcomes key limitations of prior hypernetworks and achieves strong expressive power with a relatively small number of parameters. We introduce a pretraining and instruction fine-tuning pipeline, and train our hypernetwork to generate high-quality LoRA adapters from diverse meaningful contexts in a single forward pass. It updates LLM parameters without any fine-tuning and immediately enables complex question answering about the context without direct access to it, effectively transforming in-context knowledge into in-parameter knowledge in one pass. Our work achieves outstanding results on various tasks, greatly reduces time, computation, and memory costs compared to SFT-based LLM adaptation, and shows great potential for scaling. Our code is available at https://github.com/Yewei-Liu/SHINE
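
For readers less familiar with LoRA, the adapter that the hypernetwork produces is a low-rank update added to a frozen weight matrix. The following is a minimal NumPy sketch of that standard LoRA form only, not of SHINE itself. The dimensions, the `alpha / r` scaling, the helper name `apply_lora`, and the random factors `A` and `B` (stand-ins for whatever a hypernetwork would generate) are all illustrative assumptions.

```python
import numpy as np

def apply_lora(x, W, A, B, alpha=1.0):
    """Linear layer with a LoRA update (hypothetical helper for illustration).

    x: (batch, d_in) inputs
    W: (d_out, d_in) frozen base weight
    A: (r, d_in) and B: (d_out, r) low-rank factors
    """
    r = A.shape[0]
    # Base output plus the low-rank correction; W itself is never modified.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 2            # assumed sizes, not the paper's
W = rng.normal(size=(d_out, d_in))   # frozen weight
A = rng.normal(size=(r, d_in))       # stand-in for a generated factor
B = rng.normal(size=(d_out, r))      # stand-in for a generated factor
x = rng.normal(size=(4, d_in))

y = apply_lora(x, W, A, B)

# The same result is obtained by merging the update into the weight.
W_merged = W + (1.0 / r) * B @ A
```

Because the update `B @ A` has rank at most `r`, a generator only has to emit `r * (d_in + d_out)` numbers per adapted matrix instead of `d_in * d_out`, which is part of what makes producing adapters in one forward pass plausible.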

Paper Structure

This paper contains 50 sections, 50 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: An Example of SHINE: It maps context to LoRA in a single pass without any fine-tuning. The LoRA can be used for downstream conversation without accessing the context.
  • Figure 2: Overall Architecture. The process consists of two passes: (1) Memory Extraction, where the LLM (augmented with Meta LoRA) processes context to produce memory states, and (2) Parameter Generation, where a hypernetwork converts these states into task-specific LoRA adapters for the final inference.
  • Figure 3: Hypernetwork Architecture. The model uses alternating attention along layer and token axes to efficiently process the memory tensor before projecting it into weights.
  • Figure 4: Reconstruction Task: The hypernetwork encodes the full context into a LoRA. The LLM is then prompted to reconstruct the original text.
  • Figure 5: Pretraining Results: Reconstruction and completion loss/perplexity across varying context lengths. P10 and P90 denote the 10% and 90% quantiles.
  • ...and 7 more figures
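
The Figure 3 caption above mentions attention that alternates between the layer axis and the token axis of a memory tensor. As a rough illustration of that general idea, here is a minimal NumPy sketch. It assumes a memory tensor of shape (layers, tokens, hidden) and uses parameter-free single-head attention with residual connections. The function names, shapes, and the absence of learned projections are assumptions made for brevity and are not taken from the paper's implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h):
    """Parameter-free single-head self-attention over the second-to-last axis.

    h: (..., n, d) -> (..., n, d). Leading axes are treated as a batch.
    """
    d = h.shape[-1]
    scores = h @ np.swapaxes(h, -1, -2) / np.sqrt(d)   # (..., n, n)
    return softmax(scores, axis=-1) @ h

def alternating_block(mem):
    """One token-axis step followed by one layer-axis step.

    mem: (L, T, d) memory tensor (layers, tokens, hidden); shape is assumed.
    """
    # Token axis: each layer attends over its own T token states.
    mem = mem + self_attention(mem)
    # Layer axis: each token position attends over the L layers.
    mem_t = np.swapaxes(mem, 0, 1)            # (T, L, d)
    mem_t = mem_t + self_attention(mem_t)
    return np.swapaxes(mem_t, 0, 1)           # back to (L, T, d)

rng = np.random.default_rng(0)
mem = rng.normal(size=(3, 5, 4))              # toy sizes: L=3, T=5, d=4
out = alternating_block(mem)
```

In this sketch each attention step compares only T or only L entries at a time, rather than all L × T memory states jointly, which is the usual motivation for attending along one axis at a time.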