TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA

Chanjoo Jung, Jaehyung Kim

TL;DR

TiTok addresses the challenge of transferring LoRA-based PEFT knowledge across heterogeneous backbones by introducing a token-level contrastive excess signal that identifies informative tokens within synthetic data generated by a source expert. It employs a two-stage filtering pipeline (sample filtering by mean excess, then token-level selection of the top k%) and uses a tokenizer-alignment mechanism to handle mismatched tokenizers, all without training extra discriminators. Across the BBH, MMLU, and LaMP benchmarks, TiTok consistently surpasses the Vanilla, KD, and TransLoRA baselines, with average gains of roughly +4~8%, and remains robust under cross-family and external-data transfer. The approach offers a practical, data-efficient pathway for deploying LoRA-based knowledge transfer in real-world, multi-model ecosystems, with potential extensions to adaptive token thresholds and broader data sources.

Abstract

Large Language Models (LLMs) are widely applied in real-world scenarios, but fine-tuning them comes with significant computational and storage costs. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA mitigate these costs, but the adapted parameters depend on the base model and cannot be transferred across different backbones. One way to address this issue is knowledge distillation, but its effectiveness inherently depends on the training data. Recent work such as TransLoRA avoids this dependence by generating synthetic data, but adds complexity because it requires training an additional discriminator model. In this paper, we propose TiTok, a new framework that enables effective LoRA Transplantation through Token-level knowledge transfer. Specifically, TiTok captures task-relevant information through a contrastive excess between a source model with and without LoRA. This excess highlights informative tokens and enables selective filtering of synthetic data, all without additional models or overhead. Experiments on three benchmarks across multiple transfer settings show that the proposed method is consistently effective, achieving average performance gains of +4~8% over baselines.
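
The contrastive excess and the two-stage filtering it drives can be illustrated with a minimal sketch. This page does not give the exact formulas, so several choices here are assumptions: the excess is taken as the per-token log-probability under the source expert (base + LoRA) minus that under the source base model, the sample filter keeps a fixed fraction of samples ranked by mean excess, and the function names and the `keep_ratio` parameter are hypothetical.

```python
import math
from typing import List


def contrastive_excess(expert_logprobs: List[float],
                       base_logprobs: List[float]) -> List[float]:
    """Per-token excess (assumed form): expert log-prob minus base log-prob."""
    if len(expert_logprobs) != len(base_logprobs):
        raise ValueError("both models must score the same token sequence")
    return [e - b for e, b in zip(expert_logprobs, base_logprobs)]


def filter_samples(excess_per_sample: List[List[float]],
                   keep_ratio: float) -> List[int]:
    """Stage 1: indices of the samples with the highest mean excess.

    keep_ratio is a hypothetical knob; the source only says samples are
    filtered by mean excess.
    """
    means = [sum(x) / len(x) if x else float("-inf") for x in excess_per_sample]
    n_keep = max(1, math.ceil(len(means) * keep_ratio))
    ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    return sorted(ranked[:n_keep])


def select_tokens(excess: List[float], k_percent: float) -> List[int]:
    """Stage 2: binary mask with 1 for the top k% of tokens by excess."""
    n_keep = max(1, math.ceil(len(excess) * k_percent / 100))
    ranked = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    top = set(ranked[:n_keep])
    return [1 if i in top else 0 for i in range(len(excess))]
```

In such a setup, the log-probabilities would come from scoring each synthetic sample twice with the source backbone, once with the LoRA adapter attached and once without, and only tokens whose mask is 1 would contribute to the loss when training the target LoRA.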

Paper Structure

This paper contains 38 sections, 7 equations, 4 figures, 12 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview of TiTok: Transplantation through Token-level knowledge transfer. Starting from a small set of seed prompts, the source expert model (source base model + LoRA) generates synthetic data. A contrastive excess filtering mechanism then compares the expert against its base backbone to compute token-level excess scores. Using these scores, TiTok first performs sample filtering and subsequently token selection, retaining only the most informative samples and tokens. When tokenizers differ, masks are aligned prior to training. The resulting filtered data is finally used to train a new LoRA on the target backbone, enabling efficient knowledge transfer.
  • Figure 2: Impact of query source. Across transfer settings, using the source expert model to synthesize queries generally yields better performance. Accuracy averaged over all BBH tasks is reported.
  • Figure 3: Representative performance trends across values of $k\%$. Among the three graphs, two come from the same task (BBH), while two share the same transfer setting (Llama2 7B $\to$ Llama3 8B).
  • Figure 4: Overview of TiTok's tokenizer alignment algorithm. The algorithm handles cases where the source and target models use different tokenizers. The binary mask scores assigned by the source model are averaged within aligned spans and propagated to target tokens, producing fractional scores that guide top-$k\%$ token selection for training the target model's LoRA adapter.
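
The mask propagation described for Figure 4 can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: spans are aligned by character offsets, both tokenizations are assumed to concatenate to the same text, and each target token receives the mean of the binary masks of the source tokens it overlaps. The names `token_spans` and `propagate_mask` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets


def token_spans(tokens: List[str]) -> List[Span]:
    """Character spans for tokens that concatenate to the original text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def propagate_mask(src_tokens: List[str], src_mask: List[int],
                   tgt_tokens: List[str]) -> List[float]:
    """Average source mask values over each target token's overlapping span."""
    if "".join(src_tokens) != "".join(tgt_tokens):
        raise ValueError("tokenizations must cover the same text")
    if len(src_tokens) != len(src_mask):
        raise ValueError("one mask value is required per source token")
    src_spans = token_spans(src_tokens)
    scores = []
    for t_start, t_end in token_spans(tgt_tokens):
        # A source token overlaps when the two half-open spans intersect.
        overlapping = [m for (s_start, s_end), m in zip(src_spans, src_mask)
                       if s_start < t_end and t_start < s_end]
        scores.append(sum(overlapping) / len(overlapping) if overlapping else 0.0)
    return scores
```

For example, with source tokens `["trans", "plant", "ation"]` masked as `[1, 0, 1]` and target tokens `["transplant", "ation"]`, the target scores are `[0.5, 1.0]`. These fractional scores can then be ranked to pick the top-$k\%$ target tokens for training.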