KS-Lottery: Finding Certified Lottery Tickets for Multilingual Language Models
Fei Yuan, Chang Ma, Shuai Yuan, Qiushi Sun, Lei Li
TL;DR
The paper asks whether ultra-small subnetworks of a multilingual LLM can match full fine-tuning performance. It introduces KS-Lottery, which applies the Kolmogorov-Smirnov test to detect distribution shifts in embedding parameters before and after fine-tuning, and uses those shifts to identify certifiable winning tickets within the embedding layer. The method shows that tuning as few as 18 token embeddings can deliver translation quality comparable to full fine-tuning on multilingual benchmarks, and it provides a theoretical certification that guarantees performance under stated distribution-distance conditions. Empirically, on bilingual translation tasks with LLaMA-7B, KS-Lottery is more parameter-efficient and more interpretable than several parameter-efficient tuning approaches, and it generalizes well to the Partial Tuning and Partial Transfer scenarios. The work offers a principled, certifiable route to efficient multilingual transfer, with potential applicability beyond translation.
Abstract
The lottery ticket hypothesis posits the existence of "winning tickets" within a randomly initialized neural network. Do winning tickets exist for LLMs in fine-tuning scenarios? How can we find such winning tickets? In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters that is highly effective in multilingual fine-tuning. Our key idea is to use the Kolmogorov-Smirnov test to analyze the distribution shift of parameters before and after fine-tuning. We further prove theoretically that KS-Lottery can find certified winning tickets in the embedding layer: fine-tuning only the found parameters is guaranteed to perform as well as full fine-tuning. Comparing KS-Lottery with other parameter-efficient tuning algorithms on translation tasks, the experimental results show that KS-Lottery finds a much smaller set of parameters to fine-tune while achieving performance comparable to full fine-tuning of the LLM. Surprisingly, we find that fine-tuning the embeddings of only 18 tokens of LLaMA suffices to reach the translation performance of full fine-tuning. Code is available at https://github.com/CONE-MT/KS-Lottery.
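The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that each token's embedding row is treated as a one-dimensional sample of its coordinates, that a two-sample Kolmogorov-Smirnov test compares the row before and after fine-tuning, and that a token is kept when the asymptotic p-value falls below a significance level `alpha`. The names `ks_statistic`, `ks_pvalue`, `select_tokens`, the value `alpha=0.05`, and the synthetic embeddings are all hypothetical choices made for illustration.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed points.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (an approximation)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The survival function is numerically 1 here, and the alternating
        # series converges slowly, so return 1 directly.
        return 1.0
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, s))


def select_tokens(emb_before, emb_after, alpha=0.05):
    """Return indices of tokens whose embedding distribution shifted (assumed rule)."""
    selected = []
    for tok, (before, after) in enumerate(zip(emb_before, emb_after)):
        d = ks_statistic(before, after)
        if ks_pvalue(d, len(before), len(after)) < alpha:
            selected.append(tok)
    return selected


# Synthetic stand-in for an embedding matrix: 6 tokens, 256 dimensions.
rng = random.Random(0)
VOCAB, DIM = 6, 256
SHIFTED = {1, 4}  # hypothetical tokens that fine-tuning moved substantially

emb_before = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB)]
emb_after = [
    [x + (1.0 if tok in SHIFTED else 0.0) + rng.gauss(0.0, 1e-3) for x in row]
    for tok, row in enumerate(emb_before)
]

selected = select_tokens(emb_before, emb_after)
print(selected)
```

In this toy setup only the rows that were deliberately shifted should be flagged, while rows that changed by small noise keep a p-value near 1. With real checkpoints, `emb_before` and `emb_after` would be the embedding matrices of the pretrained and fine-tuned models, and a library routine such as `scipy.stats.ks_2samp` could replace the two helper functions.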
