Layer-wise LoRA fine-tuning: a similarity metric approach
Keith Ando Ogawa, Bruno Lopes Yamamoto, Lucas Lauton de Alcantara, Lucas Pellicer, Rosimeire Pereira Costa, Edson Bollis, Anna Helena Reali Costa, Artur Jordao
TL;DR
This paper tackles the high cost of fine-tuning large language models by introducing a layer-wise selection strategy for LoRA-based fine-tuning that uses a representation-similarity metric to identify the most impactful transformer layers. The method defines a layer importance score based on representation dissimilarity and updates only the top-N layers, substantially reducing trainable parameters while preserving or improving task performance. Empirical results across encoder-only, decoder-only, and multimodal models show up to a 50% reduction in trainable parameters with minimal or positive performance changes on GLUE, math/coding tasks, and ScienceQA, along with meaningful training-time and memory savings. The approach is orthogonal to existing PEFT methods and can be combined with LoRA variants to further improve efficiency, suggesting a practical path toward scalable, parameter-efficient fine-tuning for large models.
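The selection step described above can be illustrated with a minimal sketch. It assumes linear CKA as the similarity metric (the repository name mentions CKA) and scores each layer as one minus the CKA between the layer's input and output activations. That scoring rule, the function names `linear_cka` and `select_layers`, and the toy activations are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    # Center each feature so the similarity ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (norm_x * norm_y)


def select_layers(hidden_states, top_n):
    """Rank layers by dissimilarity between their input and output.

    Assumption: hidden_states[i] is the input of layer i and
    hidden_states[i + 1] is its output, each of shape (n_samples, d).
    """
    scores = [
        1.0 - linear_cka(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_n]), scores


# Toy activations for a hypothetical 4-layer model (not real model outputs).
rng = np.random.default_rng(0)
h0 = rng.normal(size=(64, 16))
h1 = h0.copy()                                # layer 0: identity, no change
h2 = rng.normal(size=(64, 16))                # layer 1: unrelated representation
h3 = 2.0 * h2                                 # layer 2: pure rescaling
h4 = np.tanh(h3 @ rng.normal(size=(16, 16)))  # layer 3: nonlinear transform

selected, scores = select_layers([h0, h1, h2, h3, h4], top_n=2)
print(selected, np.round(scores, 3))
```

Layers that leave the representation unchanged, or only rescale it, receive a score near zero and are skipped. In practice the selected indices could then be handed to a LoRA implementation that supports per-layer adapter placement, for example the `layers_to_transform` option of `LoraConfig` in Hugging Face PEFT.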
Abstract
Pre-training Large Language Models (LLMs) on web-scale datasets has become fundamental for advancing general-purpose AI. Enhancing their predictive performance on downstream tasks, in contrast, typically involves adapting their knowledge through fine-tuning. Parameter-efficient fine-tuning techniques, such as Low-Rank Adaptation (LoRA), aim to reduce the computational cost of this process by freezing the pre-trained model and updating a smaller number of parameters. In comparison to full fine-tuning, these methods achieve over 99% reduction in trainable parameter count, depending on the configuration. Unfortunately, such a reduction may prove insufficient as LLMs continue to grow in scale. In this work, we address this problem by systematically selecting only a few layers to fine-tune using LoRA or its variants. We argue that not all layers contribute equally to model adaptation. Leveraging this, we identify the most relevant layers to fine-tune by measuring their contribution to changes in internal representations. Our method is orthogonal to and readily compatible with existing low-rank adaptation techniques. We reduce the trainable parameters in LoRA-based techniques by up to 50%, while maintaining the predictive performance across different models and tasks. Specifically, on encoder-only architectures, this reduction in trainable parameters leads to a negligible predictive performance drop on the GLUE benchmark. On decoder-only architectures, we achieve a small drop or even improvements in the predictive performance on mathematical problem-solving capabilities and coding tasks. Finally, this effectiveness extends to multimodal models, for which we also observe competitive results relative to fine-tuning with LoRA modules in all layers. Code is available at: https://github.com/c2d-usp/Layer-wise-LoRA-with-CKA
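To make the parameter arithmetic concrete, the following sketch counts LoRA trainable parameters when adapters are placed in all layers versus half of them. A LoRA adapter of rank r on a weight matrix of shape d_out x d_in adds r * (d_in + d_out) trainable parameters. The model dimensions below (32 layers, hidden size 4096, rank 8, two adapted matrices per layer) are hypothetical values chosen for illustration, not a configuration reported by the authors.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter (matrices A and B)."""
    return rank * (d_in + d_out)


def trainable_params(num_layers, matrices_per_layer, d_in, d_out, rank):
    """Total LoRA parameters when every listed layer gets the same adapters."""
    return num_layers * matrices_per_layer * lora_params(d_in, d_out, rank)


# Hypothetical configuration: 32 layers, two 4096x4096 projections per layer.
all_layers = trainable_params(32, 2, 4096, 4096, rank=8)
half_layers = trainable_params(16, 2, 4096, 4096, rank=8)
reduction = 1 - half_layers / all_layers

print(all_layers, half_layers, f"{reduction:.0%}")
```

Because each adapted layer contributes the same number of parameters in this setup, the saving scales linearly with the number of layers left without adapters.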
