Layer-Aware Embedding Fusion for LLMs in Text Classifications
Jiho Gwak, Yuchul Jung
TL;DR
This paper tackles how to effectively use embeddings from decoder-based LLMs for text classification by introducing layer-aware embedding selection and multi-model fusion without fine-tuning. It systematically evaluates which layers carry the most discriminative information and demonstrates that optimal layers vary across datasets, with mid-to-late layers often outperforming the final layer. The study further shows that fusing embeddings from multiple models can yield gains when the models are complementary, though memory and compute costs rise with more models and layers. Overall, the work provides practical guidelines for designing scalable, task- and dataset-aware embedding fusion systems in real-world NLP tasks.
Abstract
Embedding fusion has emerged as an effective approach for enhancing performance across various NLP tasks. However, systematic guidelines for selecting optimal layers and developing effective fusion strategies for integrating LLMs remain underexplored. In this study, we propose a layer-aware embedding selection method and investigate how to quantitatively evaluate different layers to identify the most important ones for downstream NLP tasks, showing that the critical layers vary depending on the dataset. We also explore how combining embeddings from multiple LLMs, without requiring model fine-tuning, can improve performance. Experiments on four English text classification datasets (SST-2, MR, R8, and R52) demonstrate that different layers in LLMs exhibit varying degrees of representational strength for classification, and that combining embeddings from different models can enhance performance if the models exhibit complementary characteristics. Additionally, we discuss resource overhead (memory and inference time) to provide a balanced perspective on the real-world feasibility of embedding fusion. Future work will explore multilingual and domain-specific datasets, as well as techniques for automating layer selection, to improve both performance and scalability.
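The two ideas summarized above, scoring each layer on a downstream task and then fusing the selected layers from several frozen models, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the procedure used in the paper: the nearest-centroid probe stands in for whatever classifier is trained on the embeddings, fusion is taken to be L2-normalized concatenation, and the helper names (`nearest_centroid_accuracy`, `select_best_layer`, `fuse`, `make_toy_layers`) and the synthetic data are hypothetical. In practice the per-layer arrays would come from the hidden states of real LLMs.

```python
import numpy as np


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Score one embedding matrix with a nearest-centroid probe (assumed stand-in classifier)."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Pairwise distances between test points and class centroids: (n_test, n_classes).
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == test_y).mean())


def select_best_layer(layer_embs, labels, train_idx, val_idx):
    """layer_embs has shape (n_layers, n_samples, dim); returns (best_layer_index, per_layer_scores)."""
    scores = [
        nearest_centroid_accuracy(e[train_idx], labels[train_idx], e[val_idx], labels[val_idx])
        for e in layer_embs
    ]
    return int(np.argmax(scores)), scores


def fuse(*model_embs):
    """Assumed fusion operator: L2-normalize each model's embeddings, then concatenate features."""
    normed = [x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12) for x in model_embs]
    return np.concatenate(normed, axis=1)


def make_toy_layers(n_layers, informative_layer, labels, dim, rng):
    """Synthetic stand-in for per-layer embeddings: only one layer carries class signal."""
    layers = rng.normal(size=(n_layers, labels.shape[0], dim))
    layers[informative_layer] += 5.0 * labels[:, None]
    return layers


rng = np.random.default_rng(0)
labels = np.arange(200) % 2                      # toy binary labels
train_idx, val_idx = np.arange(0, 150), np.arange(150, 200)

model_a = make_toy_layers(4, 2, labels, 8, rng)   # hypothetical model A: 4 layers, dim 8
model_b = make_toy_layers(6, 4, labels, 16, rng)  # hypothetical model B: 6 layers, dim 16

best_a, scores_a = select_best_layer(model_a, labels, train_idx, val_idx)
best_b, scores_b = select_best_layer(model_b, labels, train_idx, val_idx)
fused = fuse(model_a[best_a], model_b[best_b])    # one fused feature vector per sample
```

Selecting the layer on a held-out validation split, rather than on the test set, keeps the layer choice from leaking into the reported accuracy. The fused feature width is the sum of the selected layers' widths, which is one concrete source of the memory overhead the abstract mentions.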
