Improving LLM Predictions via Inter-Layer Structural Encoders

Tom Ulanovski, Eyal Blyachman, Maya Bechler-Speicher

Abstract

The standard practice in Large Language Models (LLMs) is to base predictions on the final-layer token representations. Recent studies, however, show that intermediate layers encode substantial information and may contain more task-relevant features than the final-layer representations alone. Importantly, the optimal layer has been shown to differ from task to task. In this work we introduce Inter-Layer Structural Encoders (ILSE), a structural approach that learns a single effective representation from all of the LLM's internal layer representations jointly. Central to ILSE is Cayley-Encoder, a mathematically grounded geometric encoder that leverages expander Cayley graphs for efficient inter-layer information propagation. We evaluate ILSE across 13 classification and semantic similarity tasks with 9 pre-trained LLMs ranging from 14 million to 8 billion parameters. ILSE consistently outperforms baselines and existing approaches, achieving up to 44% improvement in accuracy and 25% in similarity metrics. We further show that ILSE is data-efficient in few-shot regimes and can make small LLMs competitive with substantially larger models.

Paper Structure (19 sections, 4 equations, 3 figures, 6 tables)


Figures (3)

  • Figure 1: An overview of ILSE with Cayley-Encoder. Layer representations are mapped to nodes of a Cayley graph, which is then fed into a GNN to learn the final inter-layer representation.
  • Figure 2: LLM size analysis. Performance across the Pythia model suite (14M to 2.8B parameters). ILSE consistently outperforms the baselines across all model sizes.
  • Figure 3: Few-Shot Learning Analysis. Performance across 1-1024 samples per label using Pythia-410M. ILSE outperforms baselines with 32 samples per label across all tasks.
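
As a rough illustration of the pipeline described in the Figure 1 caption, the following Python sketch builds a small Cayley graph and runs one round of neighbour averaging over per-layer vectors. Everything specific here is an assumption rather than the paper's method: the group SL(2, Z_n) with four elementary-matrix generators, the function names `cayley_graph_sl2` and `propagate`, the zero-padding of unused nodes, and the unweighted mean that stands in for a learned GNN layer.

```python
from collections import deque


def cayley_graph_sl2(n):
    """Cayley graph of SL(2, Z_n), built by BFS from the identity.

    Assumed generators: [[1, 1], [0, 1]], [[1, 0], [1, 1]] and their inverses.
    Matrices are stored as tuples (a, b, c, d) meaning [[a, b], [c, d]].
    Returns a dict mapping node index -> sorted list of neighbour indices.
    """
    gens = [(1, 1, 0, 1), (1, n - 1, 0, 1), (1, 0, 1, 1), (1, 0, n - 1, 1)]

    def mul(x, y):
        a, b, c, d = x
        e, f, g, h = y
        return ((a * e + b * g) % n, (a * f + b * h) % n,
                (c * e + d * g) % n, (c * f + d * h) % n)

    identity = (1, 0, 0, 1)
    index = {identity: 0}
    queue = deque([identity])
    edges = {}
    while queue:
        x = queue.popleft()
        for g in gens:
            y = mul(x, g)
            if y not in index:
                index[y] = len(index)
                queue.append(y)
            edges.setdefault(index[x], set()).add(index[y])
    return {u: sorted(vs) for u, vs in edges.items()}


def propagate(adj, feats):
    """One round of mean aggregation over each node and its neighbours.

    This is a stand-in for a GNN layer: it has no learned weights.
    """
    dim = len(feats[0])
    out = []
    for u in range(len(feats)):
        group = [u] + adj[u]
        out.append([sum(feats[v][k] for v in group) / len(group)
                    for k in range(dim)])
    return out


# Toy usage: 12 hypothetical layer vectors of dimension 2.
adj = cayley_graph_sl2(3)
layers = [[float(i), 1.0] for i in range(12)]
# Assumption: layers occupy the first nodes, remaining nodes start at zero.
feats = layers + [[0.0, 0.0]] * (len(adj) - len(layers))
h = propagate(adj, feats)
```

With n = 3 the graph has 24 nodes, each with 4 neighbours. A real encoder would replace the plain mean with trainable message-passing layers and pool the node states into one vector for the downstream task.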