Attention Saturation and Gradient Suppression at Inflection Layers: Diagnosing and Mitigating Bottlenecks in Transformer Adaptation
Wang Zixian
TL;DR
This work addresses why pre-trained Transformers often struggle to adapt to target domains. It attributes the difficulty to gradient suppression caused by output saturation in lower layers, which biases adaptation toward recombining existing high-level features. It introduces a diagnostic framework built on four layer-wise observables (attention entropy, activation-gradient norm, parameter-gradient norm, and ΔCKA under a shared PCA basis) to identify the inflection layers where gradient flow is most suppressed. A diagnose-first strategy combines automatic inflection-layer localization (via SKI) with selective LoRA injection in the identified band, restoring backward signals with minimal parameter overhead. On a BERT-base transfer task (SST-2 to Rotten Tomatoes), with source models in under-trained (UNDER) and over-trained (OVER) regimes, selective LoRA at inflection layers yields the best accuracy (≈91.59%) with ~0.3M trainable parameters, outperforming uniform LoRA as well as full and shallow unfreezing. The results support a dichotomy between high-level composition and low-level reconstruction, and they demonstrate a practical, reproducible pipeline for efficient cross-domain transfer. The approach offers a general, diagnosis-driven path to targeted parameter-efficient fine-tuning and motivates extensions to other modalities and architectures.
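The localization step can be illustrated with a small sketch. It is not the paper's SKI procedure, which this summary does not specify. It is a hypothetical heuristic that assumes per-layer attention entropies and activation-gradient norms have already been collected, for example from forward and backward hooks. Each layer is scored by how low its entropy is and by how sharply the gradient norm drops on entering it from the layer above. The function names `attention_entropy` and `locate_inflection_layers` and the `half_width` parameter are illustrative only.

```python
import math
from typing import List, Sequence


def attention_entropy(attn_rows: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (nats) over the rows of one attention map.

    Each row is assumed to be a softmax distribution over key positions;
    lower values mean more peaked, i.e. more saturated, attention.
    """
    total = 0.0
    for row in attn_rows:
        total += -sum(p * math.log(p) for p in row if p > 0.0)
    return total / len(attn_rows)


def locate_inflection_layers(
    entropies: Sequence[float],
    act_grad_norms: Sequence[float],
    half_width: int = 1,
) -> List[int]:
    """Hypothetical inflection-layer heuristic (NOT the paper's SKI method).

    entropies[i]: mean attention entropy of layer i (0 = bottom layer).
    act_grad_norms[i]: activation-gradient norm at layer i.
    Returns a contiguous band of layer indices centred on the layer that is
    low-entropy AND sits where the gradient norm drops most steeply.
    """
    n = len(entropies)
    tiny = 1e-12
    h_max, h_min = max(entropies), min(entropies)
    scores = []
    for i in range(n):
        # Low-entropy term in [0, 1]: 1 for the most saturated layer.
        low_ent = (h_max - entropies[i]) / (h_max - h_min + tiny)
        # Gradients flow top-down, so compare layer i with the layer above it.
        # A positive log-ratio means the norm shrinks when passing into layer i.
        if i + 1 < n:
            decay = math.log(max(act_grad_norms[i + 1], tiny)) - math.log(
                max(act_grad_norms[i], tiny)
            )
        else:
            decay = 0.0
        # Multiplicative score: a layer must be both saturated and decaying.
        scores.append(low_ent * max(decay, 0.0))
    center = max(range(n), key=lambda i: scores[i])
    return list(range(max(0, center - half_width), min(n, center + half_width + 1)))


if __name__ == "__main__":
    # Toy per-layer statistics, made up for illustration only.
    entropies = [2.0, 1.9, 0.5, 0.4, 1.8, 2.0]
    grad_norms = [0.01, 0.02, 0.05, 1.0, 1.1, 1.2]
    print(locate_inflection_layers(entropies, grad_norms))  # [1, 2, 3]
```

The returned band would then be the set of layers that receive LoRA adapters. With Hugging Face PEFT, for instance, a list of layer indices can be passed to `LoraConfig` through `layers_to_transform`. The multiplicative score is one design choice among several. It mirrors the abstract's definition of an inflection layer as one with both low attention entropy and steep gradient decay.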
Abstract
Pre-trained Transformers often exhibit over-confidence in source patterns and have difficulty forming new target-domain patterns during fine-tuning. We formalize how output saturation leads to gradient suppression through a standard analysis of the softmax and cross-entropy loss. This analysis shows that gradient suppression at inflection layers confines adaptation to high-level recombination of existing features and prevents low-level reconstruction. We introduce a set of layer-wise diagnostic metrics (attention entropy as a saturation proxy, activation-gradient norm, parameter-gradient norm, and ΔCKA under a shared PCA basis) to identify inflection layers, which are characterized by both low attention entropy and steep gradient decay. Building on these findings, we propose a diagnose-first, inject-light fine-tuning strategy: selectively inserting LoRA adapters at inflection layers to restore suppressed backward signals with minimal parameter overhead. Experiments on BERT-base transfer from SST-2 to Rotten Tomatoes, with source models in under-trained and over-trained regimes, reveal that over-trained initialization benefits from inflection-layer LoRA injection, whereas under-trained initialization suffers performance degradation. When base features are strong, unblocking inflection layers facilitates high-level compositional adaptation. When base features are weak, full-pathway unblocking is required for low-level reconstruction. This interpretation is supported by a joint analysis of layer-wise activation gradients and ΔCKA dynamics.
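The softmax and cross-entropy facts behind the saturation argument are standard and can be written out briefly. The notation is assumed here: $z$ for logits, $p$ for the softmax output, $y$ for a one-hot target, and $g$ for an upstream gradient. The last line is an added illustration rather than a result taken from the paper. It uses $J \succeq 0$ and the inequality $1 - x \le -\log x$ to bound the softmax's backward gain by its entropy, which is one way to see why low attention entropy can serve as a saturation proxy.

```latex
\begin{aligned}
p &= \operatorname{softmax}(z), \qquad \mathcal{L} = -\sum_{k} y_k \log p_k,\\
\frac{\partial \mathcal{L}}{\partial z} &= p - y \;\longrightarrow\; 0 \quad \text{as } p \to y \text{ (confident, correct prediction)},\\
J = \frac{\partial p}{\partial z} &= \operatorname{diag}(p) - p p^{\top}, \qquad J_{ij} = p_i(\delta_{ij} - p_j),\\
\|J^{\top} g\|_2 \le \|J\|_2\,\|g\|_2 &\le \operatorname{tr}(J)\,\|g\|_2 = \Bigl(1 - \sum_i p_i^2\Bigr)\|g\|_2 \le H(p)\,\|g\|_2,
\qquad H(p) = -\sum_i p_i \log p_i .
\end{aligned}
```

For an attention row, $p$ is the distribution over keys and $z$ holds the scaled query-key scores. A nearly one-hot (low-entropy) row therefore passes almost no gradient back to its scores, and hence to the query and key projections. The value pathway is not covered by this bound.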
