LAYA: Layer-wise Attention Aggregation for Interpretable Depth-Aware Neural Networks

Gennaro Vessio

TL;DR

The paper addresses the limitation of predicting from only the last hidden representation by introducing LAYA, a depth-aware output head that aggregates all layer representations with input-conditioned attention. It defines $h_{\text{agg}} = \sum_{i=1}^{L} \alpha_i(x)\, g_i(h_i)$, where $\alpha_i(x)$ are computed from a small scoring network and a temperature-scaled softmax, and the final prediction uses $\hat{y} = \phi(W h_{\text{agg}} + b)$; adapters $g_i$ map each layer to a common space for fair weighting. Across vision and language benchmarks, LAYA matches or slightly improves accuracy (up to about 1 percentage point) over standard heads while providing intrinsic interpretability through per-input layer-attribution signals. The layer-attention profiles reveal task- and class-specific depth usage, enabling insights into depth specialization and potential implications for early-exit, model compression, and diagnostic tools, all without modifying the backbone. Overall, treating the output stage as a depth-aware aggregator offers a simple, architecture-agnostic enhancement that yields both performance benefits and transparent explanations derived from the model’s own computation.
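To make the aggregation concrete, here is a minimal sketch of such a head in plain Python. It is not the authors' implementation (that lives in the linked repository). Several choices are assumptions made only for illustration: the adapters $g_i$ are taken to be linear maps, the scoring network is a single shared linear scorer applied to each adapted layer, $\phi$ is a softmax, and the class name `LayaHeadSketch` and all parameter names are hypothetical.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Temperature-scaled softmax over a list of scalar scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LayaHeadSketch:
    """Hypothetical depth-aware head: h_agg = sum_i alpha_i(x) * g_i(h_i)."""

    def __init__(self, layer_dims, common_dim, num_classes,
                 temperature=1.0, seed=0):
        rng = random.Random(seed)
        self.temperature = temperature
        # Assumption: each adapter g_i is a linear map into a common space.
        self.adapters = [rand_matrix(rng, common_dim, d) for d in layer_dims]
        # Assumption: one shared linear scorer produces a scalar per layer.
        self.score_w = [rng.gauss(0.0, 0.1) for _ in range(common_dim)]
        # Final classifier parameters W and b.
        self.W = rand_matrix(rng, num_classes, common_dim)
        self.b = [0.0] * num_classes

    def forward(self, hidden_states):
        # g_i(h_i): project every layer's representation to the common space.
        adapted = [matvec(A, h) for A, h in zip(self.adapters, hidden_states)]
        # One scalar score per layer, then a temperature-scaled softmax.
        scores = [sum(w * z for w, z in zip(self.score_w, g)) for g in adapted]
        alpha = softmax(scores, self.temperature)
        # Attention-weighted sum over layers.
        dim = len(adapted[0])
        h_agg = [sum(a * g[k] for a, g in zip(alpha, adapted))
                 for k in range(dim)]
        # y_hat = phi(W h_agg + b), with phi assumed to be a softmax.
        logits = [z + b for z, b in zip(matvec(self.W, h_agg), self.b)]
        return softmax(logits), alpha


# Usage: three backbone layers of different widths, one input sample.
head = LayaHeadSketch(layer_dims=[8, 16, 32], common_dim=4,
                      num_classes=3, temperature=0.5)
rng = random.Random(1)
hidden = [[rng.gauss(0.0, 1.0) for _ in range(d)] for d in (8, 16, 32)]
y_hat, alpha = head.forward(hidden)
print("layer attention:", alpha)
print("class probabilities:", y_hat)
```

The returned `alpha` is the per-input layer-attribution signal: it is non-negative and sums to one, so it can be read directly as the share of the prediction drawn from each depth. Lower temperatures sharpen the distribution toward a single layer, higher ones flatten it.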

Abstract

Deep neural networks typically rely on the representation produced by their final hidden layer to make predictions, implicitly assuming that this single vector fully captures the semantics encoded across all preceding transformations. However, intermediate layers contain rich and complementary information -- ranging from low-level patterns to high-level abstractions -- that is often discarded when the decision head depends solely on the last representation. This paper revisits the role of the output layer and introduces LAYA (Layer-wise Attention Aggregator), a novel output head that dynamically aggregates internal representations through attention. Instead of projecting only the deepest embedding, LAYA learns input-conditioned attention weights over layer-wise features, yielding an interpretable and architecture-agnostic mechanism for synthesizing predictions. Experiments on vision and language benchmarks show that LAYA consistently matches or improves the performance of standard output heads, with relative gains of up to about one percentage point in accuracy, while providing explicit layer-attribution scores that reveal how different abstraction levels contribute to each decision. Crucially, these interpretability signals emerge directly from the model's computation, without any external post hoc explanations. The code to reproduce LAYA is publicly available at: https://github.com/gvessio/LAYA.

Paper Structure

This paper contains 25 sections, 14 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Global layer-wise attention statistics for LAYA (mean $\pm$ standard deviation over the full test set). LAYA consistently develops task-specific depth preferences: strong bias toward deep layers for CIFAR-10, softer depth usage for Fashion-MNIST, and nearly symmetric contributions in the IMDB text model.
  • Figure 2: Class-wise mean attention profiles of LAYA across datasets. Rows correspond to Fashion-MNIST (top), CIFAR-10 (middle), and IMDB (bottom), while columns correspond to all test samples (left), correctly classified samples (center), and misclassified samples (right). Each cell reports the average attention $\bar{\alpha}_{c,i}$ assigned to layer $i$ for class $c$.
  • Figure 3: Layer-wise attention statistics for the Best Artworks of All Time dataset (mean $\pm$ standard deviation over the full test set). The attention distribution reveals heterogeneous depth contributions, with notable peaks around intermediate layers and sparse activations elsewhere.
  • Figure 4: Class-wise mean attention profiles of LAYA on the Best Artworks of All Time dataset. Columns correspond to all test samples (left), correctly classified samples (center), and misclassified samples (right). Each cell reports the average attention $\bar{\alpha}_{c,i}$ assigned to layer $i$ for class (style) $c$.
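
The class-wise profiles in Figures 2 and 4 report $\bar{\alpha}_{c,i}$, the average attention on layer $i$ over samples of class $c$. A minimal sketch of that averaging follows, assuming the per-sample attention vectors have already been collected from the head; the function name `classwise_mean_attention` and the toy data are hypothetical.

```python
def classwise_mean_attention(alphas, labels):
    """Average per-sample attention vectors within each class.

    alphas: list of attention vectors, one per sample (each sums to 1).
    labels: list of class labels, aligned with alphas.
    Returns a dict mapping class c to [mean alpha for layer i, ...].
    """
    sums = {}
    counts = {}
    for alpha, c in zip(alphas, labels):
        if c not in sums:
            sums[c] = [0.0] * len(alpha)
            counts[c] = 0
        for i, a in enumerate(alpha):
            sums[c][i] += a
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


# Toy data: three samples, two layers, two classes.
alphas = [[0.2, 0.8], [0.4, 0.6], [0.5, 0.5]]
labels = [0, 0, 1]
profiles = classwise_mean_attention(alphas, labels)
for c in sorted(profiles):
    print(c, profiles[c])
```

The "correctly classified" and "misclassified" columns would follow from calling the same function on the corresponding subsets of samples.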