LAYA: Layer-wise Attention Aggregation for Interpretable Depth-Aware Neural Networks

Gennaro Vessio

TL;DR

The paper addresses the limitation of predicting from only the last hidden representation by introducing LAYA, a depth-aware output head that aggregates all layer representations with input-conditioned attention. It defines $h_{\text{agg}} = \sum_{i=1}^{L} \alpha_i(x)\, g_i(h_i)$, where $\alpha_i(x)$ are computed from a small scoring network and a temperature-scaled softmax, and the final prediction uses $\hat{y} = \phi(W h_{\text{agg}} + b)$; adapters $g_i$ map each layer to a common space for fair weighting. Across vision and language benchmarks, LAYA matches or slightly improves accuracy (up to about 1 percentage point) over standard heads while providing intrinsic interpretability through per-input layer-attribution signals. The layer-attention profiles reveal task- and class-specific depth usage, enabling insights into depth specialization and potential implications for early-exit, model compression, and diagnostic tools, all without modifying the backbone. Overall, treating the output stage as a depth-aware aggregator offers a simple, architecture-agnostic enhancement that yields both performance benefits and transparent explanations derived from the model’s own computation.
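The aggregation step can be illustrated with a minimal sketch in plain Python. This is not the reference implementation (see the repository linked in the abstract). The function names `softmax_with_temperature` and `aggregate_layers`, the callable `score_fn` standing in for the scoring network, and the choice to score the adapted features are all assumptions made for illustration; the summary above only states that the weights come from a small scoring network followed by a temperature-scaled softmax.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax_with_temperature(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; lower temperature gives sharper weights."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(
    layer_features: Sequence[Vector],
    adapters: Sequence[Callable[[Vector], Vector]],
    score_fn: Callable[[List[Vector]], List[float]],
    temperature: float = 1.0,
) -> Tuple[Vector, List[float]]:
    """Compute h_agg = sum_i alpha_i * g_i(h_i) and return (h_agg, alphas).

    `score_fn` is a hypothetical stand-in for the scoring network: it maps
    the adapted features to one raw score per layer.
    """
    # g_i(h_i): map every layer into a common space of equal dimension.
    adapted = [g(h) for g, h in zip(adapters, layer_features)]
    # Per-layer attention weights alpha_i for this input.
    alphas = softmax_with_temperature(score_fn(adapted), temperature)
    dim = len(adapted[0])
    h_agg = [sum(a * z[k] for a, z in zip(alphas, adapted)) for k in range(dim)]
    return h_agg, alphas


if __name__ == "__main__":
    # Toy input: two layers, identity adapters, score = mean activation.
    feats = [[1.0, 3.0], [3.0, 5.0]]
    identity = lambda h: h
    mean_score = lambda zs: [sum(z) / len(z) for z in zs]
    h_agg, alphas = aggregate_layers(feats, [identity, identity], mean_score)
    print("alphas:", alphas)
    print("h_agg:", h_agg)
```

The returned `alphas` are the per-input layer-attribution weights: they are non-negative and sum to one, so they can be read directly as the share each layer contributes to `h_agg`. The final prediction $\hat{y} = \phi(W h_{\text{agg}} + b)$ would then be an ordinary linear head applied to `h_agg`.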

Abstract

Deep neural networks typically rely on the representation produced by their final hidden layer to make predictions, implicitly assuming that this single vector fully captures the semantics encoded across all preceding transformations. However, intermediate layers contain rich and complementary information -- ranging from low-level patterns to high-level abstractions -- that is often discarded when the decision head depends solely on the last representation. This paper revisits the role of the output layer and introduces LAYA (Layer-wise Attention Aggregator), a novel output head that dynamically aggregates internal representations through attention. Instead of projecting only the deepest embedding, LAYA learns input-conditioned attention weights over layer-wise features, yielding an interpretable and architecture-agnostic mechanism for synthesizing predictions. Experiments on vision and language benchmarks show that LAYA consistently matches or improves the performance of standard output heads, with relative gains of up to about one percentage point in accuracy, while providing explicit layer-attribution scores that reveal how different abstraction levels contribute to each decision. Crucially, these interpretability signals emerge directly from the model's computation, without any external post hoc explanations. The code to reproduce LAYA is publicly available at: https://github.com/gvessio/LAYA.
