Adaptive Integrated Layered Attention (AILA)
William Claster, Suhas KM, Dhairya Gundechia
TL;DR
Adaptive Integrated Layered Attention (AILA) introduces flexible cross-layer connections that adaptively reuse features from all preceding layers via dense skip connections and attention mechanisms. It presents two instantiations: Architecture 1 uses linear integration with multi-head attention, while Architecture 2 employs Transformer-style attention to select prior layer outputs. Across finance time-series forecasting, CIFAR-10 image recognition, and IMDB sentiment analysis, AILA achieves competitive performance with notable gains in efficiency, though vision benchmarks favor specialized architectures; language tasks particularly benefit from the explicit query–key interactions. Ablation studies confirm the value of adaptive inter-layer weighting, with depth and robustness analyses guiding architectural choices. The work opens avenues for multi-task learning, integration with pre-trained models, and interpretability of cross-layer attention, potentially advancing scalable and flexible deep networks.
Abstract
We propose Adaptive Integrated Layered Attention (AILA), a neural network architecture that combines dense skip connections with different mechanisms for adaptive feature reuse across network layers. We evaluate AILA on three challenging tasks: price forecasting for various commodities and indices (S&P 500, Gold, US Dollar Futures, Coffee, Wheat), image recognition on the CIFAR-10 dataset, and sentiment analysis on the IMDB movie review dataset. In all cases, AILA matches strong deep learning baselines (LSTMs, Transformers, and ResNets) at a fraction of the training and inference time. We implement and test two versions of the model: AILA-Architecture 1, which uses simple linear layers as the connection mechanism between layers, and AILA-Architecture 2, which uses an attention mechanism to selectively focus on outputs from previous layers. Both architectures are applied in a single-task learning setting, with a separate model trained for each task. Results confirm that AILA's adaptive inter-layer connections yield robust gains by flexibly reusing pertinent features at multiple network depths. AILA thus extends existing architectures, offering improved long-range sequence modeling, image recognition with optimised computational speed, and SOTA classification performance in practice.
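To make the two connection mechanisms concrete, the following is a minimal NumPy sketch of how a layer might combine the outputs of all preceding layers. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single-head scaled dot-product form, the vector shapes, and the random weights are all hypothetical, and the abstract does not specify these details.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def linear_integration(prev_outputs, weight):
    # Architecture-1-style sketch (assumed form): concatenate all earlier
    # layer outputs and mix them with one linear map.
    stacked = np.concatenate(prev_outputs)            # shape (L * d,)
    return weight @ stacked                           # shape (d,)

def attention_integration(current, prev_outputs, w_q, w_k):
    # Architecture-2-style sketch (assumed form): the current layer's
    # features form a query that scores each earlier layer's output.
    values = np.stack(prev_outputs)                   # shape (L, d)
    keys = values @ w_k.T                             # shape (L, d)
    query = w_q @ current                             # shape (d,)
    scores = keys @ query / np.sqrt(query.shape[0])   # shape (L,)
    weights = softmax(scores)                         # one weight per earlier layer
    return weights @ values, weights                  # mixed features, shape (d,)

# Hypothetical toy setup: 3 earlier layers, feature width 4.
rng = np.random.default_rng(0)
d, num_layers = 4, 3
prev_outputs = [rng.standard_normal(d) for _ in range(num_layers)]
current = rng.standard_normal(d)
w_q = rng.standard_normal((d, d))
w_k = rng.standard_normal((d, d))
w_lin = rng.standard_normal((d, num_layers * d))

linear_mix = linear_integration(prev_outputs, w_lin)
attn_mix, attn_weights = attention_integration(current, prev_outputs, w_q, w_k)
```

In this sketch the linear variant applies the same fixed mixing weights to every input, whereas the attention variant produces a per-input weight for each earlier layer (`attn_weights` sums to 1), which is one way to read the "adaptive inter-layer weighting" described above.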
