GLAI: GreenLightningAI for Accelerated Training through Knowledge Decoupling
Jose I. Mestre, Alberto Fernández-Hernández, Cristian Pérez-Corral, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí
TL;DR
GLAI decouples structural knowledge, encoded by activation patterns, from quantitative knowledge, encoded by path weights, in ReLU MLPs. By freezing the activation structure once it has stabilized early in training, GLAI rewrites the network as a linear estimator over active paths, preserving expressive power while accelerating training. Across diverse tasks with frozen backbones (classification, self-supervised projection, and few-shot learning), it achieves accuracy on par with or better than traditional MLP heads while significantly reducing training time. This work suggests a general design principle for efficient feedforward components and points to future integration into large-scale architectures such as Transformers.
Abstract
In this work, we introduce GreenLightningAI (GLAI), a new architectural block designed as an alternative to conventional MLPs. The central idea is to separate two types of knowledge that are usually entangled during training: (i) *structural knowledge*, encoded by the stable activation patterns induced by ReLU activations; and (ii) *quantitative knowledge*, carried by the numerical weights and biases. By fixing the structure once it has stabilized, GLAI reformulates the MLP as a combination of paths, where only the quantitative component is optimized. This reformulation retains the universal approximation capabilities of MLPs, yet achieves a more efficient training process, reducing training time by approximately 40% on average across the cases examined in this study. Crucially, GLAI is not just another classifier, but a generic block that can replace MLPs wherever they are used, from supervised heads with frozen backbones to projection layers in self-supervised learning or few-shot classifiers. Across diverse experimental setups, GLAI consistently matches or exceeds the accuracy of MLPs with an equivalent number of parameters, while converging faster. Overall, GLAI establishes a new design principle that opens a direction for future integration into large-scale architectures such as Transformers, where MLP blocks dominate the computational footprint.
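To make the decoupling concrete, the following minimal sketch illustrates the general property of ReLU networks that the abstract relies on: once the activation pattern is held fixed, the output is a plain sum over active input-to-output paths and is therefore linear in the path weights. This is not the GLAI implementation. The single hidden layer, the toy sizes, and the names `relu_mlp` and `path_form` are assumptions chosen for illustration, and the pattern is read from the same input rather than frozen during training.

```python
import random


def relu_mlp(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU MLP. Returns (output, activation pattern)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    # Structural knowledge: which hidden units are active for this input.
    pattern = [1.0 if p > 0 else 0.0 for p in pre]
    hidden = [p * g for p, g in zip(pre, pattern)]
    out = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]
    return out, pattern


def path_form(x, pattern, W1, b1, W2, b2):
    """Same function with the pattern held fixed, written as a sum over paths.

    Each active path input i -> hidden j -> output k contributes
    W2[k][j] * W1[j][i] * x[i]; no nonlinearity is evaluated here.
    """
    out = []
    for k, row in enumerate(W2):
        total = b2[k]
        for j, gate in enumerate(pattern):
            if gate == 0.0:
                continue  # inactive unit: all paths through it are dropped
            total += row[j] * b1[j]  # bias path through hidden unit j
            for i, xi in enumerate(x):
                total += row[j] * W1[j][i] * xi  # quantitative knowledge
        out.append(total)
    return out


# Toy sizes (assumed): 3 inputs, 4 hidden units, 2 outputs.
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [rng.uniform(-1, 1) for _ in range(4)]
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b2 = [rng.uniform(-1, 1) for _ in range(2)]
x = [rng.uniform(-1, 1) for _ in range(3)]

mlp_out, pattern = relu_mlp(x, W1, b1, W2, b2)
path_out = path_form(x, pattern, W1, b1, W2, b2)
max_gap = max(abs(a - b) for a, b in zip(mlp_out, path_out))
print("max difference between the two forms:", max_gap)
```

The two forms agree up to floating-point rounding because the pattern passed to `path_form` is the one the MLP itself produced for `x`. In the setting described above, the pattern would instead be fixed after it stabilizes, so that only the path weights remain to be optimized.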
