GLAI: GreenLightningAI for Accelerated Training through Knowledge Decoupling

Jose I. Mestre, Alberto Fernández-Hernández, Cristian Pérez-Corral, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí

TL;DR

GLAI decouples structural knowledge, encoded by activation patterns, from quantitative knowledge, encoded by path weights, in ReLU MLPs. Once the activation structure stabilizes early in training, it is frozen, and GLAI rewrites the network as a linear estimator over active paths, preserving expressive power while accelerating training. Across diverse tasks with frozen backbones (classification, self-supervised projection, and few-shot learning), it achieves accuracy on par with or better than traditional MLP heads while reducing training time by roughly 40% on average in the cases studied. This work suggests a general design principle for efficient feedforward components and points to future integration into large-scale architectures such as Transformers.

Abstract

In this work we introduce GreenLightningAI (GLAI), a new architectural block designed as an alternative to conventional MLPs. The central idea is to separate two types of knowledge that are usually entangled during training: (i) *structural knowledge*, encoded by the stable activation patterns induced by ReLU activations; and (ii) *quantitative knowledge*, carried by the numerical weights and biases. By fixing the structure once stabilized, GLAI reformulates the MLP as a combination of paths, where only the quantitative component is optimized. This reformulation retains the universal approximation capabilities of MLPs, yet achieves a more efficient training process, reducing training time by ~40% on average across the cases examined in this study. Crucially, GLAI is not just another classifier, but a generic block that can replace MLPs wherever they are used, from supervised heads with frozen backbones to projection layers in self-supervised learning or few-shot classifiers. Across diverse experimental setups, GLAI consistently matches or exceeds the accuracy of MLPs with an equivalent number of parameters, while converging faster. Overall, GLAI establishes a new design principle that opens a direction for future integration into large-scale architectures such as Transformers, where MLP blocks dominate the computational footprint.
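To make the idea of "a combination of paths" concrete, here is a minimal sketch for a one-hidden-layer ReLU MLP without biases. It is an illustration of the general decoupling idea, not the authors' implementation: the function names (`activation_pattern`, `path_features`, `path_weights`, `path_model`), the layer sizes, and the omission of biases are all assumptions made for brevity. The activation pattern plays the role of structural knowledge, and the per-path products of weights play the role of quantitative knowledge.

```python
import random

def relu_mlp(x, W1, W2):
    """One-hidden-layer ReLU MLP without biases: y = W2 @ relu(W1 @ x)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    hidden = [max(0.0, p) for p in pre]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def activation_pattern(x, W1):
    """Structural part: 1 if a hidden unit is active for x, else 0."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) > 0 else 0 for row in W1]

def path_features(x, pattern):
    """phi[j][i] = x_i if hidden unit j is active, else 0."""
    return [[a * xi for xi in x] for a in pattern]

def path_weights(W1, W2):
    """Quantitative part: theta[k][j][i] = W2[k][j] * W1[j][i] for path i -> j -> k."""
    n_in, n_hidden, n_out = len(W1[0]), len(W1), len(W2)
    return [[[W2[k][j] * W1[j][i] for i in range(n_in)]
             for j in range(n_hidden)] for k in range(n_out)]

def path_model(phi, theta):
    """Linear estimator over paths: y_k = sum over (j, i) of theta[k][j][i] * phi[j][i]."""
    return [sum(theta[k][j][i] * phi[j][i]
                for j in range(len(phi)) for i in range(len(phi[0])))
            for k in range(len(theta))]

# Hypothetical sizes and random weights, chosen only for the demonstration.
random.seed(0)
n_in, n_hidden, n_out = 4, 5, 3
W1 = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]
x = [random.gauss(0, 1) for _ in range(n_in)]

pattern = activation_pattern(x, W1)
y_mlp = relu_mlp(x, W1, W2)
y_paths = path_model(path_features(x, pattern), path_weights(W1, W2))

# The two outputs agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(y_mlp, y_paths))
print("activation pattern:", pattern)
print("max |MLP - path model|:", max_gap)
```

Once `pattern` is held fixed, `path_model` is linear in `theta`, so fitting the path weights becomes a linear estimation problem over features that no longer change. How GLAI selects, parameterizes, and trains its paths is described in the paper itself and is not captured by this toy example.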

Paper Structure

This paper contains 16 sections, 6 theorems, 35 equations, 3 figures, 3 tables.

Key Result

Proposition 1

Let $A=(A_1, \ldots, A_L)$ denote an activation pattern, and define the diagonal matrix $D_l = \textup{diag} (A_l)$, of size $n_l\times n_l$, where the diagonal elements are determined by the vector $A_l\in \{0,1\}^{n_l}$. Then, for every $x\in \mathbb{R}^{n_0}$ with activation pattern $A$, it holds
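The formula that completes this statement is not shown here. As a hedged illustration only, the sketch below checks the standard identity that statements of this form usually express: for an input whose activation pattern is $A$, replacing each ReLU by multiplication with $D_l = \textup{diag}(A_l)$ turns the network into a single affine map $x \mapsto Mx + c$. The bias handling, the index conventions, the layer sizes, and the helper names (`forward_relu`, `affine_map_for_pattern`) are assumptions, not the paper's notation.

```python
import random

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix-matrix product for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def forward_relu(x, weights, biases):
    """Standard forward pass; returns the output and the pattern (A_1, ..., A_L)."""
    h, pattern = x, []
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = [p + bi for p, bi in zip(matvec(W, h), b)]
        A_l = [1 if p > 0 else 0 for p in pre]
        pattern.append(A_l)
        h = [a * p for a, p in zip(A_l, pre)]  # same values as relu(pre)
    out = [p + bi for p, bi in zip(matvec(weights[-1], h), biases[-1])]
    return out, pattern

def affine_map_for_pattern(pattern, weights, biases):
    """Compose the layers with D_l = diag(A_l) into one affine map x -> M x + c."""
    M, c = weights[0], biases[0]
    for A_l, W, b in zip(pattern, weights[1:], biases[1:]):
        DM = [[a * m for m in row] for a, row in zip(A_l, M)]  # D_l @ M
        Dc = [a * ci for a, ci in zip(A_l, c)]                 # D_l @ c
        M = matmul(W, DM)
        c = [p + bi for p, bi in zip(matvec(W, Dc), b)]
    return M, c

# Hypothetical network: n_0 = 4 inputs, two hidden layers, 3 outputs.
random.seed(1)
sizes = [4, 6, 5, 3]
weights = [[[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [[random.gauss(0, 1) for _ in range(n_out)] for n_out in sizes[1:]]
x = [random.gauss(0, 1) for _ in range(sizes[0])]

y_relu, pattern = forward_relu(x, weights, biases)
M, c = affine_map_for_pattern(pattern, weights, biases)
y_affine = [p + ci for p, ci in zip(matvec(M, x), c)]

# The ReLU forward pass and the pattern-specific affine map agree up to rounding.
max_gap = max(abs(a - b) for a, b in zip(y_relu, y_affine))
print("max |ReLU forward - affine map|:", max_gap)
```

The same `M` and `c` apply to every input that shares the pattern `A`, which is why fixing the pattern makes the network behave as a linear (affine) function on that region of input space.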

Figures (3)

  • Figure 1: Representation of a GLAI model for samples $x\in \mathbb{R}^4$ and target values $f(x)\in \mathbb{R}^3$. In this example there are 6 paths in total, 2 per output coordinate.
  • Figure 2: Evolution of the path distance $m_t$ during training. The right $y$-axis shows the relative error of $m_t$, while the left $y$-axis shows the training and validation losses.
  • Figure 3: Maximum validation accuracy obtained by the original MLP and its GLAI counterparts for different reduction factors $\rho$.

Theorems & Definitions (22)

  • Definition 1
  • Remark 1
  • Definition 2
  • Proposition 1
  • Definition 3
  • Definition 4
  • Definition 5
  • Theorem 1
  • Definition 6
  • Remark 2
  • ...and 12 more