First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

Chi Ma; Mincong Huang; Ying Zhang; Chao Wang; Yujie Wang; Lei Yu; Chuan Liu; Wei Lin

TL;DR

This work tackles the high inference cost of large language models by introducing Threshold-based Dynamic Activation (TDA), a training-free method that exploits sequence-level sparsity to speed up generation by 18-25% with minimal accuracy loss. TDA uses offline thresholding to derive per-layer activation masks from the prompt, then applies these masks during generation to skip underutilized neurons in the feed-forward (FFN) layers, without retraining and without relying on ReLU-specific activation behavior. The authors also provide a theoretical framework for dynamic activation that identifies history-related activation uncertainty and semantic-irrelevant activation inertia as key drivers of sparsity. They validate TDA across multiple model families and tasks, where it performs competitively with, or better than, both training-dependent and training-free baselines. The approach offers practical, deployment-friendly speedups for diverse LLM architectures, and its insights can guide future research on efficient model design, including adaptive depth and prompt compression.
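The mask-then-skip procedure summarized above can be sketched for a single layer in NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. It assumes a LLaMA-style SwiGLU FFN with hypothetical weight names `w_gate`, `w_up`, `w_down`. It scores each neuron by the root-mean-square (RMS) of its intermediate activation over the prompt tokens. It calibrates the threshold offline as a quantile of pooled scores controlled by a hypothetical `keep_ratio`. The paper's exact scoring and threshold rules may differ.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def ffn_intermediate(x, w_gate, w_up):
    # SwiGLU intermediate activations h, shape (tokens, d_ff); LLaMA-style FFN is an assumption
    return silu(x @ w_gate) * (x @ w_up)

def neuron_scores(tokens, w_gate, w_up):
    # Assumed score: RMS of each neuron's activation over the tokens, shape (d_ff,)
    h = ffn_intermediate(tokens, w_gate, w_up)
    return np.sqrt(np.mean(h ** 2, axis=0))

def calibrate_threshold(calib_prompts, w_gate, w_up, keep_ratio):
    # Hypothetical offline step: threshold at the (1 - keep_ratio) quantile
    # of neuron scores pooled over a set of calibration prompts
    pooled = np.concatenate([neuron_scores(p, w_gate, w_up) for p in calib_prompts])
    return np.quantile(pooled, 1.0 - keep_ratio)

def prompt_mask(prompt, w_gate, w_up, threshold):
    # Prefill: indices of neurons whose prompt-level score exceeds the threshold
    return np.flatnonzero(neuron_scores(prompt, w_gate, w_up) > threshold)

def sparse_ffn(x, w_gate, w_up, w_down, active):
    # Generation: compute only the active neurons by slicing weight columns/rows
    h = silu(x @ w_gate[:, active]) * (x @ w_up[:, active])
    return h @ w_down[active, :]

# Toy single-layer demo with random weights (no trained model involved)
rng = np.random.default_rng(0)
d_model, d_ff = 16, 64
w_gate = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
w_up = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
w_down = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff)

calib = [rng.normal(size=(8, d_model)) for _ in range(4)]
tau = calibrate_threshold(calib, w_gate, w_up, keep_ratio=0.5)

prompt = rng.normal(size=(8, d_model))
active = prompt_mask(prompt, w_gate, w_up, tau)   # fixed for the whole generation

x_new = rng.normal(size=(1, d_model))             # hidden state of one new token
dense = ffn_intermediate(x_new, w_gate, w_up) @ w_down
sparse = sparse_ffn(x_new, w_gate, w_up, w_down, active)
rel_err = np.linalg.norm(dense - sparse) / np.linalg.norm(dense)
print(f"kept {len(active)}/{d_ff} neurons, relative FFN error {rel_err:.3f}")
```

We use RMS rather than a plain norm so that an offline threshold does not drift with prompt length. On random weights, the printed error says nothing about real models; it only exercises the mechanics. A practical version would gather the sliced weights once after prefill instead of re-indexing them at every decoding step.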

Abstract

Dynamic activation (DA) techniques, such as DejaVu and MoEfication, have shown the potential to significantly improve the inference efficiency of large language models (LLMs). However, these techniques often rely on ReLU activation functions or require additional parameters and training to maintain performance. This paper introduces a training-free Threshold-based Dynamic Activation (TDA) method that leverages sequence information to exploit the inherent sparsity of models across various architectures. TDA speeds up generation by 18-25% without significantly compromising task performance, thereby addressing the limitations of existing DA techniques. We further investigate the root causes of LLM sparsity and theoretically analyze two of its critical features: history-related activation uncertainty and semantic-irrelevant activation inertia. Our analyses provide a theoretical foundation for DA methods and offer insights to guide future research on making LLMs more efficient and effective.
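One way to probe the sequence-level sparsity that TDA leverages is to ask two questions. First, how much of a later token's FFN activation mass falls on neurons selected from the prompt? Second, how much does the prompt's top-k neuron set overlap with each token's own top-k set? The sketch below is a hypothetical diagnostic of our own, not an analysis from the paper. The helpers `top_k_neurons`, `mask_coverage` and `jaccard`, the SwiGLU FFN, and the value of `k` are all assumptions. With random weights and random stand-in hidden states, it demonstrates only the bookkeeping, not the activation uncertainty or inertia effects the paper reports.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def ffn_intermediate(x, w_gate, w_up):
    # SwiGLU intermediate activations, shape (tokens, d_ff); the FFN type is an assumption
    return silu(x @ w_gate) * (x @ w_up)

def top_k_neurons(tokens, w_gate, w_up, k):
    # Indices of the k neurons with the largest RMS activation over `tokens`
    h = ffn_intermediate(tokens, w_gate, w_up)
    rms = np.sqrt(np.mean(h ** 2, axis=0))
    return np.argsort(rms)[-k:]

def mask_coverage(gen_tokens, w_gate, w_up, active):
    # Hypothetical metric: for each generated token, the share of its
    # activation mass sum(|h|) that lies on the `active` neurons, in [0, 1]
    h = np.abs(ffn_intermediate(gen_tokens, w_gate, w_up))
    return h[:, active].sum(axis=1) / h.sum(axis=1)

def jaccard(a, b):
    # Overlap between two sets of neuron indices
    a, b = set(np.asarray(a).tolist()), set(np.asarray(b).tolist())
    union = a | b
    return len(a & b) / len(union) if union else 1.0

rng = np.random.default_rng(1)
d_model, d_ff, k = 16, 64, 16
w_gate = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
w_up = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)

prompt = rng.normal(size=(10, d_model))
gen = rng.normal(size=(5, d_model))     # stand-ins for generated-token hidden states

active = top_k_neurons(prompt, w_gate, w_up, k)
coverage = mask_coverage(gen, w_gate, w_up, active)
overlap = [jaccard(active, top_k_neurons(g[None, :], w_gate, w_up, k)) for g in gen]
print("coverage per token:", np.round(coverage, 2))
print("Jaccard overlap per token:", np.round(overlap, 2))
```

Applied to hidden states captured from a real model, high coverage would suggest that skipping the unselected neurons removes little of each token's FFN output.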

Paper Structure (25 sections, 20 equations, 11 figures, 5 tables, 1 algorithm)

Introduction
Related Works
Inherent Sparsity in LLMs
Dynamic Activation
Training-Dependent DA with ReLU
Training-free DA
Preliminaries
Inherent Sparsity of LLMs
History-related Activation Uncertainty
Semantic-irrelevant Activation Inertia
A Closer Look at Activation Inertia
Methodology
Experiments
Setups
Models, Datasets.
...and 10 more sections

Figures (11)

Figure 1: Training-Dependent DA
Figure 2: Training-Free TDA
Figure 3: Active pattern of 16 tokens separately
Figure 4: Active pattern of these 16 tokens as a sentence
Figure 5: Active pattern of 4 random tokens separately
...and 6 more figures

Theorems & Definitions (11)

Claim 1
Definition 1
Definition 2
Proof 1
Claim 2
Definition 3
Proof 2
Claim 3
Definition 4
Proof 3
...and 1 more
