MOYU: A Theoretical Study on Massive Over-activation Yielded Uplifts in LLMs

Chi Ma; Mincong Huang; Chao Wang; Yujie Wang; Lei Yu

TL;DR

This paper addresses inference latency in large language models (LLMs) by formalizing MOYU (Massive Over-activation Yielded Uplifts), the property that dynamic activation (DA) methods exploit. It develops a universal theoretical framework to explain MOYU's origins and analyzes two core limitations of existing DA methods: dependence on ReLU activations and the inability to identify semantically meaningful active neurons. Through mathematical arguments, including $E[\partial \ell_{CE}/\partial p_{i^*}] > 0$ for positive activations and a weight-importance update $\Theta_i = |V| \cdot \nabla_{d\theta_i}L_i + \Theta_{i-1}$, the work explains why current DA strategies struggle to generalize across architectures and activation functions. The authors argue that activation history and inertia shape activation patterns more than semantic content does, which guides the future design of sparsity-based speedups and dynamic routing that are robust across models and tasks. Overall, the paper provides a theoretical basis for refining MOYU-inspired sparsity schemes and informs practical directions for cross-architecture acceleration in LLMs.
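
To make the history dependence of the weight-importance update explicit, the recursion can be unrolled. This is a sketch that assumes $|V|$ is the same at every step $i$ and that the recursion starts from some initial value $\Theta_0$; neither assumption is stated in the summary above.

```latex
\begin{aligned}
\Theta_i &= |V| \cdot \nabla_{d\theta_i} L_i + \Theta_{i-1} \\
         &= |V| \cdot \nabla_{d\theta_i} L_i + |V| \cdot \nabla_{d\theta_{i-1}} L_{i-1} + \Theta_{i-2} \\
         &= \Theta_0 + |V| \sum_{k=1}^{i} \nabla_{d\theta_k} L_k
\end{aligned}
```

Under these assumptions, $\Theta_i$ is a cumulative sum over every earlier step, so the importance assigned at step $i$ reflects the whole history rather than the current input alone, which is consistent with the claim that activation history and inertia shape activation patterns.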

Abstract

Massive Over-activation Yielded Uplifts (MOYU) is an inherent property of large language models, and dynamic activation (DA) based on the MOYU property is a clever yet under-explored strategy for accelerating inference in these models. Existing methods that exploit MOYU often face a significant 'Impossible Trinity': they struggle to simultaneously maintain model performance, enhance inference speed, and extend applicability across various architectures. Given the theoretical ambiguities surrounding MOYU, this paper elucidates the root cause of the MOYU property and outlines the mechanisms behind two primary limitations of current DA methods: 1) history-related activation uncertainty, and 2) semantic-irrelevant activation inertia. Our analysis not only underscores the limitations of current dynamic activation strategies within large-scale LLaMA models but also identifies opportunities for refining the design of future sparsity schemes.
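
A minimal sketch of the dynamic activation idea in plain Python, assuming a single two-layer MLP block $y = W_{down} f(W_{up} x)$ and a hypothetical magnitude threshold `tau` that decides which intermediate neurons count as active. The names `mlp_dense`, `mlp_dynamic`, and `tau` are illustrative, not the paper's method. Unlike practical DA methods, which predict active neurons ahead of time, this sketch computes all activations first and only skips work in the down projection. It shows why skipping neurons is lossless under ReLU, which produces exact zeros, but lossy under SiLU, which does not.

```python
import math

def relu(z: float) -> float:
    # ReLU outputs exact zeros for all non-positive inputs.
    return max(0.0, z)

def silu(z: float) -> float:
    # SiLU(z) = z * sigmoid(z), written in a numerically stable form.
    # Negative inputs give small but nonzero outputs, so exact zeros are rare.
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)

def activations(x, w_up, act):
    # One activation per intermediate neuron: a_j = act(w_up[j] . x).
    return [act(sum(w * xi for w, xi in zip(row, x))) for row in w_up]

def mlp_dense(x, w_up, w_down, act):
    # Dense block: every neuron contributes to every output dimension.
    a = activations(x, w_up, act)
    return [sum(row[j] * a[j] for j in range(len(a))) for row in w_down]

def mlp_dynamic(x, w_up, w_down, act, tau=0.0):
    # Hypothetical dynamic activation: keep only neurons with |a_j| > tau and
    # skip the remaining columns of w_down. Returns (output, active indices).
    a = activations(x, w_up, act)
    active = [j for j, aj in enumerate(a) if abs(aj) > tau]
    return [sum(row[j] * a[j] for j in active) for row in w_down], active

if __name__ == "__main__":
    x = [1.0, -1.0]
    w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]  # 4 neurons
    w_down = [[1.0, 1.0, 1.0, 1.0]]                           # 1 output dim
    for name, act in (("relu", relu), ("silu", silu)):
        dense = mlp_dense(x, w_up, w_down, act)
        sparse, active = mlp_dynamic(x, w_up, w_down, act, tau=0.5)
        print(name, dense, sparse, active)
```

With these toy weights, both activations keep only neuron 0 at `tau=0.5`. Under ReLU the thresholded output matches the dense output exactly, because every skipped neuron is exactly zero; under SiLU the skipped neurons carry small negative values, so the same threshold noticeably changes the result.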

Paper Structure

This paper contains 10 sections, 10 equations, and 2 figures.

Introduction
Related Works
Massive Over-activation
TDA and RODA
RIDA
Unveiling MOYU
Sequencing MOYU
History-related Activation Uncertainty
Semantic-irrelevant Activation Inertia
Conclusion and Limitations

Figures (2)

Figure 1: Four kinds of DA methods
Figure 2: Neuron Activation Pattern Comparisons Across Different Sampling and Input Manners
