PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding
Kangcong Li, Peng Ye, Chongjun Tu, Lin Zhang, Chunfeng Song, Jiamin Wu, Tao Yang, Qihao Zheng, Tao Chen
TL;DR
PaceLLM introduces brain-inspired mechanisms to address long-context challenges in LLMs: an Activation Memory Bank (AMB) that mimics persistent working-memory activity, and a Cortical Expert clustering scheme that reorganizes FFN weights into semantically coherent modules. The method is training-free and compatible with existing architectures, achieving notable gains on long-context benchmarks and extending usable context to 200K tokens in Needle-In-A-Haystack (NIAH) tests. Key contributions include a detailed AMB retrieval/update strategy based on cosine-similarity memory lookups and a constrained KMeans-based FFN reorganization that preserves inference compatibility. The resulting approach improves coherence and cross-token dependencies without retraining, offering a practical, generalizable path to stronger long-context understanding with interpretable internal structure.
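To make the idea of a cosine-similarity memory lookup concrete, here is a minimal sketch of an activation-level memory bank. It is an illustration only, not the authors' implementation: the class name `ActivationMemoryBank`, the `threshold` and `blend` parameters, the linear blending rule, and the first-in-first-out eviction are all assumptions introduced for this example.

```python
import numpy as np

class ActivationMemoryBank:
    """Toy memory bank: stores activation vectors and retrieves by cosine similarity.

    All hyperparameters and the update rule are hypothetical choices for illustration.
    """

    def __init__(self, dim, capacity=4, threshold=0.9, blend=0.5):
        self.slots = np.zeros((0, dim))  # one stored activation per row
        self.capacity = capacity         # assumed maximum number of slots
        self.threshold = threshold       # assumed similarity needed for a hit
        self.blend = blend               # assumed weight given to the stored state

    def _cosine(self, query):
        # Cosine similarity between the query and every stored slot.
        norms = np.linalg.norm(self.slots, axis=1) * np.linalg.norm(query)
        return self.slots @ query / np.maximum(norms, 1e-12)

    def step(self, activation):
        """Return (output, hit).

        Hit: blend the stored state with the new activation, write the blend
        back to the slot, and return it. Miss: store the activation unchanged.
        """
        if len(self.slots) > 0:
            sims = self._cosine(activation)
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                fused = self.blend * self.slots[best] + (1 - self.blend) * activation
                self.slots[best] = fused
                return fused, True
        # Miss: append the new activation and evict the oldest slot if over capacity.
        self.slots = np.vstack([self.slots, activation[None, :]])
        if len(self.slots) > self.capacity:
            self.slots = self.slots[1:]
        return activation, False


bank = ActivationMemoryBank(dim=3)
_, first_hit = bank.step(np.array([1.0, 0.0, 0.0]))     # empty bank, so a miss
reused, second_hit = bank.step(np.array([0.9, 0.1, 0.0]))  # close to the stored vector
_, third_hit = bank.step(np.array([0.0, 1.0, 0.0]))     # nearly orthogonal, so a miss
```

In this toy run the second activation is similar enough to the stored one to be blended with it, while the third is stored as a new slot.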
Abstract
While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations, which cause information decay, and by unstructured feed-forward network (FFN) weights, which lead to semantic fragmentation. Inspired by the brain's working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics the persistent firing of prefrontal cortex (PFC) neurons by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation. Extensive evaluations show that PaceLLM achieves a 6% improvement on LongBench's Multi-document QA and 12.5-17.5% performance gains on Infinite-Bench tasks, while extending measurable context length to 200K tokens in Needle-In-A-Haystack (NIAH) tests. This work pioneers brain-inspired LLM optimization and is complementary to existing approaches. It can also be generalized to any model, enhancing its long-context performance and interpretability without structural overhauls.
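One general reason an FFN reorganization can avoid structural changes is that consistently permuting the hidden neurons of a two-layer FFN leaves its output unchanged. The sketch below illustrates this property under stated assumptions: it uses a plain ReLU FFN rather than any specific model's gated variant, the cluster `labels` are hard-coded hypothetical values rather than the result of the paper's clustering, and the function names are invented for this example.

```python
import numpy as np

def ffn(x, w_in, b_in, w_out):
    """Plain two-layer ReLU feed-forward block (an assumed, simplified form)."""
    hidden = np.maximum(x @ w_in + b_in, 0.0)
    return hidden @ w_out

def regroup_neurons(w_in, b_in, w_out, labels):
    """Reorder hidden neurons so that neurons sharing a cluster label are contiguous.

    The same permutation is applied to the columns of w_in, the entries of b_in,
    and the rows of w_out, so the function computed by the FFN is unchanged.
    """
    order = np.argsort(labels, kind="stable")
    return w_in[:, order], b_in[order], w_out[order, :], order

rng = np.random.default_rng(0)
d_model, d_hidden = 4, 6
w_in = rng.normal(size=(d_model, d_hidden))
b_in = rng.normal(size=d_hidden)
w_out = rng.normal(size=(d_hidden, d_model))

# Hypothetical cluster assignment for the six hidden neurons (two per cluster).
labels = np.array([2, 0, 1, 0, 2, 1])
w_in_g, b_in_g, w_out_g, order = regroup_neurons(w_in, b_in, w_out, labels)

x = rng.normal(size=(3, d_model))
same_output = np.allclose(ffn(x, w_in, b_in, w_out), ffn(x, w_in_g, b_in_g, w_out_g))
```

After regrouping, each cluster occupies a contiguous block of hidden units, and `same_output` confirms that the reordered weights compute the same function up to floating-point tolerance.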
