Manifold-Aware Temporal Domain Generalization for Large Language Models
Yiheng Yao, Zekun Cai, Xinyuan Song, Hiroki Hill Kobayashi, Xuan Song, Ryosuke Shibasaki, Liang Zhao
TL;DR
This paper tackles temporal distribution shifts in large language models by reframing temporal adaptation within a parameter-efficient, manifold-aware setting. It introduces MaT-LoRA, which constrains temporal updates to a shared low-dimensional subspace and represents per-domain changes through a time-varying core \(F_t\) in the factorization \(\Delta W_t = B F_t A\); this yields substantial parameter efficiency (independent of the number of time domains) while preserving expressive power. The authors provide theoretical justification for the shared-basis representation, showing subspace stability under gradient updates, and they instantiate the temporal core with continuous linear dynamics, sequential Markovian evolution, and non-linear time mappings. Empirical results on synthetic and real-world TDG benchmarks demonstrate superior temporal generalization and practical scalability across multiple LLM backbones, with only modest training-time overhead and comparable inference latency. Overall, MaT-LoRA offers a principled, scalable approach to TDG in LLMs, enabling durable performance under continuous distribution shifts in real deployments.
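To make the factorization concrete, the following NumPy sketch builds the update as B F_t A with a shared basis and a small time-varying core. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the rank, the random initialization, and the specific linear form F_t = F0 + t V (one possible reading of "continuous linear dynamics") are all hypothetical, as are the helper names `make_shared_basis`, `linear_core`, `delta_w`, and `per_domain_lora_params`.

```python
import numpy as np


def make_shared_basis(d_out, d_in, rank, seed=0):
    """Create the shared factors B (d_out x rank) and A (rank x d_in).

    Random values stand in for learned weights (assumption for illustration).
    """
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((d_out, rank)) / np.sqrt(rank)
    A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
    return B, A


def linear_core(F0, V, t):
    """Hypothetical linear-in-time core: F_t = F0 + t * V (rank x rank)."""
    return F0 + t * V


def delta_w(B, F_t, A):
    """Weight update for one time step: Delta W_t = B F_t A."""
    return B @ F_t @ A


# Hypothetical layer sizes and rank, chosen only for the example.
d_out, d_in, rank = 64, 48, 4
B, A = make_shared_basis(d_out, d_in, rank)

rng = np.random.default_rng(1)
F0 = rng.standard_normal((rank, rank))  # core at t = 0
V = rng.standard_normal((rank, rank))   # rate of change of the core

# One update per time domain; B and A are reused, only the core changes.
updates = [delta_w(B, linear_core(F0, V, t), A) for t in (0.0, 1.0, 2.0)]

# Under this linear core, the parameter count does not depend on how many
# time domains are evaluated.
shared_params = B.size + A.size + F0.size + V.size


def per_domain_lora_params(num_domains):
    """Baseline for comparison: an independent rank-r LoRA pair per domain."""
    return num_domains * rank * (d_out + d_in)
```

In this sketch every update has rank at most `rank` and lives in the subspace spanned by the columns of B and the rows of A, so moving between time domains only changes the small core. With these sizes, `shared_params` stays fixed as domains are added, whereas `per_domain_lora_params` grows linearly with the number of domains.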
Abstract
Temporal distribution shifts are pervasive in real-world deployments of Large Language Models (LLMs), where data evolves continuously over time. While Temporal Domain Generalization (TDG) seeks to model such structured evolution, existing approaches characterize model adaptation in the full parameter space. This formulation becomes computationally infeasible for modern LLMs. This paper introduces a geometric reformulation of TDG under parameter-efficient fine-tuning. We establish that the low-dimensional temporal structure underlying model evolution can be preserved under parameter-efficient reparameterization, enabling temporal modeling without operating in the ambient parameter space. Building on this principle, we propose Manifold-aware Temporal LoRA (MaT-LoRA), which constrains temporal updates to a shared low-dimensional manifold within a low-rank adaptation subspace, and models its evolution through a structured temporal core. This reparameterization dramatically reduces temporal modeling complexity while retaining expressive power. Extensive experiments on synthetic and real-world datasets, including scientific documents, news publishers, and review ratings, demonstrate that MaT-LoRA achieves superior temporal generalization performance with practical scalability for LLMs.
