
Manifold-Aware Temporal Domain Generalization for Large Language Models

Yiheng Yao, Zekun Cai, Xinyuan Song, Hiroki Hill Kobayashi, Xuan Song, Ryosuke Shibasaki, Liang Zhao

TL;DR

This paper tackles temporal distribution shifts in large language models by reframing temporal adaptation within a parameter-efficient, manifold-aware setting. It introduces MaT-LoRA, which constrains temporal updates to a shared low-dimensional subspace and represents per-domain changes through a time-varying core $F_t$ in the factorization $\Delta W_t = B F_t A$; this yields substantial parameter efficiency (independent of the number of time domains) while preserving expressive power. The authors provide theoretical justification for the shared-basis representation, showing subspace stability under gradient updates, and they instantiate the temporal core with continuous linear dynamics, sequential Markovian evolution, and non-linear time mappings. Empirical results on synthetic and real-world TDG benchmarks demonstrate superior temporal generalization and practical scalability across multiple LLM backbones, with only modest training-time overhead and comparable inference latency. Overall, MaT-LoRA offers a principled, scalable approach to TDG in LLMs, enabling durable performance under continuous distribution shifts in real deployments.
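
The factorization $\Delta W_t = B F_t A$ can be illustrated with a minimal NumPy sketch. It assumes a linear-in-time core $F_t = F_0 + t F_1$ as one possible reading of "continuous linear dynamics"; the exact parameterization used in the paper may differ. The function name `mat_lora_delta`, the matrices `F0` and `F1`, and all sizes are hypothetical choices for illustration only.

```python
import numpy as np


def mat_lora_delta(B, A, F0, F1, t):
    """Return Delta W_t = B @ F_t @ A for a hypothetical linear core F_t = F0 + t * F1."""
    F_t = F0 + t * F1  # assumed temporal core; not taken from the paper
    return B @ F_t @ A


rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4  # illustrative layer sizes and rank

# Shared bases, reused at every time step.
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))

# Parameters of the assumed linear-in-time core.
F0 = rng.standard_normal((r, r))
F1 = rng.standard_normal((r, r))

timestamps = (0.0, 0.5, 1.0, 2.0)
deltas = [mat_lora_delta(B, A, F0, F1, t) for t in timestamps]

# Every update has rank at most r, whatever the time step.
ranks = [int(np.linalg.matrix_rank(dW)) for dW in deltas]

# Every update stays inside the column space of the shared basis B:
# projecting onto span(B) leaves it unchanged.
Q, _ = np.linalg.qr(B)  # orthonormal basis of span(B), shape (d_out, r)
in_shared_subspace = [np.allclose(dW, Q @ (Q.T @ dW)) for dW in deltas]

# Trainable parameters: fixed for the shared-basis form, growing with the
# number of time domains T if each domain had its own LoRA pair.
shared_params = B.size + A.size + F0.size + F1.size  # 480 for these sizes


def independent_lora_params(T):
    """Parameter count if each of T domains had its own (B_t, A_t) pair."""
    return T * (d_out * r + r * d_in)
```

In this sketch only the small $r \times r$ core depends on time, so the trainable parameter count does not change as more time domains are added, whereas `independent_lora_params(T)` grows linearly in `T`.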

Abstract

Temporal distribution shifts are pervasive in real-world deployments of Large Language Models (LLMs), where data evolves continuously over time. While Temporal Domain Generalization (TDG) seeks to model such structured evolution, existing approaches characterize model adaptation in the full parameter space. This formulation becomes computationally infeasible for modern LLMs. This paper introduces a geometric reformulation of TDG under parameter-efficient fine-tuning. We establish that the low-dimensional temporal structure underlying model evolution can be preserved under parameter-efficient reparameterization, enabling temporal modeling without operating in the ambient parameter space. Building on this principle, we propose Manifold-aware Temporal LoRA (MaT-LoRA), which constrains temporal updates to a shared low-dimensional manifold within a low-rank adaptation subspace, and models its evolution through a structured temporal core. This reparameterization dramatically reduces temporal modeling complexity while retaining expressive power. Extensive experiments on synthetic and real-world datasets, including scientific documents, news publishers, and review ratings, demonstrate that MaT-LoRA achieves superior temporal generalization performance with practical scalability for LLMs.

Paper Structure (20 sections, 3 theorems, 34 equations, 2 figures, 2 tables)

Key Result

Lemma 1

Let $\mathcal{M}\subset \mathbb{R}^p$ be an embedded $m$-dimensional submanifold. Then $\mathcal{M}' := \phi(\mathcal{M}) = \mathcal{M}-W_{\mathrm{pre}}$ is also an embedded $m$-dimensional submanifold of $\mathbb{R}^p$.
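
One way to see why Lemma 1 holds is sketched below. This is a standard argument and not necessarily the paper's own proof; it assumes $\phi$ denotes the translation $W \mapsto W - W_{\mathrm{pre}}$, as the statement $\phi(\mathcal{M}) = \mathcal{M} - W_{\mathrm{pre}}$ suggests.

$$
\phi:\mathbb{R}^p\to\mathbb{R}^p,\qquad \phi(W) = W - W_{\mathrm{pre}},\qquad \phi^{-1}(V) = V + W_{\mathrm{pre}},\qquad D\phi(W) = I_p .
$$

Both $\phi$ and $\phi^{-1}$ are smooth, so $\phi$ is a diffeomorphism of $\mathbb{R}^p$. If $\iota:\mathcal{M}\hookrightarrow\mathbb{R}^p$ is the inclusion, which is a smooth embedding by assumption, then $\phi\circ\iota$ is again a smooth embedding, and its image is $\mathcal{M}'$. Since $D\phi = I_p$ has full rank everywhere, the dimension is preserved, so $\mathcal{M}'$ is an embedded $m$-dimensional submanifold.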

Figures (2)

  • Figure 1: Scalability bottleneck of TDG. Full-parameter TDG scales prohibitively with model size, whereas MaT-LoRA maintains overhead nearly constant and yields over $10^{10}\!\times$ reduction at the 1B pre-trained model scale.
  • Figure 2: Visualization of Extrapolated Decision Boundaries on the Rotating 2-Moons Dataset across Four LLM Backbones.

Theorems & Definitions (7)

  • Definition 1: Parameter increments
  • Lemma 1: Embedded manifolds
  • proof
  • Lemma 2: Parameter-increment manifold
  • proof
  • Theorem 2: Stability of $(B_t,A_t)$ subspaces under GD
  • proof