Manifold-Aware Temporal Domain Generalization for Large Language Models

Yiheng Yao; Zekun Cai; Xinyuan Song; Hiroki Hill Kobayashi; Xuan Song; Ryosuke Shibasaki; Liang Zhao

TL;DR

This paper tackles temporal distribution shifts in large language models by reframing temporal adaptation within a parameter-efficient, manifold-aware setting. It introduces MaT-LoRA, which constrains temporal updates to a shared low-dimensional subspace and represents per-domain changes through a time-varying core $F_t$ in the factorization $\Delta W_t = B F_t A$; this yields substantial parameter efficiency (independent of the number of time domains) while preserving expressive power. The authors provide theoretical justification for the shared-basis representation, showing subspace stability under gradient updates, and they instantiate the temporal core with continuous linear dynamics, sequential Markovian evolution, and non-linear time mappings. Empirical results on synthetic and real-world TDG benchmarks demonstrate superior temporal generalization and practical scalability across multiple LLM backbones, with only modest training-time overhead and comparable inference latency. Overall, MaT-LoRA offers a principled, scalable approach to TDG in LLMs, enabling durable performance under continuous distribution shifts in real deployments.
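The shared-basis factorization $\Delta W_t = B F_t A$ can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer shape, the rank, and the linear-in-time core $F_t = F_0 + tV$ are assumptions chosen only to show why the parameter count does not grow with the number of time domains, in contrast to training one independent LoRA pair per domain.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4  # hypothetical layer shape and LoRA rank

# Shared bases, reused by every time domain.
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))

# Assumed temporal core: F_t = F0 + t * V (one simple linear instantiation,
# chosen for illustration; the paper's exact dynamics may differ).
F0 = rng.standard_normal((r, r))
V = rng.standard_normal((r, r))


def core(t: float) -> np.ndarray:
    """Time-varying r x r core F_t."""
    return F0 + t * V


def delta_w(t: float) -> np.ndarray:
    """Weight increment Delta W_t = B F_t A for time t."""
    return B @ core(t) @ A


def shared_core_params() -> int:
    """Trainable parameters of the sketch: independent of the number of domains."""
    return B.size + A.size + F0.size + V.size


def per_domain_lora_params(num_domains: int) -> int:
    """Baseline for comparison: one independent (B_t, A_t) pair per domain."""
    return num_domains * (d_out * r + r * d_in)
```

Under these assumptions, `delta_w(t)` can be evaluated at an unseen future `t` without adding parameters, while the per-domain baseline grows linearly in the number of domains.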

Abstract

Temporal distribution shifts are pervasive in real-world deployments of Large Language Models (LLMs), where data evolves continuously over time. While Temporal Domain Generalization (TDG) seeks to model such structured evolution, existing approaches characterize model adaptation in the full parameter space. This formulation becomes computationally infeasible for modern LLMs. This paper introduces a geometric reformulation of TDG under parameter-efficient fine-tuning. We establish that the low-dimensional temporal structure underlying model evolution can be preserved under parameter-efficient reparameterization, enabling temporal modeling without operating in the ambient parameter space. Building on this principle, we propose Manifold-aware Temporal LoRA (MaT-LoRA), which constrains temporal updates to a shared low-dimensional manifold within a low-rank adaptation subspace, and models its evolution through a structured temporal core. This reparameterization dramatically reduces temporal modeling complexity while retaining expressive power. Extensive experiments on synthetic and real-world datasets, including scientific documents, news publishers, and review ratings, demonstrate that MaT-LoRA achieves superior temporal generalization performance with practical scalability for LLMs.

Paper Structure (20 sections, 3 theorems, 34 equations, 2 figures, 2 tables)

Introduction
Related Work
Problem definition
Methodology
The Geometry of Parameter Increments for Temporal Domains
Manifold-Constrained Low-Rank Factorization
Geometric Incoherence in Discrete Generalization
Subspace-Shared Time-Varying Parameterization
Instantiations of the Temporal Core
Theoretical Analysis
Parameter Efficiency Analysis
Stability and Justification of the Shared-Basis Form
Experiments
Synthetic Dataset
Real-world Dataset
...and 5 more sections

Key Result

Lemma 1

Let $\mathcal{M}\subset \mathbb{R}^p$ be an embedded $m$-dimensional submanifold. Then $\mathcal{M}' := \phi(\mathcal{M}) = \mathcal{M}-W_{\mathrm{pre}}$ is also an embedded $m$-dimensional submanifold of $\mathbb{R}^p$.
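A short argument for why the lemma holds can be sketched as follows. It assumes, as the formula suggests, that $\phi$ is the translation $W \mapsto W - W_{\mathrm{pre}}$; the slice-chart notation $(U, \psi)$ is introduced here for illustration and is not necessarily the authors' proof.

```latex
% Sketch, assuming \phi(W) = W - W_{\mathrm{pre}} on \mathbb{R}^p.
\begin{align*}
&\phi(W) = W - W_{\mathrm{pre}}, \qquad
 \phi^{-1}(W') = W' + W_{\mathrm{pre}}, \\
&\text{both maps are smooth, so } \phi \text{ is a diffeomorphism of } \mathbb{R}^p. \\[4pt]
&\text{Let } (U, \psi) \text{ be a slice chart for } \mathcal{M} \text{ around } x:\quad
 \psi(U \cap \mathcal{M}) = \psi(U) \cap \bigl(\mathbb{R}^m \times \{0\}\bigr). \\
&\text{Then } \bigl(\phi(U),\ \psi \circ \phi^{-1}\bigr) \text{ is a chart around } \phi(x) \text{ with} \\
&(\psi \circ \phi^{-1})\bigl(\phi(U) \cap \mathcal{M}'\bigr)
 = \psi(U \cap \mathcal{M})
 = (\psi \circ \phi^{-1})\bigl(\phi(U)\bigr) \cap \bigl(\mathbb{R}^m \times \{0\}\bigr).
\end{align*}
```

Since every point of $\mathcal{M}'$ has such a slice chart, $\mathcal{M}'$ is an embedded submanifold of the same dimension $m$.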

Figures (2)

Figure 1: Scalability bottleneck of TDG. Full-parameter TDG scales prohibitively with model size, whereas MaT-LoRA keeps overhead nearly constant and yields over $10^{10}\!\times$ reduction at the 1B pre-trained model scale.
Figure 2: Visualization of Extrapolated Decision Boundaries on the Rotating 2-Moons Dataset across Four LLM Backbones.

Theorems & Definitions (7)

Definition 1: Parameter increments
Lemma 1: Embedded manifolds
proof
Lemma 2: Parameter-increment manifold
proof
Theorem 2: Stability of $(B_t,A_t)$ subspaces under GD
proof
