Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting

Licheng Pan, Zhichao Chen, Haoxuan Li, Guangyi Liu, Zhijian Xu, Zhaoran Liu, Hao Wang, Ying Wei

TL;DR

This work identifies an expressiveness bottleneck in multi-task time-series forecasting, where a shared representation limits step-specific predictions. It introduces a two-stage approach: pre-train a one-step foundation model and adapt to multiple horizons using step-specific LoRA modules, thereby mitigating the bottleneck. Building on this, MoLA employs segment-based adaptation and a mixture of LoRA experts to enable partial parameter sharing across horizons, improving both efficiency and accuracy. Extensive experiments across diverse datasets and backbones show MoLA consistently outperforms state-of-the-art TSF methods and robustly generalizes to other models and fine-tuning techniques, signaling strong practical impact for long-horizon forecasting.

Abstract

Multi-task forecasting has become the standard approach for time-series forecasting (TSF). However, we show that it suffers from an Expressiveness Bottleneck, where predictions at different time steps share the same representation, leading to unavoidable errors even with optimal representations. To address this issue, we propose a two-stage framework: first, pre-train a foundation model for one-step-ahead prediction; then, adapt it using step-specific LoRA modules. This design enables the foundation model to handle any number of forecast steps while avoiding the expressiveness bottleneck. We further introduce the Mixture-of-LoRA (MoLA) model, which employs adaptively weighted LoRA experts to achieve partial parameter sharing across steps. This approach enhances both efficiency and forecasting performance by exploiting interdependencies between forecast steps. Experiments show that MoLA significantly improves model expressiveness and outperforms state-of-the-art time-series forecasting methods. Code is available at https://anonymous.4open.science/r/MoLA-BC92.
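
The two adaptation schemes described in the abstract can be contrasted with a minimal NumPy sketch. It is not the authors' implementation: the single linear layer, the sizes, the softmax gate, and every name (`W0`, `A_step`, `B_exp`, `gate_logits`, and so on) are assumptions made for illustration, and all weights are random rather than trained. It only shows how step-specific LoRA gives each forecast step its own low-rank update, while a mixture of LoRA experts reuses a small pool of updates with step-dependent weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4   # hypothetical layer sizes
r, P, T = 2, 3, 6    # LoRA rank, number of experts, number of forecast steps

# Frozen weight, standing in for a layer of the one-step pre-trained model.
W0 = rng.normal(size=(d_out, d_in))

# Scheme A: one LoRA pair (B_t, A_t) per forecast step, nothing shared.
# (LoRA usually initialises B to zero; random values make the effect visible.)
A_step = rng.normal(size=(T, r, d_in)) * 0.1
B_step = rng.normal(size=(T, d_out, r)) * 0.1

def step_lora_weight(t):
    """Effective weight for step t: W0 + B_t A_t."""
    return W0 + B_step[t] @ A_step[t]

# Scheme B: P shared experts, mixed with step-specific weights.
A_exp = rng.normal(size=(P, r, d_in)) * 0.1
B_exp = rng.normal(size=(P, d_out, r)) * 0.1
gate_logits = rng.normal(size=(T, P))  # assumed gate: one logit vector per step

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mola_weight(t):
    """Effective weight for step t: W0 + sum_p alpha[t, p] * B_p A_p."""
    alpha = softmax(gate_logits[t])
    delta = np.einsum("p,por,pri->oi", alpha, B_exp, A_exp)
    return W0 + delta

x = rng.normal(size=d_in)  # a stand-in input representation
forecasts = np.stack([mola_weight(t) @ x for t in range(T)])  # shape (T, d_out)

# Adapter parameter counts under the two schemes.
lora_params_step = A_step.size + B_step.size
lora_params_mola = A_exp.size + B_exp.size + gate_logits.size
```

With these toy sizes the mixture needs fewer adapter parameters than per-step LoRA because the experts are shared across steps; the gap widens as `T` grows while `P` stays fixed.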

Paper Structure

This paper contains 42 sections, 3 theorems, 24 equations, 10 figures, 10 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $\bar{W} = [W \ b] \in \mathbb{R}^{\mathrm{T} \times (\mathrm{L}+1)}$ be the parameters in the MT-F's linear decoder, $Y\in\mathbb{R}^{\mathrm{T}\times\mathrm{D}}$ be the label sequence; the minimum attainable estimation error is where $\bar{W} = U\Sigma V^\top$ is the singular value decomposition of $\bar{W}$, $\mathrm{rank}(\bar{W}) \leq \min\{\mathrm{T}, \mathrm{L}+1\}$, and $\{U_i\}_{i=\m
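
The theorem statement above is truncated, and the sketch below does not attempt to reconstruct its formula. It only illustrates a general least-squares fact that is consistent with the quantities the theorem names: every output of a linear decoder $\bar{W}$ lies in the column space of $\bar{W}$, so when $\mathrm{T} > \mathrm{L}+1$ the part of $Y$ outside that space cannot be fitted by any representation. The sizes, the random data, and the names `U_perp`, `floor`, and `best` are assumptions for illustration. The fixed bias entry of the decoder input is ignored, so the value computed is a lower bound on the achievable error.

```python
import numpy as np

rng = np.random.default_rng(0)
T, L, D = 12, 4, 3                    # hypothetical sizes with T > L + 1

W_bar = rng.normal(size=(T, L + 1))   # stand-in for the decoder parameters [W b]
Y = rng.normal(size=(T, D))           # stand-in for the label sequence

# Full SVD: the first `rank` columns of U span the column space of W_bar,
# the remaining columns span the directions no decoder output can reach.
U, s, Vt = np.linalg.svd(W_bar, full_matrices=True)
rank = int(np.sum(s > 1e-10))
U_perp = U[:, rank:]

# Energy of Y in the unreachable directions.
floor = np.sum((U_perp.T @ Y) ** 2)

# Cross-check: the best squared error over all unconstrained inputs Z.
Z, *_ = np.linalg.lstsq(W_bar, Y, rcond=None)
best = np.sum((Y - W_bar @ Z) ** 2)
```

Here `floor` and `best` agree up to floating-point error, and both are strictly positive for generic `Y`, which is the sense in which a shared representation leaves an irreducible error.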

Figures (10)

  • Figure 1: Visualization of representations generated with different forecasting steps.
  • Figure 2: Visualization of MT-F, LoRA and MoLA approaches to generate multi-step forecasts. Gray blocks denote identical encoder components. Purple blocks represent decoding strategies. Rectangles with varying transparencies indicate different expert matrices in MoLA.
  • Figure 3: Visualization of forecast sequence generated with and without MoLA under two snapshots.
  • Figure 4: Benefit of incorporating MoLA in varying models, shown with colored bars for means over forecasting lengths (96, 192, 336, 720) and error bars for 95% confidence intervals.
  • Figure 5: Performance given varying rank $r$, learning rate $\eta$ and the number of experts $\mathrm{P}$.
  • ...and 5 more figures

Theorems & Definitions (5)

  • Theorem 3.1: Expressiveness Bottleneck
  • Theorem C.1: Expressiveness Bottleneck
  • proof
  • Theorem C.2: Variance Reduction of MoLA
  • proof