Table of Contents
Fetching ...

TimeMosaic: Temporal Heterogeneity Guided Time Series Forecasting via Adaptive Granularity Patch and Segment-wise Decoding

Kuiye Ding, Fanda Fan, Chunyi Hou, Zheya Wang, Lei Wang, Zhengxin Yang, Jianfeng Zhan

TL;DR

TimeMosaic tackles two key forms of temporal heterogeneity in multivariate time series forecasting: encoding heterogeneity (local information density) and decoding heterogeneity (horizon-specific difficulty). It combines adaptive patch embedding, which selects region-specific patch sizes from a candidate set $\mathcal{F} = {f_1, ..., f_K}$, with segment-wise prompt tuning that assigns horizon-specific prompts to a shared encoder, enabling horizon-aware precision without reconfiguring the backbone. Empirical results across 17 real-world datasets show TimeMosaic achieving state-of-the-art or competitive performance for long-term forecasting, with zero-shot results on a large-scale pretraining corpus (BLAST) indicating foundation-model-like generalization and strong efficiency relative to TSFMs. The approach remains parameter-efficient by freezing the backbone while learning prompts and patch-selection, and shows robust performance under unified evaluation settings and fair hyperparameter controls. Overall, TimeMosaic provides a principled, scalable framework to exploit temporal heterogeneity in both input encoding and horizon-specific decoding, with potential for streaming scenarios and large-scale pretraining.

Abstract

Multivariate time series forecasting is essential in domains such as finance, transportation, climate, and energy. However, existing patch-based methods typically adopt fixed-length segmentation, overlooking the heterogeneity of local temporal dynamics and the decoding heterogeneity of forecasting. Such designs lose details in information-dense regions, introduce redundancy in stable segments, and fail to capture the distinct complexities of short-term and long-term horizons. We propose TimeMosaic, a forecasting framework that aims to address temporal heterogeneity. TimeMosaic employs adaptive patch embedding to dynamically adjust granularity according to local information density, balancing motif reuse with structural clarity while preserving temporal continuity. In addition, it introduces segment-wise decoding that treats each prediction horizon as a related subtask and adapts to horizon-specific difficulty and information requirements, rather than applying a single uniform decoder. Extensive evaluations on benchmark datasets demonstrate that TimeMosaic delivers consistent improvements over existing methods, and our model trained on the large-scale corpus with 321 billion observations achieves performance competitive with state-of-the-art TSFMs.

TimeMosaic: Temporal Heterogeneity Guided Time Series Forecasting via Adaptive Granularity Patch and Segment-wise Decoding

TL;DR

TimeMosaic tackles two key forms of temporal heterogeneity in multivariate time series forecasting: encoding heterogeneity (local information density) and decoding heterogeneity (horizon-specific difficulty). It combines adaptive patch embedding, which selects region-specific patch sizes from a candidate set , with segment-wise prompt tuning that assigns horizon-specific prompts to a shared encoder, enabling horizon-aware precision without reconfiguring the backbone. Empirical results across 17 real-world datasets show TimeMosaic achieving state-of-the-art or competitive performance for long-term forecasting, with zero-shot results on a large-scale pretraining corpus (BLAST) indicating foundation-model-like generalization and strong efficiency relative to TSFMs. The approach remains parameter-efficient by freezing the backbone while learning prompts and patch-selection, and shows robust performance under unified evaluation settings and fair hyperparameter controls. Overall, TimeMosaic provides a principled, scalable framework to exploit temporal heterogeneity in both input encoding and horizon-specific decoding, with potential for streaming scenarios and large-scale pretraining.

Abstract

Multivariate time series forecasting is essential in domains such as finance, transportation, climate, and energy. However, existing patch-based methods typically adopt fixed-length segmentation, overlooking the heterogeneity of local temporal dynamics and the decoding heterogeneity of forecasting. Such designs lose details in information-dense regions, introduce redundancy in stable segments, and fail to capture the distinct complexities of short-term and long-term horizons. We propose TimeMosaic, a forecasting framework that aims to address temporal heterogeneity. TimeMosaic employs adaptive patch embedding to dynamically adjust granularity according to local information density, balancing motif reuse with structural clarity while preserving temporal continuity. In addition, it introduces segment-wise decoding that treats each prediction horizon as a related subtask and adapts to horizon-specific difficulty and information requirements, rather than applying a single uniform decoder. Extensive evaluations on benchmark datasets demonstrate that TimeMosaic delivers consistent improvements over existing methods, and our model trained on the large-scale corpus with 321 billion observations achieves performance competitive with state-of-the-art TSFMs.

Paper Structure

This paper contains 73 sections, 13 equations, 17 figures, 25 tables.

Figures (17)

  • Figure 1: Comparison between existing fixed-patch models and our adaptive patching design, highlighting differences in input representation and prediction structure.
  • Figure 2: Zipf deviation and Silhouette score. These results are obtained by extracting patches of different lengths from a large-scale collection of time series forecasting datasets (see Appendix A), followed by K-Means clustering under various cluster settings.
  • Figure 3: Overall architecture of TimeMosaic, which shares the same modular arrangement with the encoder of Transformer. (a) The input multivariate time series is first processed by the Adaptive Patch Embedding module, which segments the sequence into patches of varying granularity based on learnable region-aware decisions. (b) A set of learnable prompt tokens are injected into the input sequence and interact with patch embeddings via multi-head attention to guide segment-wise forecasting. The attention maps in this panel visualize the interactions among adaptive patches as well as between patches and prompts. (c) Input sequence are normalized at the Channel level, same as iTransformer liu2023itransformer. (d) The model performs segment-wise forecasting, from the short-term to the long-term.
  • Figure 4: Segments of different sizes on five datasets with a prediction length of 192. Left: MSE. Right: MAE.
  • Figure 5: Attention Pattern Visualization Across Models.
  • ...and 12 more figures