Understanding the Limits of Deep Tabular Methods with Temporal Shift

Hao-Run Cai, Han-Jia Ye

TL;DR

The paper investigates why deep tabular methods falter under temporal distribution shifts and identifies training lag and validation bias in temporal splits as key culprits. It analyzes how temporal patterns are lost in deep representations and proposes a plug-and-play Fourier-series-based temporal embedding to recover periodic and trend information. A refined temporal splitting protocol is introduced to minimize lag and bias, yielding performance on par with random splits but with far greater stability. Together, these contributions offer a practical framework for improving temporal generalization in deep tabular learning, demonstrated on the TabReD benchmark across diverse methods, including retrieval-based ones. The approach treats explicitly incorporating temporal information as essential for robust deployment in temporally evolving environments.
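As a rough illustration of what a Fourier-series-style temporal embedding computes, the sketch below maps a timestamp to sine/cosine pairs at a few fixed periods plus a linear trend term. It is not the authors' implementation: the function name `fourier_time_features`, the fixed (non-learned) periods, and the `t0`/`scale` trend normalization are all assumptions made for the example.

```python
import numpy as np

def fourier_time_features(t, periods, t0=0.0, scale=1.0):
    """Encode timestamps as [sin, cos] pairs per period plus one linear trend column.

    Hypothetical helper: periods are fixed here, whereas a learned embedding
    would typically treat frequencies (and a projection) as trainable.
    """
    t = np.asarray(t, dtype=np.float64)
    columns = []
    for period in periods:
        angle = 2.0 * np.pi * t / period
        columns.append(np.sin(angle))   # periodic component, phase 0
        columns.append(np.cos(angle))   # periodic component, phase pi/2
    columns.append((t - t0) / scale)    # slow drift (trend) component
    return np.stack(columns, axis=1)

# Days 0..13 encoded with weekly and yearly periods (assumed units: days).
days = np.arange(14)
emb = fourier_time_features(days, periods=[7.0, 365.25], scale=365.25)
print(emb.shape)  # (14, 5)
```

In a pipeline of this kind, the resulting columns would typically be concatenated with the other input features, or passed through a small learned projection, before reaching the tabular backbone.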

Abstract

Deep tabular models have demonstrated remarkable success on i.i.d. data, excelling in a variety of structured data tasks. However, their performance often deteriorates under temporal distribution shifts, where the data distribution evolves over time with trends and periodic patterns. In this paper, we explore the underlying reasons for this failure to capture temporal dependencies. We begin by investigating the training protocol, revealing a key issue in how model selection is performed. While existing approaches use temporal ordering to split off the validation set, we show that even a random split can significantly improve model performance. By minimizing the time lag between training data and test time while reducing the bias in validation, our proposed training protocol significantly improves generalization across various methods. Furthermore, we analyze how temporal data affects deep tabular representations, finding that these models often fail to capture crucial periodic and trend information. To address this gap, we introduce a plug-and-play temporal embedding method based on Fourier series expansion to learn and incorporate temporal patterns, offering an adaptive approach to handling temporal shifts. Our experiments demonstrate that this temporal embedding, combined with the improved training protocol, provides a more effective and robust framework for learning from temporal tabular data.
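The contrast between a temporally ordered validation split and a random one can be sketched as follows. This is a minimal example under stated assumptions, not the paper's protocol: the helper names (`temporal_split`, `random_split`, `training_lag`), the 20% validation fraction, and the definition of lag as the gap between the newest training row and the boundary `t_train` are all hypothetical choices for illustration.

```python
import numpy as np

def temporal_split(timestamps, val_fraction=0.2):
    """Train on the oldest rows; validate on the most recent rows before T_train."""
    order = np.argsort(timestamps, kind="stable")
    n_val = max(1, int(round(len(order) * val_fraction)))
    return order[:-n_val], order[-n_val:]

def random_split(timestamps, val_fraction=0.2, seed=0):
    """Ignore time: draw the validation rows uniformly at random."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(timestamps))
    n_val = max(1, int(round(len(perm) * val_fraction)))
    return perm[n_val:], perm[:n_val]

def training_lag(timestamps, train_idx, t_train):
    """Assumed lag measure: gap between the newest training row and T_train."""
    return float(t_train - timestamps[train_idx].max())

# Toy data: one row per day for 100 days; deployment starts at day 100.
timestamps = np.arange(100, dtype=np.float64)
t_train = 100.0
tr_t, va_t = temporal_split(timestamps)
tr_r, va_r = random_split(timestamps)
print(training_lag(timestamps, tr_t, t_train))  # 21.0
print(training_lag(timestamps, tr_r, t_train))  # never larger than the temporal lag here
```

With the temporal split, the most recent 20 days are held out for validation, so the model never trains on the data closest to deployment. The random split keeps recent rows in the training set at the cost of a validation set that no longer mimics the future.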

Paper Structure

This paper contains 21 sections, 6 equations, 10 figures, 12 tables.

Figures (10)

  • Figure 1: Illustration of the challenges posed by temporal shifts. The change in data distribution over time is represented by dots on a line, with the dashed line depicting the underlying data distribution at different time slices. The shaded box indicates the mapping $f$ learned from the training data at $T_{\rm{train}}$, while the training data is typically treated as i.i.d. on $\mathcal{X}_{\rm{train}}$ and $\mathcal{Y}_{\rm{train}}$ in classical training processes. On i.i.d. data, the model can directly apply the learned mapping $f$ to make accurate predictions on test data, but it fails to generalize effectively when temporal shifts occur.
  • Figure 2: Performance comparison between the temporal split of Rubachev et al. (2024) and a random split on the TabReD benchmark, where only the data splitting strategy before $T_{\rm{train}}$ is changed. The percentage change represents the robust average of the performance difference compared to the MLP with the original temporal split. A positive percentage change indicates that the method outperforms the MLP with the temporal split. Left: We reproduce the experiment from Rubachev et al. (2024) and ensure a fair comparison by removing additional numerical feature encodings, as explained in the dataset appendix. In this setting, the performance of retrieval-based methods significantly declines, falling behind tree-based methods and MLP-PLR, while TabM achieves the best performance. Right: Performance improvements observed under the random split. Retrieval-based methods show the most substantial gains, and model rankings align more closely with conventional expectations. Detailed results are provided in the appendix.
  • Figure 3: Left: Experimental design for temporal split strategies. The top panel shows the original baseline adopted by Rubachev et al. (2024). The middle panel (a)–(d) illustrates: (i) training lag (a vs. b), (ii) validation bias (a vs. c), and (iii) validation equivalence (b vs. d). The bottom panel presents our proposed strategy. Right: Performance improvement of different splitting strategies relative to split (c) on the TabReD benchmark, demonstrating the benefits of reducing training lag and validation bias. Notably, the performance degradation from (b) to (d) is much smaller than the improvement achieved by (b), suggesting that adopting the alternative splitting strategy to maximize data utilization is preferable. Detailed results are provided in the appendix.
  • Figure 4: Left: The loss distribution shows how the model's performance is distributed across time slices under different validation splitting strategies. The vertical axis represents the loss, where lower is better. Non-lagged splits (b) and (d) achieve better performance around $T_{\rm{train}}$ compared to lagged splits (a) and (c), while the higher-biased split (c) performs better on training-available data but fails to generalize compared to the lower-biased split (a). The loss distribution is smoothed by a Gaussian filter for better visualization. Right: The MMD heatmap visualizing the distribution distance between different time slices using a linear kernel. Time slices are divided by date.
  • Figure 5: Left: Detailed MMD heatmap of the HI dataset, illustrating both trend (lighter colors farther from the diagonal) and yearly/weekly periodicity (stripes at different scales) in the data. Middle and Right: MMD heatmaps of the representations learned by an MLP before and after applying our temporal embedding. Without temporal embedding, the model captures only coarse-grained patterns (e.g., weekday vs. weekend) with weak discrimination. After incorporating temporal embedding, the learned representations align with the data distribution, capturing phase-specific temporal features (e.g., day of the week) and achieving clear distinction. See the appendix for more results.
  • ...and 5 more figures
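
The MMD heatmaps described in the captions for Figures 4 and 5 compare the distributions of different time slices under a linear kernel. With a linear kernel, the (biased) squared MMD between two samples reduces to the squared Euclidean distance between their feature means, which the sketch below uses to build a slice-by-slice matrix. The function names and the synthetic weekly data are assumptions for illustration, not the paper's code or datasets.

```python
import numpy as np

def linear_mmd2(x, y):
    """Biased squared MMD with k(a, b) = a . b: squared distance between sample means."""
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)

def mmd_heatmap(features, slice_ids):
    """Pairwise linear-kernel MMD^2 between all time slices (hypothetical helper)."""
    slices = np.unique(slice_ids)
    means = np.stack([features[slice_ids == s].mean(axis=0) for s in slices])
    diff = means[:, None, :] - means[None, :, :]   # broadcast to all slice pairs
    return slices, (diff ** 2).sum(axis=-1)

# Synthetic data: 28 daily slices, a weekly cycle plus a slow trend, small noise.
rng = np.random.default_rng(0)
slice_ids = np.repeat(np.arange(28), 50)
signal = np.stack([np.sin(2 * np.pi * slice_ids / 7), 0.01 * slice_ids], axis=1)
features = signal + rng.normal(0.0, 0.05, size=signal.shape)

slices, heat = mmd_heatmap(features, slice_ids)
print(heat.shape)  # (28, 28)
```

On this toy data, slices seven days apart sit closer together than slices three days apart, which would appear as diagonal stripes in a plotted heatmap, while the trend term makes distances grow slowly away from the diagonal.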