Comparing Prior and Learned Time Representations in Transformer Models of Timeseries

Natalia Koliou, Tatiana Boura, Stasinos Konstantopoulos, George Meramveliotakis, George Kosmadakis

TL;DR

The authors conclude that further research is needed to bring the human into the learning loop in ways that improve the robustness and trustworthiness of the network.

Abstract

What sets timeseries analysis apart from other machine learning exercises is that time representation becomes a primary aspect of the experiment setup, as it must adequately represent the temporal relations that are relevant for the application at hand. In the work described here we study two variations of the Transformer architecture: one where we use the fixed time representation proposed in the literature and one where the time representation is learned from the data. Our experiments use data from the task of predicting the energy output of solar panels, a task that exhibits known periodicities (daily and seasonal) that are straightforward to encode in the fixed time representation. Our results indicate that even in an experiment where the phenomenon is well understood, it is difficult to encode prior knowledge due to side effects that are difficult to mitigate. We conclude that further research is needed to bring the human into the learning loop in ways that improve the robustness and trustworthiness of the network.

Paper Structure

This paper contains 12 sections, 1 equation, 9 figures, 1 table.

Figures (9)

  • Figure 1: Hourly mean pyranometer values (red) and their function approximations, triangular pulse (blue) and sinusoidal function (green).
  • Figure 2: Examples of learned time representation features with a learnable triangular pulse. The five illustrated time features come from the same testing sample and highlight the capability of the model to learn (non-)isosceles pulses with different bases.
  • Figure 3: Normalized training loss progression for each model. The models presented are the best-performing models across different time representations (prior and learned).
  • Figure 4: Example of the learning progress of a sine time feature. Both figures illustrate the time representation of the test examples by aggregating them per time-feature using the mean and standard deviation. The left figure shows the feature before the learning process starts, while the right one shows the final learned feature. The learned time representation is meaningful since it distinguishes midday and has similar values for hours with similar accumulated daylight.
  • Figure 5: Learned sine time representations grouped by their similarity. Each figure is comprised of the aggregated representations computed on the testing data (mean and standard deviation of each time-feature). The top three figures gather the majority of the features, whereas the bottom three include only one feature each.
  • ...and 4 more figures
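
The captions above name two function families used as time features: a sinusoidal function and a triangular pulse. The following is a minimal sketch of what fixed (prior) versions of such features could look like for the daily periodicity. It is not the authors' implementation. The function names, the 06:00/12:00/18:00 defaults, and the clipping of the sinusoid at zero are all assumptions made for illustration; the listing above does not give the paper's actual parameterization.

```python
import math


def sine_feature(hour, period=24.0, phase=6.0):
    """Daily sinusoid, clipped at zero (hypothetical parameterization).

    With the default phase of 6.0 the curve crosses zero at 06:00 and
    peaks at 12:00. Clipping negative values to 0 is an assumption here,
    chosen so that night hours map to zero.
    """
    return max(0.0, math.sin(2.0 * math.pi * (hour - phase) / period))


def triangular_pulse(hour, start=6.0, peak=12.0, end=18.0):
    """Piecewise-linear pulse: 0 outside (start, end), 1 at the peak.

    Choosing a peak that is not the midpoint of [start, end] gives a
    non-isosceles pulse; changing start and end changes the base.
    """
    if hour <= start or hour >= end:
        return 0.0
    if hour <= peak:
        return (hour - start) / (peak - start)
    return (end - hour) / (end - peak)


# One (sine, triangle) feature pair per hour of the day.
fixed_features = [(sine_feature(h), triangular_pulse(h)) for h in range(24)]
```

In a fixed representation, values such as `start`, `peak`, `end`, and `phase` would be set by hand from prior knowledge. In a learned representation, the analogous quantities would presumably be trainable parameters, which is consistent with the caption of Figure 2 describing learned pulses with different bases and asymmetric shapes.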