Time Elastic Neural Networks

Pierre-François Marteau

TL;DR

A comparative study on around thirty diverse multivariate datasets shows that the teNN obtains results comparable to those of the state of the art, in particular similar to those of a network combining LSTM and CNN architectures.

Abstract

We introduce and detail an atypical neural network architecture, called the time elastic neural network (teNN), for multivariate time series classification. Its novelty compared to classical neural network architectures is that it explicitly incorporates a time warping ability, as well as a new way of considering attention. In addition, this architecture is capable of learning a dropout strategy, thus optimizing its own architecture. Behind the design of this architecture, our overall objective is threefold. Firstly, we aim to improve the accuracy of instance-based classification approaches, which perform quite well provided that enough training data is available. Secondly, we seek to reduce the computational complexity inherent to these methods in order to improve their scalability; ideally, we seek an acceptable balance between these first two criteria. Finally, we seek to enhance the explainability of the decisions provided by this kind of neural architecture.

Our experiments demonstrate that the stochastic gradient descent used to train a teNN is quite effective: provided that some critical meta-parameters are chosen correctly, convergence is generally smooth and fast. While maintaining good accuracy, we obtain a drastic gain in scalability, first by reducing the required number of reference time series, i.e. the number of teNN cells. Secondly, we show that, during training, the teNN succeeds in reducing the number of neurons required within each cell.

Finally, we show that analysing the activation and attention matrices, as well as the reference time series after training, provides relevant information to interpret and explain the classification results. A comparative study on around thirty diverse multivariate datasets shows that the teNN obtains results comparable to those of the state of the art, in particular similar to those of a network combining LSTM and CNN architectures.
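For context, the instance-based baseline that the teNN aims to improve can be sketched as a 1-nearest-neighbour classifier under dynamic time warping (DTW). This is a minimal sketch under our own assumptions (squared Euclidean local cost, unconstrained warping, hypothetical function names `dtw_distance` and `nn1_classify`); it is not the teNN itself. Its cost grows linearly with the number of reference series, each comparison costing O(n·m), which is why reducing the number of reference series matters for scalability.

```python
from math import inf
from typing import List, Optional, Sequence, Tuple

# A multivariate series is a sequence of d-dimensional frames.
Series = Sequence[Sequence[float]]

def dtw_distance(x: Series, y: Series) -> float:
    """Unconstrained DTW with a squared Euclidean local cost (assumed choice)."""
    n, m = len(x), len(y)
    # D[i][j] = cost of the best alignment of x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((a - b) ** 2 for a, b in zip(x[i - 1], y[j - 1]))
            # Allowed moves: diagonal (match), vertical and horizontal (warping).
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

def nn1_classify(query: Series, references: List[Tuple[Series, str]]) -> Optional[str]:
    """Return the label of the reference series closest to `query` under DTW."""
    best_label, best_dist = None, inf
    for ref, label in references:
        d = dtw_distance(query, ref)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

refs = [([[0.0], [1.0], [0.0]], "bump"), ([[0.0], [0.0], [0.0]], "flat")]
query = [[0.0], [0.0], [1.0], [0.0]]  # a shifted, stretched bump
print(nn1_classify(query, refs))      # prints bump
```

Because DTW can stretch the leading zero of the reference, the shifted query aligns with the "bump" reference at zero cost.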

Paper Structure (33 sections, 3 theorems, 26 equations, 28 figures, 6 tables, 2 algorithms)

Key Result

Proposition 4

If the kernel $k(\cdot,\cdot)$ is positive definite on $\mathbb{R}^d \cup \{\Lambda\}$, then for all $n \ge 1$ and all $\pi \in \Pi_{n,n}$, … is a p.d. kernel on $\mathbb{U}_{2n}$.
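Positive definiteness is meant here in the usual kernel sense: for any finite set of inputs, the Gram matrix $[k(x_i, x_j)]$ is symmetric with no negative eigenvalues. The sketch below checks this numerically for a Gaussian local kernel on points of $\mathbb{R}^d$; the kernel, its bandwidth and the function names are our own assumptions, and a check on one sample only illustrates the property, it does not prove it.

```python
from typing import Callable

import numpy as np

def gaussian_kernel(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel, a standard p.d. kernel on R^d (assumed choice)."""
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def gram_matrix(points: np.ndarray, kernel: Callable[[np.ndarray, np.ndarray], float]) -> np.ndarray:
    """Gram matrix G[i, j] = kernel(points[i], points[j])."""
    n = len(points)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = kernel(points[i], points[j])
    return G

def is_psd(G: np.ndarray, tol: float = 1e-10) -> bool:
    """A symmetric matrix is PSD iff its smallest eigenvalue is >= 0 (up to tol)."""
    if not np.allclose(G, G.T):
        return False
    return bool(np.linalg.eigvalsh(G).min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 random points in R^3
G = gram_matrix(X, gaussian_kernel)
print(is_psd(G))                      # prints True
```

By contrast, the negative squared distance yields a Gram matrix with zero trace, hence at least one negative eigenvalue, so it is not a p.d. kernel.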

Figures (28)

  • Figure 1: A non-exhaustive history of time elastic matching for time series comparison. The founding work is shown in orange, work on elastic distances in blue and work on elastic kernels in green.
  • Figure 2: Ranking of elastic measures according to the study by Paparrizos et al. (2020), from which the figure is reproduced. Distance and kernel measures are evaluated on 128 datasets from the UCR archive.
  • Figure 3: Example of an alignment path corresponding to the alignment map $(0,0)(0,1)(1,2)(1,3)(2,4)(3,4)(4,5)$. The white squares correspond to substitution or match operations and black circles to either deletion or insertion operations.
  • Figure 4: Projections generated by the alignment path $\pi$. Each time series in $\mathbb{U}_n$ has two corresponding series (embeddings) in the space $\mathbb{U}_{2n}$. The existence of a kernel in the embedding space allows an elastic kernel to be constructed back in the time series space $\mathbb{U}_n$.
  • Figure 5: Forward-backward matrix (logarithmic values) for the alignment of a positive half-wave with a sine wave. Dark red represents high-probability cells, while dark blue represents low-probability cells.
  • ...and 23 more figures
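The alignment map of Figure 3 can be read move by move. The sketch below labels each move between consecutive couples, assuming the common convention that a move advancing both indices is a match/substitution while a move advancing a single index is an insertion or deletion; which of those two names applies depends on which series is taken as the reference, so that naming is an assumption, as is the hypothetical function name `label_moves`.

```python
from typing import List, Tuple

Couple = Tuple[int, int]

def label_moves(path: List[Couple]) -> List[str]:
    """Label each move between consecutive couples (i, j) of an alignment map.

    Assumed convention: (+1, +1) -> 'match', (0, +1) -> 'insertion',
    (+1, 0) -> 'deletion'. Any other move is not a valid alignment step.
    """
    labels = []
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        step = (i1 - i0, j1 - j0)
        if step == (1, 1):
            labels.append("match")
        elif step == (0, 1):
            labels.append("insertion")
        elif step == (1, 0):
            labels.append("deletion")
        else:
            raise ValueError(f"invalid alignment step {step} from {(i0, j0)}")
    return labels

# Alignment map of Figure 3 (series of lengths 5 and 6).
path = [(0, 0), (0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
print(label_moves(path))
# ['insertion', 'match', 'insertion', 'match', 'deletion', 'match']
```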

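A forward-backward matrix like the one in Figure 5 can be obtained, as in hidden Markov models, by summing over all alignment paths the product of local similarities along each path. The sketch below is a generic construction under our own assumptions (Gaussian local similarity with bandwidth `sigma`, DTW-style moves, hypothetical function name `forward_backward`), not necessarily the exact recursion used in the paper. Cell (i, j) of the result is the share of the total path weight carried by paths that pass through (i, j); Figure 5 displays the logarithm of such values.

```python
import math
from typing import List, Sequence

def forward_backward(x: Sequence[float], y: Sequence[float], sigma: float = 0.5) -> List[List[float]]:
    """Posterior probability that an alignment path passes through cell (i, j).

    Assumptions: local similarity s(i, j) = exp(-(x_i - y_j)^2 / (2 sigma^2)),
    DTW-style moves (i+1, j+1), (i+1, j), (i, j+1). Long or very dissimilar
    series can underflow; a log-space version would be needed in that case.
    """
    n, m = len(x), len(y)
    s = [[math.exp(-(x[i] - y[j]) ** 2 / (2 * sigma ** 2)) for j in range(m)] for i in range(n)]
    # Forward pass: F[i][j] = total weight of paths from (0, 0) to (i, j).
    F = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                F[i][j] = s[0][0]
                continue
            prev = 0.0
            if i > 0 and j > 0:
                prev += F[i - 1][j - 1]
            if i > 0:
                prev += F[i - 1][j]
            if j > 0:
                prev += F[i][j - 1]
            F[i][j] = s[i][j] * prev
    # Backward pass: B[i][j] = total weight of paths from (i, j) to (n-1, m-1).
    B = [[0.0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if i == n - 1 and j == m - 1:
                B[i][j] = s[i][j]
                continue
            nxt = 0.0
            if i < n - 1 and j < m - 1:
                nxt += B[i + 1][j + 1]
            if i < n - 1:
                nxt += B[i + 1][j]
            if j < m - 1:
                nxt += B[i][j + 1]
            B[i][j] = s[i][j] * nxt
    Z = F[n - 1][m - 1]  # total weight of all complete paths (equals B[0][0])
    # s[i][j] is counted in both F and B, so divide it out once.
    return [[F[i][j] * B[i][j] / (s[i][j] * Z) for j in range(m)] for i in range(n)]

half_wave = [math.sin(math.pi * t / 9) for t in range(10)]       # positive half-wave
sine_wave = [math.sin(2 * math.pi * t / 19) for t in range(20)]  # one full period
P = forward_backward(half_wave, sine_wave)
log_P = [[math.log(p) for p in row] for row in P]  # logarithmic values, as in Figure 5
```

Every path starts at (0, 0) and ends at (n-1, m-1), so those two cells always have probability 1.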
Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 4
  • Proposition 5
  • Proposition 6