Recurrent Auto-Encoder Model for Large-Scale Industrial Sensor Signal Analysis

Timothy Wong, Zhiyuan Luo

TL;DR

The paper addresses the challenge of analysing unbounded, high-dimensional industrial sensor streams by learning compact representations with a recurrent auto-encoder. It introduces a partial reconstruction scheme and rolling-window sampling, producing fixed-length context vectors that summarise full-system dynamics while reconstructing only a subset of sensors. Empirical results show that partial reconstruction achieves lower training and validation errors than full reconstruction, and that context vectors can be visualised and clustered to reflect operating states, enabling online diagnostics and maintenance planning. The approach scales to very high-dimensional data and provides a practical framework for identifying process states and anomalies from unlabelled streams in large-scale industrial environments.

Abstract

A recurrent auto-encoder model summarises sequential data through an encoder structure into a fixed-length vector and then reconstructs the original sequence through the decoder structure. The summarised vector can be used to represent time series features. In this paper, we propose relaxing the dimensionality of the decoder output so that it performs partial reconstruction. The fixed-length vector therefore represents features in the selected dimensions only. In addition, we propose using a rolling fixed-window approach to generate training samples from unbounded time series data. The change of time series features over time can be summarised as a smooth trajectory path. The fixed-length vectors are further analysed using additional visualisation and unsupervised clustering techniques. The proposed method can be applied in large-scale industrial processes for sensor signal analysis, where clusters of the vector representations can reflect the operating states of the industrial system.
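
The rolling fixed-window sampling and the relaxed decoder output described in the abstract can be illustrated with a minimal sketch. It assumes the stream is held as a NumPy array of shape (time, sensors); the function names `rolling_windows` and `partial_targets`, the `stride` parameter, the window length, and the toy data are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def rolling_windows(series, window, stride=1):
    """Cut a (time, sensors) array into overlapping fixed-length samples.

    Returns an array of shape (num_samples, window, sensors).
    """
    series = np.asarray(series, dtype=float)
    if series.ndim != 2:
        raise ValueError("series must have shape (time, sensors)")
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    num_steps = series.shape[0]
    if num_steps < window:
        raise ValueError("series is shorter than the window")
    starts = range(0, num_steps - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])


def partial_targets(samples, output_dims):
    """Keep only the selected sensor dimensions as decoder targets."""
    return samples[:, :, list(output_dims)]


# Toy stream: 10 time steps, 4 sensors (values are illustrative only).
stream = np.arange(40, dtype=float).reshape(10, 4)

# The encoder sees every sensor in each window.
encoder_inputs = rolling_windows(stream, window=5)
# The decoder reconstructs only the selected sensors (here, two of them).
decoder_targets = partial_targets(encoder_inputs, [1, 3])

print(encoder_inputs.shape)   # (6, 5, 4)
print(decoder_targets.shape)  # (6, 5, 2)
```

Because consecutive windows overlap in all but `stride` time steps, neighbouring samples are highly similar, which is consistent with the abstract's remark that the resulting vectors trace a smooth trajectory over time.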

Paper Structure

This paper contains 11 sections, 1 equation, 7 figures, 1 algorithm.

Figures (7)

  • Figure 1: Recurrent auto-encoder model. Both the encoder and decoder are made up of multilayered RNNs. Arrows indicate the direction of information flow.
  • Figure 2: Effects of relaxing the dimensionality of the output sequence on the training and validation MSE losses. The models compared contain the same number of layers in the RNN encoder and decoder, respectively. All hidden layers contain the same number of LSTM neurons with hyperbolic tangent activation.
  • Figure 3: A heatmap showing eight randomly selected output sequences in the held-out validation set. Colour represents the magnitude of sensor measurements on a normalised scale.
  • Figure 4: The first example. On the left, the context vectors were projected into two-dimensional space using PCA. The black solid line on the left joins all consecutive context vectors together as a trajectory. Different numbers of clusters were identified using a simple $K$-means algorithm. Cluster assignments and the SVM decision boundaries are coloured in the charts. On the right, output dimensions are visualised on a shared time axis. The black solid line demarcates the training set ($70\%$) and validation set ($30\%$). The line segments are colour-coded to match the corresponding clusters.
  • Figure 5: The second example. The sensor data is drawn from the same time period as the previous example; only the output dimension has been changed to $K=2$, where another set of gas pressure sensors was selected.
  • ...and 2 more figures
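
The analysis described in the Figure 4 caption (a PCA projection of context vectors followed by $K$-means clustering) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the context vectors are synthetic stand-ins rather than encoder outputs, PCA is computed with a plain SVD, the deterministic farthest-point initialisation is a convenience chosen here, and the SVM decision boundaries mentioned in the caption are omitted. The names `pca_project` and `k_means` are hypothetical.

```python
import numpy as np


def pca_project(vectors, n_components=2):
    """Project rows onto their top principal components via SVD."""
    centred = vectors - vectors.mean(axis=0)
    _, _, components = np.linalg.svd(centred, full_matrices=False)
    return centred @ components[:n_components].T


def k_means(points, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialisation (an assumption)."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(points[nearest.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign every point to its closest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Synthetic stand-in for context vectors: two well-separated groups of
# 8-dimensional vectors, mimicking two operating states.
rng = np.random.default_rng(0)
state_a = rng.normal(0.0, 0.3, size=(20, 8))
state_b = rng.normal(5.0, 0.3, size=(20, 8))
context_vectors = np.vstack([state_a, state_b])

projected = pca_project(context_vectors, n_components=2)
labels, centroids = k_means(projected, k=2)

print(projected.shape)  # (40, 2)
print(labels)
```

Joining consecutive rows of `projected` in time order would give the trajectory line described in the caption, and the cluster labels could then be used to colour the matching segments of the sensor time series.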