Multivariate Data Augmentation for Predictive Maintenance using Diffusion

Andrew Thompson, Alexander Sommers, Alicia Russell-Gilbert, Logan Cummins, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure, Thomas Arnold, Joshua Church

TL;DR

This paper demonstrates a system for generating useful multivariate synthetic data for predictive maintenance and shows how it can be applied to systems that have yet to fail.

Abstract

Predictive maintenance has been used to optimize system repairs in the industrial, medical, and financial domains. The technique relies on the ability to consistently detect and predict anomalies in critical systems. AI models have been trained to detect system faults, improving predictive maintenance efficiency. Fault data for training these models is typically scarce, because organizations work to keep fault occurrences and downtime to a minimum. For newly installed systems, no fault data exists at all, since they have yet to fail. By using diffusion models for synthetic data generation, the complex training datasets for these predictive models can be supplemented with high-level synthetic fault data to improve their anomaly detection performance. By learning the relationship between healthy and faulty data in similar systems, a diffusion model can attempt to apply that relationship to the healthy data of a newly installed system that has no fault data. The diffusion model would then be able to generate useful fault data for the new system, enabling predictive models to be trained for predictive maintenance. This paper demonstrates a system for generating useful multivariate synthetic data for predictive maintenance and shows how it can be applied to systems that have yet to fail.

Paper Structure

This paper contains 12 sections, 10 figures.

Figures (10)

  • Figure 1: DSAT-ECG model architecture. Figure inspired by DSAT-ECG. FFN stands for Feed-Forward Network, and T-Emb stands for Time Embedding. DSAT-ECG can apply conditional information to guide the denoising process toward generating particular types of data samples. The SPADE blocks help capture both local and long-term dependencies.
  • Figure 2: The dataset's runs-to-failure (RtFs) are divided into two disjoint groups. Dataset A comprises the complete RtF data of the Group 1 bearings, plus the initial operational data of the Group 2 bearings recorded before any faults occur. Dataset U comprises the remaining operational data from each RtF of the Group 2 bearings.
  • Figure 3: To test the quality of the synthetic data, two instances of our predictive model ($M_{p}$) are trained. One model ($M_{p}'$) is trained on the original dataset A alone. The other model ($M_{p}''$) is trained on both the original dataset A and the synthetic dataset S. If $M_{p}''$ performs better, the synthetic data can be considered useful for PdM.
  • Figure 4: Generation scheme used for DSAT-ECG generation of unavailable data. Details of the partitioning scheme can be found in Figure 2. Conditional labels are taken from U for generation, as TSGBench works best when the real and synthetic data samples are equal in number and type.
  • Figure 5: Samples of horizontal (orange) and vertical (blue) bearing data. From top to bottom: real healthy sample, real faulty sample, synthetic healthy sample, synthetic faulty sample. The value ranges are (-1.5, 1.5), (-6, 6), (-1, 1.5), and (-4, 4), respectively.
  • ...and 5 more figures
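
The partitioning described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' code: the function name `partition_runs`, the bearing identifiers, the array shapes, and in particular the fixed cutoff `n_initial` (standing in for "initial operational data prior to any faults") are all assumptions made for illustration, since the captions do not state how the fault-free prefix is chosen.

```python
import numpy as np


def partition_runs(runs, group2_ids, n_initial):
    """Split run-to-failure (RtF) recordings into datasets A and U.

    runs       : dict mapping a bearing id to an array of shape (time, channels)
    group2_ids : ids of bearings treated as "newly installed" (Group 2)
    n_initial  : hypothetical number of leading time steps of each Group 2
                 run that are assumed to be fault-free
    """
    dataset_a, dataset_u = {}, {}
    for bearing_id, run in runs.items():
        if bearing_id in group2_ids:
            # Group 2: only the early, healthy prefix is available (A);
            # the rest of the run is held out as unavailable data (U).
            dataset_a[bearing_id] = run[:n_initial]
            dataset_u[bearing_id] = run[n_initial:]
        else:
            # Group 1: the complete run-to-failure is available (A).
            dataset_a[bearing_id] = run
    return dataset_a, dataset_u


# Toy data: four bearings, two channels (e.g. horizontal and vertical).
rng = np.random.default_rng(0)
runs = {f"bearing_{i}": rng.normal(size=(100 + 10 * i, 2)) for i in range(4)}

dataset_a, dataset_u = partition_runs(
    runs, group2_ids={"bearing_2", "bearing_3"}, n_initial=30
)
```

Under these assumptions, a generator trained on `dataset_a` would be asked to produce a synthetic dataset S matching `dataset_u` in number and type of samples, and the Figure 3 comparison would train one predictive model on A alone and another on A together with S.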