Lab-scale Vibration Analysis Dataset and Baseline Methods for Machinery Fault Diagnosis with Machine Learning
Bagus Tris Atmaja, Haris Ihsannur, Suyanto, Dhany Arifianto
TL;DR
This work addresses the need for accessible, labeled vibration data for machinery fault diagnosis by introducing the lab-scale VBL-VA001 dataset, which covers four machine conditions with 4000 samples in CSV form. It establishes a simple baseline by extracting nine frequency-domain features per axis (27 features in total) from FFT-transformed signals and evaluating SVM, KNN, and GNB classifiers. SVM with an RBF kernel performs best, reaching 99.75% weighted accuracy in 5-fold cross-validation and a perfect score on a 1-fold test, which supports the dataset's use for benchmarking ML-based fault detection. The dataset is openly available, enabling reproducible research and future work on robustness and generalization beyond the lab setting.
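As a rough illustration of the feature-extraction step, the sketch below computes nine statistics of the FFT magnitude spectrum for each of three axes and concatenates them into a 27-element vector. The summary above does not list the nine features, so the statistics chosen here (mean, standard deviation, maximum, minimum, median, RMS, skewness, kurtosis, peak-to-peak) are an assumption, as are the function names `spectral_features` and `sample_features` and the synthetic input signal.

```python
import numpy as np


def spectral_features(axis_signal):
    """Return nine summary statistics of the one-sided FFT magnitude spectrum.

    The choice of statistics is illustrative, not the paper's confirmed feature set.
    """
    mag = np.abs(np.fft.rfft(axis_signal))
    mean = mag.mean()
    std = mag.std()
    centered = mag - mean
    # Avoid division by zero for a perfectly flat spectrum.
    safe_std = std if std > 0 else 1.0
    skewness = np.mean(centered ** 3) / safe_std ** 3
    kurtosis = np.mean(centered ** 4) / safe_std ** 4
    rms = np.sqrt(np.mean(mag ** 2))
    peak_to_peak = mag.max() - mag.min()
    return np.array([
        mean, std, mag.max(), mag.min(), np.median(mag),
        rms, skewness, kurtosis, peak_to_peak,
    ])


def sample_features(xyz):
    """Map a (n_points, 3) recording, one column per axis, to a 27-element vector."""
    xyz = np.asarray(xyz, dtype=float)
    return np.concatenate([spectral_features(xyz[:, k]) for k in range(3)])


# Synthetic stand-in for one three-axis recording (not VBL-VA001 data).
rng = np.random.default_rng(0)
demo = rng.standard_normal((1000, 3))
features = sample_features(demo)
```

Applying `sample_features` to every recording would give a feature matrix with one 27-column row per sample, which is the shape a classifier such as SVM, KNN, or GNB expects.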
Abstract
Monitoring machine conditions in a plant is crucial for production in manufacturing. A sudden machine failure can stop production and cause a loss of revenue. The vibration signal of a machine is a good indicator of its condition. This paper presents a dataset of vibration signals from a lab-scale machine. The dataset contains four types of machine conditions: normal, unbalance, misalignment, and bearing fault. Three machine learning methods (SVM, KNN, and GNB) were evaluated on the dataset, and one of them obtained a perfect result on a 1-fold test. The performance of the algorithms is evaluated using weighted accuracy (WA) since the data is balanced. The results show that the best-performing algorithm is the SVM, with a WA of 99.75% in 5-fold cross-validation. The dataset is provided as CSV files in an open and free repository at https://zenodo.org/record/7006575.
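The evaluation protocol described in the abstract can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions: the data are synthetic (four roughly balanced classes, 27 features) rather than VBL-VA001, the hyperparameters are library defaults, feature standardization is added for SVM and KNN, and plain accuracy stands in for WA. None of these choices is confirmed by the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 4 roughly balanced classes, 27 features per sample.
X, y = make_classification(
    n_samples=400, n_features=27, n_informative=10,
    n_classes=4, random_state=0,
)

# The three classifier families named in the abstract, with default settings.
models = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "GNB": GaussianNB(),
}

# Stratified 5-fold cross-validation keeps class proportions equal across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy {fold_scores.mean():.3f}")
```

Scores on this synthetic data say nothing about the results reported above; the sketch only shows how the three classifiers can be compared under the same 5-fold split.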
