Multi Class Parkinson Disease Detection Based on Finger Tapping Using Attention Enhanced CNN BiLSTM
Abu Saleh Musa Miah, Najmul Hassan, Md Maruf Al Hossain, Yuichi Okuyama, Jungpil Shin
TL;DR
This paper addresses the need for objective PD severity assessment by using finger tapping as a non-invasive biomarker. It introduces an attention-enhanced CNN–BiLSTM framework that combines handcrafted features extracted from finger-tapping videos with deep learning to classify five PD severity levels. The model uses Conv1D-based spatial feature extraction, BiLSTM-based temporal modeling, and an attention mechanism that focuses on the most informative parts of the sequence, achieving 93% test accuracy and strong macro-averaged metrics on the ParkTest dataset. The approach offers a promising non-invasive tool for clinicians to monitor PD progression and could be extended to multi-modal data for more robust assessment.
Abstract
Accurate evaluation of Parkinson's disease (PD) severity is essential for effective clinical management and intervention development. Several gesture-based PD recognition systems have been proposed, including some that use the finger-tapping task to assess Parkinsonian symptoms, but their performance remains unsatisfactory. In this study, we present a multi-class PD detection system based on finger tapping, using an attention-enhanced CNN–BiLSTM framework that combines handcrafted feature extraction with deep learning. We used an existing dataset of finger-tapping videos to extract temporal, frequency, and amplitude-based features from wrist and hand movements, each computed from its respective formula. These handcrafted features were then processed by our attention-enhanced CNN–BiLSTM model, a hybrid deep learning framework that integrates CNN, BiLSTM, and attention mechanisms to classify PD severity into multiple levels. The features first pass through a Conv1D and MaxPooling block that captures local spatial dependencies, and then through a BiLSTM layer that models the temporal dynamics of the motion. An attention mechanism emphasizes the most informative temporal features, which are then refined by a second BiLSTM layer. The CNN-derived features and the attention-enhanced BiLSTM outputs are concatenated and passed through dense and dropout layers, and a softmax classifier predicts the PD severity level. Our model demonstrated strong performance in distinguishing between the five severity classes, showing the effectiveness of combining spatial-temporal representations with attention mechanisms for automated PD severity detection. This approach offers a promising non-invasive tool to assist clinicians in monitoring PD progression and making informed treatment decisions.
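
The abstract does not list the feature formulas, so the following is only a minimal sketch of what temporal, frequency, and amplitude-based features from a finger-tapping signal might look like. It assumes a per-frame thumb-to-index distance series has already been obtained from the video by some upstream hand tracker. The function names, the half-range peak threshold, and the specific features are illustrative assumptions, not the authors' definitions.

```python
import math
from statistics import mean, stdev


def find_peaks(signal, min_height):
    """Return indices of local maxima whose value is at least min_height."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1] and signal[i] >= min_height
    ]


def tapping_features(distance, fps):
    """Summarise a finger-distance series with a few handcrafted features.

    distance: per-frame thumb-to-index distance (hypothetical input).
    fps: video frame rate in frames per second.
    """
    lo, hi = min(distance), max(distance)
    # Assumed heuristic: a tap peak must rise above half of the signal range.
    peaks = find_peaks(distance, lo + 0.5 * (hi - lo))
    if len(peaks) < 2:
        raise ValueError("need at least two detected taps")

    # Temporal: time between consecutive peaks, in seconds.
    intervals = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]
    # Amplitude: finger opening at each detected peak.
    amplitudes = [distance[i] for i in peaks]
    half = len(amplitudes) // 2

    return {
        "tap_count": len(peaks),
        "mean_interval_s": mean(intervals),
        "interval_std_s": stdev(intervals) if len(intervals) > 1 else 0.0,
        # Frequency: average tapping rate in Hz.
        "tap_frequency_hz": 1.0 / mean(intervals),
        "mean_amplitude": mean(amplitudes),
        # Positive when later taps are smaller than earlier ones.
        "amplitude_decrement": mean(amplitudes[:half]) - mean(amplitudes[half:]),
    }


# Synthetic example: 3 s of 2 Hz tapping at 30 fps with slowly shrinking amplitude.
FPS = 30
signal = [
    (1.0 - 0.1 * (n / FPS)) * 0.5 * (1.0 - math.cos(2.0 * math.pi * 2.0 * n / FPS))
    for n in range(3 * FPS)
]
features = tapping_features(signal, FPS)
print(features)
```

In a pipeline like the one described, features of this kind would be computed for each recording and then passed to the Conv1D block. How the features are arranged into the model's input sequence is not specified in the abstract.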
