FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks

Ruiqi Wang, Jinyang Huang, Jie Zhang, Xin Liu, Xiang Zhang, Zhi Liu, Peng Zhao, Sigui Chen, Xiao Sun

TL;DR

FacialPulse tackles automatic depression detection from facial videos by addressing temporal dynamics and input redundancy through two modules: Facial Landmark Calibration Module (FLCM) and Facial Motion Modeling Module (FMMM). The FLCM stabilizes landmark trajectories using motion landmark prediction with a Pyramid Lucas–Kanade approach and Kalman-based fusion to mitigate jitter and detection errors, while FMMM uses two BiGRU-based streams to capture absolute facial positions and relative changes, merging their outputs to predict a Beck Depression Inventory-II score. The approach, using $k=64$ hidden units for the BiGRU streams, achieves superior accuracy and faster training than state-of-the-art baselines on AVEC2014 and MMDA, notably reducing MAE by $21\%$ and doubling recognition speed. These results demonstrate that robust landmark-based temporal modeling can improve depression detection in practical, resource-constrained settings. The codebase is released to facilitate reproducibility and future extensions, including the integration of additional modalities.
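The Kalman-based fusion in the FLCM can be illustrated with one scalar filter per landmark coordinate. This is a minimal sketch, not the authors' implementation: it assumes a constant-position model in which the Pyramid Lucas–Kanade displacement drives the prediction step and the raw landmark detection serves as the measurement. The class `ScalarKalman`, the function `calibrate_track`, and all noise values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Hypothetical sketch: fuses a flow-based motion prediction with a
    detector measurement for ONE landmark coordinate (one filter per x, per y)."""

    estimate: float            # current calibrated coordinate
    variance: float = 1.0      # uncertainty of the current estimate
    process_var: float = 0.5   # assumed noise of the optical-flow prediction
    meas_var: float = 4.0      # assumed noise (jitter) of the landmark detector

    def step(self, flow_dx: float, measured: float) -> float:
        # Predict: shift the previous calibrated position by the optical-flow
        # displacement and inflate the uncertainty by the process noise.
        prior = self.estimate + flow_dx
        prior_var = self.variance + self.process_var
        # Update: the Kalman gain weighs the detection against the prediction
        # according to their variances.
        gain = prior_var / (prior_var + self.meas_var)
        self.estimate = prior + gain * (measured - prior)
        self.variance = (1.0 - gain) * prior_var
        return self.estimate


def calibrate_track(flow_dx: list[float], detected: list[float]) -> list[float]:
    """flow_dx[t] is the displacement from frame t to t+1 (len = frames - 1)."""
    kf = ScalarKalman(estimate=detected[0])
    fused = [detected[0]]
    for dx, meas in zip(flow_dx, detected[1:]):
        fused.append(kf.step(dx, meas))
    return fused


# Toy example: the face is still at x = 100, but the detector jitters.
detections = [100.0, 103.0, 97.0, 104.0, 96.0, 100.0]
flow = [0.0] * (len(detections) - 1)
smoothed = calibrate_track(flow, detections)
print([round(v, 2) for v in smoothed])
```

Raising `meas_var` relative to `process_var` makes the filter trust the optical-flow prediction more, which suppresses detector jitter more strongly but reacts more slowly to genuine movement that only the detector sees.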

Abstract

Depression is a prevalent mental health disorder that significantly impacts individuals' lives and well-being. Early detection and intervention are crucial for effective treatment and management of depression. Recently, many end-to-end deep learning methods have leveraged facial expression features for automatic depression detection. However, most current methods overlook the temporal dynamics of facial expressions. Although very recent 3DCNN methods remedy this gap, they introduce more computational cost due to their CNN-based backbones and redundant facial features. To address these limitations, by considering the temporal correlation of facial expressions, we propose a novel framework called FacialPulse, which recognizes depression with high accuracy and speed. By exploiting bidirectional context and effectively handling long-term dependencies, the Facial Motion Modeling Module (FMMM) in FacialPulse is designed to fully capture temporal features. Since the proposed FMMM has parallel processing capabilities and a gating mechanism that mitigates vanishing gradients, this module can also significantly boost the training speed. In addition, to use facial landmarks in place of original images and thereby reduce information redundancy, a Facial Landmark Calibration Module (FLCM) is designed to eliminate facial landmark errors and further improve recognition accuracy. Extensive experiments on the AVEC2014 dataset and the MMDA dataset (a depression dataset) demonstrate the superiority of FacialPulse in recognition accuracy and speed: the average MAE (Mean Absolute Error) decreases by 21% compared to baselines, and the recognition speed increases by 100% compared to state-of-the-art methods. Code is released at https://github.com/volatileee/FacialPulse.
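The two-stream design of the FMMM can be made concrete with a deliberately tiny, framework-free BiGRU. This is a sketch under stated assumptions, not the released code: weights are random and untrained, bias terms are omitted, the hidden size is 4 rather than the $k=64$ reported for the paper, "relative changes" are assumed to be frame-to-frame coordinate differences, and the two streams are assumed to be merged by concatenation before a linear regression head. All names (`GRUCell`, `BiGRU`, `two_stream_features`, `predict_score`) are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class GRUCell:
    """Minimal GRU cell (update gate z, reset gate r), random weights, no biases."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random):
        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.hidden = hidden
        # Input-to-hidden and hidden-to-hidden weights for z, r and candidate n.
        self.wx = {g: mat(hidden, in_dim) for g in "zrn"}
        self.wh = {g: mat(hidden, hidden) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wx["z"], x), matvec(self.wh["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wx["r"], x), matvec(self.wh["r"], h))]
        hn = matvec(self.wh["n"], h)
        # The reset gate scales how much of the old state feeds the candidate.
        n = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(self.wx["n"], x), r, hn)]
        # The update gate interpolates between the old state and the candidate.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


class BiGRU:
    """Runs one GRU forward and another backward; concatenates final states."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random):
        self.fwd = GRUCell(in_dim, hidden, rng)
        self.bwd = GRUCell(in_dim, hidden, rng)

    def __call__(self, seq):
        hf = [0.0] * self.fwd.hidden
        for x in seq:
            hf = self.fwd.step(x, hf)
        hb = [0.0] * self.bwd.hidden
        for x in reversed(seq):
            hb = self.bwd.step(x, hb)
        return hf + hb


def two_stream_features(landmarks):
    """Absolute stream: raw coordinates; relative stream: frame-to-frame deltas."""
    relative = [[b - a for a, b in zip(prev, cur)]
                for prev, cur in zip(landmarks, landmarks[1:])]
    return landmarks, relative


def predict_score(landmarks, hidden=4, seed=0):
    rng = random.Random(seed)
    dim = len(landmarks[0])
    abs_net, rel_net = BiGRU(dim, hidden, rng), BiGRU(dim, hidden, rng)
    absolute, relative = two_stream_features(landmarks)
    feats = abs_net(absolute) + rel_net(relative)      # merge the two streams
    head = [rng.uniform(-0.5, 0.5) for _ in feats]     # untrained linear head
    return sum(w * f for w, f in zip(head, feats)), feats


# Toy sequence: 10 frames, 2 landmarks flattened to (x1, y1, x2, y2).
landmarks = [[0.1 * t, 0.2, 0.05 * t, -0.1] for t in range(10)]
score, feats = predict_score(landmarks)
print(len(feats), round(score, 3))
```

A practical model would use a deep learning framework's built-in bidirectional GRU and be trained against BDI-II labels; the sketch only shows how the update and reset gates combine old state with new input, and how the forward and backward passes let each stream summarize the whole sequence.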

Paper Structure (21 sections, 16 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Change intensity of facial action units in depressed patients and non-depressed individuals in the same task scenario. The vertical axis denotes the magnitude of these variations, while the horizontal axis tracks the progression of video frames.
  • Figure 2: Illustration of the overall pipeline of FacialPulse, which contains two primary modules: (a) Facial Landmark Calibration Module and (b) Facial Motion Modeling Module. The input is a video and the output is the subject's BDI-II questionnaire score.
  • Figure 3: Illustration of the movements of facial landmarks during an expression that appears stable. An Action Unit (AU) is calculated from two specific landmarks, representing different action areas. For example, AU5 depends on the 22nd and 23rd landmarks, while AU12 and AU15 correspond to landmarks around the mouth.
  • Figure 4: PLK estimates the optical flow of feature points by employing the LK algorithm on individual layers of the image pyramid. It iteratively refines the position and optical flow vector of feature points across layers, enhancing both the accuracy and stability of estimation.
  • Figure 5: The facial landmark calibration process.
  • ...and 2 more figures
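Figure 3 states that each AU is computed from a pair of landmarks, and Figure 1 plots how that quantity changes over frames. The exact formula is not given here, so the sketch below assumes the AU signal is the Euclidean distance between the two landmarks, reported as a change relative to the first frame. It also assumes a 68-point layout using the 1-based numbering of the caption (so the 22nd and 23rd landmarks are indices 21 and 22). The function names are hypothetical.

```python
import math


def landmark_distance(frame, i, j):
    """Euclidean distance between landmarks i and j in one frame."""
    (xi, yi), (xj, yj) = frame[i], frame[j]
    return math.hypot(xi - xj, yi - yj)


def au_change_intensity(frames, i, j):
    """Change of the i-j distance in every frame relative to the first frame."""
    base = landmark_distance(frames[0], i, j)
    return [landmark_distance(f, i, j) - base for f in frames]


# Toy track: 3 frames of 68 points; only landmarks 22 and 23 (1-based) move apart.
frames = [[(0.0, 0.0)] * 68 for _ in range(3)]
for t, frame in enumerate(frames):
    frame[21] = (-10.0 - t, 0.0)   # 22nd landmark -> index 21
    frame[22] = (10.0 + t, 0.0)    # 23rd landmark -> index 22
print(au_change_intensity(frames, 21, 22))  # [0.0, 2.0, 4.0]
```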