A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection

Heng Yim Nicole Oo, Min Hun Lee, Jeong Hoon Lim

TL;DR

The paper tackles automatic facial palsy detection by comparing single-modality and multimodal deep learning approaches across RGB images, landmarks, and handcrafted features. It proposes a multimodal fusion framework that combines an MLP Mixer for unstructured image data with an FFN for structured features, achieving 96.00 F1 under LOPO evaluation. It provides a benchmark on the YFP and CK+ datasets that shows the value of integrating diverse data modalities. The work lays groundwork for clinically usable tools and points to temporal analysis and explainability as future directions.
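The evaluation protocol mentioned above can be sketched in a few lines. This is a minimal illustration, assuming LOPO denotes a leave-one-person-out split in which every frame of one subject is held out per fold, and assuming F1 is computed with facial palsy as the positive class. The helper names `lopo_folds` and `f1_score` and the toy subject IDs are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def lopo_folds(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject.

    Assumption: each sample (e.g. a video frame) carries the ID of the
    person it came from, so no person appears in both train and test.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = [
            i
            for sid, idxs in by_subject.items()
            if sid != held_out
            for i in idxs
        ]
        yield held_out, train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: two hypothetical patients (p1, p2) and two healthy subjects (h1, h2).
subject_ids = ["p1", "p1", "p2", "h1", "h1", "h2"]
folds = list(lopo_folds(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by person rather than by frame keeps frames of the same face out of both sides of a fold, which would otherwise inflate the score.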

Abstract

Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessments by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes an MLP Mixer-based model to process unstructured data (i.e., RGB images or images with facial line segments) and a feed-forward neural network to process structured data (i.e., facial landmark coordinates, features of facial expressions, or handcrafted features) for detecting facial palsy. We then contribute a study that analyzes the effect of different data modalities and the benefits of a multimodal fusion-based approach using videos of 20 facial palsy patients and 20 healthy subjects. Our multimodal fusion model achieved 96.00 F1, which is significantly higher than the feed-forward neural network trained on handcrafted features alone (82.80 F1) and an MLP Mixer-based model trained on raw RGB images (89.00 F1).

Paper Structure

This paper contains 21 sections, 1 equation, 2 figures, 1 table.

Figures (2)

  • Figure 1: Our early fusion model integrates any structured data embedding from a feedforward neural network with any unstructured data embedding from any image-based model to detect a patient with facial palsy
  • Figure 2: (a) Raw RGB image, (b) Subset of 478 2-dimensional coordinates of eyes, nose, and mouth regions overlaid on the RGB image, (c) Handcrafted Manual Features, and (d) Black and White (BnW) Line Segments of Facial Contours
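The fusion step described in the Figure 1 caption can be illustrated with a small forward pass. This is only a sketch under stated assumptions: it assumes the two embeddings are combined by concatenation and passed to a small classification head, which the caption does not confirm. The embedding sizes, the hidden width, and the names `fuse_and_classify` and `init_layer` are hypothetical, and random vectors stand in for the outputs of the image-based model and the feed-forward network.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(image_emb, structured_emb, head):
    """Concatenate both embeddings, then apply a two-layer head.

    Assumption: fusion is plain concatenation; the paper may differ.
    Returns a probability-like score in (0, 1) for facial palsy.
    """
    fused = list(image_emb) + list(structured_emb)
    (w1, b1), (w2, b2) = head
    hidden = relu(linear(fused, w1, b1))
    logit = linear(hidden, w2, b2)[0]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
IMAGE_DIM, STRUCT_DIM, HIDDEN = 8, 4, 6  # hypothetical sizes

head = (init_layer(IMAGE_DIM + STRUCT_DIM, HIDDEN, rng),
        init_layer(HIDDEN, 1, rng))

# Stand-ins for the unstructured (image) and structured (feature) embeddings.
image_emb = [rng.random() for _ in range(IMAGE_DIM)]
structured_emb = [rng.random() for _ in range(STRUCT_DIM)]

prob = fuse_and_classify(image_emb, structured_emb, head)
print(f"score for facial palsy: {prob:.3f}")
```

Because the head only sees a concatenated vector, either branch can be swapped for a different encoder as long as its embedding size is reflected in the first layer's input width.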