
Efficient-Husformer: Efficient Multimodal Transformer Hyperparameter Optimization for Stress and Cognitive Loads

Merey Orazaly, Fariza Temirkhanova, Jurn-Gyu Park

TL;DR

Aims to enable real-time, resource-efficient stress and cognitive load classification from multimodal physiological signals. Introduces Efficient-Husformer, a decoupled Transformer architecture whose cross-modal and self-attention components are jointly optimized under a constrained hyperparameter search over the number of layers $L$, attention heads $H$, model dimension $d_m$, and feed-forward network size $FFN$. Ablation studies and experiments on WESAD and CogLoad show that shallow, low-parameter configurations (e.g., a single layer with small $d_m$ and $FFN$) achieve high accuracy while shrinking the model to ≈30k parameters and shortening training time, outperforming the original Husformer by up to 13.83% in accuracy. The work demonstrates that lightweight, deployable Transformers can maintain strong performance in multimodal physiological sensing, with implications for wearables and edge devices.
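The constrained search can be pictured as a small grid in which only a chosen pair of hyperparameters varies while the others stay at their defaults. The Python sketch below assumes the defaults from the Figure 2 caption ($L=5$, $H=3$, $d_m=30$, $FFN=120$); the candidate grids and the `search_space` helper are hypothetical, and the divisibility check reflects how standard multi-head attention splits $d_m$ across heads rather than a rule stated by the authors.

```python
from itertools import product

# Defaults from the Figure 2 caption (original Husformer setting).
DEFAULTS = {"L": 5, "H": 3, "d_m": 30, "FFN": 120}

# HYPOTHETICAL candidate values per hyperparameter; the actual grids
# are not listed in this summary.
GRID = {
    "L": [1, 2, 3, 4, 5],
    "H": [1, 3, 5],
    "d_m": [6, 12, 18, 24, 30],
    "FFN": [30, 60, 120],
}


def search_space(combo):
    """Yield configs that vary only the hyperparameters named in `combo`.

    All other hyperparameters stay at DEFAULTS. Configs where d_m is not
    divisible by H are skipped, because standard multi-head attention
    splits d_m evenly across the heads.
    """
    for values in product(*(GRID[name] for name in combo)):
        cfg = dict(DEFAULTS)
        cfg.update(zip(combo, values))
        if cfg["d_m"] % cfg["H"] == 0:
            yield cfg


# Count the configurations in a few pairwise subspaces.
for combo in [("L", "d_m"), ("L", "FFN"), ("H", "d_m")]:
    print(combo, sum(1 for _ in search_space(combo)))
```

With these hypothetical grids, the $(L + d_m)$ subspace has 25 configurations and $(L + FFN)$ has 15. Varying $H$ as well shows the divisibility filter at work: $H=5$ admits only $d_m=30$, leaving 11 configurations.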

Abstract

Transformer-based models have gained considerable attention in the field of physiological signal analysis. They capture long-range dependencies and complex patterns in temporal signals, allowing them to outperform traditional RNN and CNN models. However, they are computationally intensive and memory-demanding. In this work, we present Efficient-Husformer, a novel Transformer-based architecture developed with hyperparameter optimization (HPO) for multi-class stress detection across two multimodal physiological datasets (WESAD and CogLoad). The main contributions of this work are: (1) the design of a structured search space that targets effective hyperparameter optimization; (2) a comprehensive ablation study evaluating the impact of architectural decisions; and (3) consistent performance improvements over the original Husformer, with the best configuration achieving accuracies of 88.41% and 92.61% (improvements of 13.83% and 6.98%) on the WESAD and CogLoad datasets, respectively. The best-performing configuration is obtained with the ($L$ + $d_m$) or ($L$ + $FFN$) combinations, using a single layer, 3 attention heads, a model dimension of 18/30, and an FFN dimension of 120/30, resulting in a compact model with only about 30k parameters.
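To see why $d_m$ and $FFN$ (but not $H$) drive model size, the sketch below counts the parameters of one standard Transformer encoder layer with biases and two LayerNorms, the same layout as PyTorch's `nn.TransformerEncoderLayer`. It does not attempt to reproduce the ≈30k total above, which also depends on the number of modalities, input projections, and classifier head that this summary does not specify; the `encoder_layer_params` helper is hypothetical.

```python
def encoder_layer_params(d_m: int, ffn: int, heads: int) -> int:
    """Parameter count of one standard Transformer encoder layer (with biases).

    The head count only partitions d_m, so it does not change the total,
    but d_m must be divisible by it.
    """
    if d_m % heads != 0:
        raise ValueError("d_m must be divisible by the number of heads")
    attention = 4 * (d_m * d_m + d_m)                 # Q, K, V and output projections
    feed_forward = (d_m * ffn + ffn) + (ffn * d_m + d_m)  # two linear layers
    layer_norms = 2 * (2 * d_m)                       # two LayerNorms, scale and shift each
    return attention + feed_forward + layer_norms


# Reading "18/30" and "120/30" as the paired settings (d_m=18, FFN=120)
# and (d_m=30, FFN=30), compared with the default (d_m=30, FFN=120).
for d_m, ffn in [(18, 120), (30, 30), (30, 120)]:
    print(d_m, ffn, encoder_layer_params(d_m, ffn, heads=3))
```

Under these assumptions, a single layer with $d_m=18$, $FFN=120$ has 5,898 parameters and one with $d_m=30$, $FFN=30$ has 5,700, while the default setting costs 11,190 per layer, or 55,950 across $L=5$ layers. Depth therefore multiplies the per-layer cost, which is why the single-layer configurations are so compact.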

Paper Structure

This paper contains 28 sections, 14 equations, 4 figures, and 8 tables.

Figures (4)

  • Figure 1: Efficient-Husformer Architecture (Adapted from Husformer and Modified).
  • Figure 2: Motivating Example: Comparison of the Original ($L=5$) and an Efficient ($L=3$) Configuration, with the other hyperparameters at their defaults: heads ($H=3$), model dimension ($d_m=30$), and feed-forward network size ($FFN=120$)
  • Figure 3: Methodology Overview
  • Figure 4: Training and Validation Errors during Training for the 1-Layer (left) and 5-Layer (right) Models on the WESAD (top) and CogLoad (bottom) Datasets