HCFSLN: Adaptive Hyperbolic Few-Shot Learning for Multimodal Anxiety Detection

Aditya Sneh, Nilesh Kumar Sahu, Anushka Sanjay Shelke, Arya Adyasha, Haroon R. Lone

TL;DR

This work tackles anxiety detection under data scarcity by proposing HCFSLN, a multimodal few-shot learning framework that uses adaptive hyperbolic embeddings, cross-modal attention, and an adaptive gating network to fuse audio, video, and physiological signals. Embeddings are projected into a trainable hyperbolic space and classified via prototype-based distance in the Poincaré ball, guided by a combined Hyperbolic Prototypical Loss and Angular Margin Loss. The authors introduce the Multi-Modal Anxiety Dataset (M2AD) with 108 participants and benchmark against six baselines on SAD and M2AD, achieving up to 88% accuracy in 1-shot audio and outperforming baselines by about 14%. The results demonstrate that hyperbolic geometry can better capture the complex relationships in anxiety cues, enabling robust, real-time screening on accessible devices in low-resource settings.
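The prototype-based classification step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`expmap0`, `poincare_dist`, `classify`), the fixed curvature parameter `c`, and the choice to build each prototype by averaging Euclidean features before projecting them onto the ball are all assumptions made for illustration. In HCFSLN the curvature is described as trainable, and the fusion, attention, and loss components are omitted here.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin: send Euclidean vectors into the
    Poincare ball of curvature -c (radius 1/sqrt(c))."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between points x and y inside the Poincare ball."""
    sq = np.sum((x - y) ** 2, axis=-1)
    dx = 1.0 - c * np.sum(x * x, axis=-1)
    dy = 1.0 - c * np.sum(y * y, axis=-1)
    arg = 1.0 + 2.0 * c * sq / np.maximum(dx * dy, eps)
    # Clamp to the domain of arccosh to guard against rounding error.
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(c)

def classify(support, support_labels, query, c=1.0):
    """Assign each query to the class of its nearest hyperbolic prototype.

    Assumption: a prototype is the Euclidean mean of a class's support
    features, projected onto the ball with expmap0.
    """
    classes = np.unique(support_labels)
    protos = np.stack(
        [expmap0(support[support_labels == k].mean(axis=0), c) for k in classes]
    )
    q = expmap0(query, c)
    # Pairwise distances, shape (n_query, n_classes), via broadcasting.
    d = poincare_dist(q[:, None, :], protos[None, :, :], c)
    return classes[np.argmin(d, axis=1)]

# Toy 2-way, 2-shot episode with hypothetical 2-D features.
support = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]])
labels = np.array([0, 0, 1, 1])
query = np.array([[0.8, 0.0], [-0.7, 0.1]])
pred = classify(support, labels, query)
```

Distances in the ball grow rapidly as points approach the boundary, which is one common argument for why hyperbolic prototypes may separate hierarchically structured cues better than Euclidean ones.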

Abstract

Anxiety disorders impact millions globally, yet traditional diagnosis relies on clinical interviews, while machine learning models struggle with overfitting due to limited data. Large-scale data collection remains costly and time-consuming, restricting accessibility. To address this, we introduce the Hyperbolic Curvature Few-Shot Learning Network (HCFSLN), a novel Few-Shot Learning (FSL) framework for multimodal anxiety detection, integrating speech, physiological signals, and video data. HCFSLN enhances feature separability through hyperbolic embeddings, cross-modal attention, and an adaptive gating network, enabling robust classification with minimal data. We collected a multimodal anxiety dataset from 108 participants and benchmarked HCFSLN against six FSL baselines, achieving 88% accuracy, outperforming the best baseline by 14%. These results highlight the effectiveness of hyperbolic space for modeling anxiety-related speech patterns and demonstrate FSL's potential for anxiety classification.

Paper Structure

This paper contains 26 sections, 14 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Experimental setup.
  • Figure 2: Architecture of the proposed HCFSLN framework.
  • Figure 3: t-SNE prototype visualizations of models.
  • Figure 4: Ablation results showing the effect of loss type, curvature, and angular weight on accuracy.