Unlocking Multi-Site Clinical Data: A Federated Approach to Privacy-First Child Autism Behavior Analysis

Guangyu Sun, Wenhan Wu, Zhishuai Guo, Ziteng Wang, Pegah Khosravi, Chen Chen

Abstract

Automated recognition of autistic behaviors in children is essential for early intervention and objective clinical assessment. However, the development of robust models is severely hindered by strict privacy regulations (e.g., HIPAA) and the sensitive nature of pediatric data, which together prevent the centralized aggregation of clinical datasets. Furthermore, individual clinical sites often suffer from data scarcity, making it difficult to learn generalized behavior patterns or tailor models to site-specific patient distributions. To address these challenges, we observe that Federated Learning (FL) can decouple model training from raw data access, enabling multi-site collaboration while maintaining strict data residency. In this paper, we present the first study exploring FL for pose-based child autism behavior recognition. Our framework employs a two-layer privacy protection mechanism: human skeletal abstraction removes identifiable visual information from the raw RGB videos, and FL ensures that sensitive pose data remains within the clinic. This approach leverages distributed clinical data to learn generalized representations while providing the flexibility for site-specific personalization. Experimental results on the MMASD benchmark demonstrate that our framework achieves high recognition accuracy, outperforming traditional federated baselines and providing a robust, privacy-first solution for multi-site clinical analysis.

Paper Structure

This paper contains 29 sections, 3 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overview of the proposed Two-Layer Privacy framework. The first layer achieves privacy via skeletal abstraction, filtering out raw biometric identifiers from videos. Each clinical site applies this step to its own local video data (the figure demonstrates it for 'Clinical Site A'). The second layer employs decentralized optimization (FL) using the efficient FreqMixFormer backbone to maintain data residency. Adaptive personalization layers (FedBN, FedPer, or APFL) are evaluated to handle clinical heterogeneity by adaptively mixing global knowledge with local specialization.
  • Figure 2: Visualization of 3D skeletal data from the MMASD benchmark. (a) shows representative sequences for Robotic-assisted therapy, Rhythm-based activities, and Yoga-based poses. (b) illustrates the 3D joint structure. This abstraction preserves kinetic motion for behavior analysis while raw biometric identifiers are removed.
  • Figure 3: Performance evolution across 30 communication rounds for different clinical themes. The curves demonstrate the convergence characteristics of standard FL, parameter-wise personalized FL (PFL), and our adaptive personalization approach. APFL (red) consistently achieves superior stability and final recognition accuracy across all therapeutic domains.
  • Figure 4: Evolution of the adaptive mixing parameter $\alpha$ across different clinical themes. The parameter is initialized with a low value and adaptively increases, indicating that the model initially prioritizes global knowledge synthesized from the collaborative network before gradually incorporating site-specific behavioral nuances from local data.
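The role of the mixing parameter $\alpha$ described in Figure 4 can be illustrated with a minimal sketch. It assumes the convex-combination form commonly used for APFL, in which the personalized parameters are $\alpha$ times the local parameters plus $(1 - \alpha)$ times the global parameters. The function name `mix_parameters` and the toy parameter values are hypothetical, and the paper's actual implementation may differ.

```python
from typing import List


def mix_parameters(local: List[float], global_: List[float], alpha: float) -> List[float]:
    """Blend local and global parameters element-wise.

    Assumed form (not confirmed by the source): alpha weights the site-specific
    model, so a low alpha favors global knowledge and a high alpha favors
    local specialization.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(local) != len(global_):
        raise ValueError("local and global parameter lists must have equal length")
    return [alpha * l + (1.0 - alpha) * g for l, g in zip(local, global_)]


# Toy values standing in for flattened model weights (illustrative only).
local_weights = [1.0, 2.0]
global_weights = [3.0, 4.0]

early = mix_parameters(local_weights, global_weights, alpha=0.0)   # global model only
later = mix_parameters(local_weights, global_weights, alpha=0.25)  # some local influence
```

Under this assumption, the low initial $\alpha$ in Figure 4 corresponds to a personalized model that is almost entirely the global model, and the gradual increase shifts weight toward each site's local parameters.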