DCL-SE: Dynamic Curriculum Learning for Spatiotemporal Encoding of Brain Imaging

Meihua Zhou, Xinyu Tong, Jiarui Zhao, Min Cheng, Li Yang, Lei Tian, Nan Wan

TL;DR

This work tackles the challenge of extracting clinically actionable insights from high-dimensional brain imaging data under limited labeled samples. It introduces DaSE, a two-stage encoding-decoding pipeline that first uses Approximate Rank Pooling (ARP) to convert 3D MRI volumes into compact 2D dynamic representations and then applies Dynamic Curriculum Learning (DCL) guided by a Dynamic Group Mechanism (DGM) to progressively refine features from global anatomy to subtle pathology. The approach yields strong accuracy, robustness, and interpretability across classification, segmentation, and brain-age prediction tasks, outperforming many 2D/3D baselines and large foundation models in data-limited clinical settings. By bridging 3D volumetric data with efficient 2D networks and providing interpretable progressive decoding, DCL-SE demonstrates practical potential for scalable, privacy-conscious neuroimaging analysis and establishes a path for integrating lightweight, task-specific models with large-scale pretrained systems. The findings highlight the value of compact, adaptive architectures in the era of massive pretrained models, and suggest broader applicability to other medical imaging domains.

Abstract

High-dimensional neuroimaging analyses for clinical diagnosis are often constrained by compromises in spatiotemporal fidelity and by the limited adaptability of large-scale, general-purpose models. To address these challenges, we introduce Dynamic Curriculum Learning for Spatiotemporal Encoding (DCL-SE), an end-to-end framework centered on data-driven spatiotemporal encoding (DaSE). We leverage Approximate Rank Pooling (ARP) to efficiently encode three-dimensional volumetric brain data into information-rich, two-dimensional dynamic representations, and then employ a dynamic curriculum learning strategy, guided by a Dynamic Group Mechanism (DGM), to progressively train the decoder, refining feature extraction from global anatomical structures to fine pathological details. Evaluated across six publicly available datasets, including Alzheimer's disease and brain tumor classification, cerebral artery segmentation, and brain age prediction, DCL-SE consistently outperforms existing methods in accuracy, robustness, and interpretability. These findings underscore the critical importance of compact, task-specific architectures in the era of large-scale pretrained networks.
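To make the encoding step concrete, the following is a minimal sketch of Approximate Rank Pooling applied along the slice axis of a volume. It assumes the simple linear coefficients α_t = 2t − T − 1 from the dynamic-image literature; the exact ARP variant, the slice axis, and any normalization used by DCL-SE are not stated in this summary, so the function names and the random toy volume below are purely illustrative.

```python
import numpy as np


def arp_coefficients(num_slices: int) -> np.ndarray:
    """Linear ARP weights alpha_t = 2t - T - 1 for t = 1..T (assumed variant)."""
    t = np.arange(1, num_slices + 1)
    return (2 * t - num_slices - 1).astype(np.float64)


def approximate_rank_pooling(volume: np.ndarray) -> np.ndarray:
    """Collapse a (D, H, W) volume into one (H, W) dynamic image.

    Each slice is weighted by its ARP coefficient and the weighted slices
    are summed, so later slices contribute positively and earlier slices
    negatively.
    """
    alpha = arp_coefficients(volume.shape[0])
    # Contract the slice axis of the volume against the weight vector.
    return np.tensordot(alpha, volume, axes=(0, 0))


# Toy stand-in for an MRI volume: 8 slices of 4x4 pixels (hypothetical data).
volume = np.random.default_rng(0).random((8, 4, 4))
dynamic_image = approximate_rank_pooling(volume)
print(dynamic_image.shape)
```

Because these weights sum to zero, a volume whose slices are all identical maps to an all-zero image: the pooled output encodes how intensity changes across slices rather than the average intensity.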

Paper Structure

This paper contains 18 sections, 9 equations, 10 figures, 5 tables, 3 algorithms.

Figures (10)

  • Figure 1: Dynamic Grouping Mechanism (DGM), an important component of DCL-SE and the carrier of Dynamic Curriculum Learning (DCL), centralizes DCL by generating, weighting, and adjusting dynamic features. These features are then processed through grouped convolution to extract critical channel-wise information.
  • Figure 2: Illustration of existing multimodal image data processing and learning strategies. (a) Static 2D, 3D, and hybrid methods are widely used in brain imaging analysis, but each has limitations: 2D methods are highly efficient yet lack spatial information [woo2021comsoomro2023ima]. 3D methods offer strong spatial expression but are constrained by computing power and sample size, limiting clinical practicality [nie2019imagtifa2024mult]. Hybrid methods integrate 2D and 3D via static fusion rules or post-processing but are prone to interpolation errors and poor generalizability [gtifa2023comroy2015thr]. (b) Cross-modal transfer, including 3D to 2D and CT to MRI conversions, suffers from core issues of spatial and semantic information loss during transfer [tf1tf2kora2022transfer]. While large/foundational models enable such transfer, they face bottlenecks of high computational demands and privacy risks, with their practical generalization and information retention remaining unresolved [biogptmedclip]. (c) Curriculum Learning (CL) has notable limitations: most applications rely on static, manually defined difficulty classification, lacking feature-driven dynamic scheduling [hatami2024investigatinglin2019segzhang2023tw]. In heterogeneous brain imaging scenarios specifically, the absence of automatic complexity measurement and adaptation mechanisms hinders gradual improvement of diagnostic complexity and generalization [liu2024multtang2018muti].
  • Figure 3: (a) Data-based Spatiotemporal Encoding (DaSE) structure, which consists of two stages: ARP-based Spatiotemporal Encoding and Curriculum Semantic-based Progressive Decoding. For example, in Alzheimer's disease (AD), the former is a Data Conversion (DC) process that first converts the 3D MRI data into 2D slices via ARP to generate 2D dynamic images, which are used as inputs for the latter Curriculum Semantic (CurrSem) phase. In the CurrSem (CS) process, the network is trained on these 2D slices to decode them multidimensionally through the Curriculum Learning strategy. Finally, the trained features are fed back to the 3D information via CurrSem, greatly reducing information loss across multimodal medical data while still training effectively on the 3D data features. (b) DCL-SE architecture, featuring a distinctive "D" shape that represents its Dynamic Adjustment capabilities. The overall model design provides theoretical support for the decoding stage of DaSE. Specifically, the dynamic grouping mechanism (DGM) integrates the DCL strategy to perform progressive hierarchical feature extraction and, through enhanced depthwise separable convolution and linearization processing, improves training efficiency while reducing model size and computational complexity. (c) DCL-SE adaptively selects modules (LinearBottleNeck1 or LinearBottleNeck2) based on network positions (e.g., convolution grouping, downsampling layer placement).
  • Figure 4: Dynamic Curriculum Learning (DCL) runs through the whole process of S1, S4, S6, and C1. Feature learning proceeds from simple to complex, with complexity evaluated and feedback given at each stage. Features are fused on the (Cur.) intersection and merger sets: features from different stages are merged with weights, through a global attention approach, to form a comprehensive multidimensional representation of the important information.
  • Figure 5: The generation process of GPConv from the core part of DGM, where operator 1 denotes an elementwise product and operator 2 denotes a Kronecker product applied from left to right.
  • ...and 5 more figures
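
The Figure 5 caption names two operators, an elementwise product and a Kronecker product applied from left to right. The sketch below only illustrates how the two operations differ on small matrices; how GPConv actually combines them is not specified in this summary, and the arrays `a` and `b` are hypothetical values chosen for readability.

```python
import numpy as np

# Two small example matrices (hypothetical values, not taken from the paper).
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

# Elementwise product: multiplies matching entries, so the shape stays (2, 2).
elementwise = a * b

# Kronecker product: every entry of `a` scales a full copy of `b`,
# so the result grows to shape (4, 4).
kronecker = np.kron(a, b)

# Operand order matters, which is why the caption fixes a direction.
kronecker_reversed = np.kron(b, a)

print(elementwise)
print(kronecker)
```

The elementwise product keeps the input shape, while the Kronecker product of an (m, n) and a (p, q) matrix has shape (m·p, n·q). Swapping the operands generally gives a different matrix.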