SYNCS: Synthetic Data and Contrastive Self-Supervised Training for Central Sulcus Segmentation

Vladyslav Zalevskyi, Kristoffer Hougaard Madsen

TL;DR

Central sulcus (CS) segmentation in adolescent cohorts remains challenging due to high morphological variability and limited labeled data. The authors propose a data-efficient pipeline combining synthetic data generation (SynthSeg) with self-supervised learning (SSL; SimCLR) and a multi-task SSL variant to learn cortex-morphology representations and adapt to new cohorts with minimal preprocessing. Results show that synthetic data can improve boundary accuracy (Hausdorff distance, HD) on a different cohort, while SSL pre-training on a larger, diverse dataset improves Dice scores after fine-tuning; the multi-task approach offers no clear gains. Together, these strategies enable robust CS segmentation and morphometry analysis across cohorts, offering a practical pathway toward scalable, preprocessing-light sulcal analysis.

Abstract

Bipolar disorder (BD) and schizophrenia (SZ) are severe mental disorders with profound societal impact. Identifying risk markers early is crucial for understanding disease progression and enabling preventive measures. The Danish High Risk and Resilience Study (VIA) focuses on understanding early disease processes, particularly in children with familial high risk (FHR). Understanding the structural brain changes associated with these diseases during early stages is essential for effective interventions. The central sulcus (CS) is a prominent brain landmark adjacent to regions involved in motor and sensory processing. Analyzing CS morphology can provide valuable insights into neurodevelopmental abnormalities in the FHR group. However, segmenting the CS is challenging because of its variability, especially in adolescents. This study introduces two novel approaches to improve CS segmentation: synthetic data generation to model CS variability, and self-supervised pre-training with multi-task learning to adapt models to new cohorts. These methods aim to enhance segmentation performance across diverse populations without the need for extensive preprocessing.

Paper Structure

This paper contains 33 sections, 6 equations, 16 figures.

Figures (16)

  • Figure 1: Schematic representation of the different morphological variants of the hand motor cortex observed in humans. Omega, medially asymmetric epsilon, laterally asymmetric epsilon, and null variants were observed in 88.3%, 2.9%, 7.0%, and 1.8% of the hemispheres, respectively, with statistically significant sex differences. The epsilon variant was twice as frequent in men, and interhemispheric concordance of morphological variants was observed only in women. Courtesy of HandKnob_Variability.
  • Figure 2: A schematic representation of the framework proposed by SphericalCNNPFCS for training spherical CNNs for sulci segmentation. It has two main contributions. The data augmentation approach (blue box) augments training samples by deforming them through surface registration to every possible pair of other training samples, reconstructing all intermediate deformations, and using them as additional samples. The context-aware training method (green box) extrapolates spatial information from primary/secondary sulci to guide the segmentation of smaller and shallower tertiary sulci. Courtesy of SphericalCNNPFCS.
  • Figure 3: BrainVISA pre-processing pipeline. (a) T1w structural image; (b) Skull stripping; (c) Hemisphere segmentation; (d) GM and WM segmentation; (e) CSF skeleton labelling; (f) Cerebral cortex surface reconstruction; (g) Sulci detection; (h) Sulci parcellation. Based on BV_SS_1.
  • Figure 4: Synthetic Data Generation Pipeline. First, we create a segmentation map that contains both the tissue and sulci labels. Then we pass it through the SynthSeg generative model, which applies a series of transformations to the segmentation and creates the artificial image by sampling tissue-specific intensity values based on the tissue priors. The output of the model is the synthetic image and the transformed segmentation, which contains the sulci and tissue labels.
  • Figure 5: SimCLR framework architecture. First, two image views are generated for each segmentation present in the batch using a synthetic data generator. These synthetic images are then passed through a U-Net encoder, which computes a dense image representation. An MLP then projects this representation into the space where the contrastive loss is computed. The loss function encourages the embeddings of images from the same segmentation to be close together in the embedding space while pushing apart the embeddings of images from different segmentation maps.
  • ...and 11 more figures
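
The contrastive objective described in the Figure 5 caption can be illustrated with the NT-Xent loss used in the original SimCLR formulation. The following is a minimal sketch only: the function names, the temperature value, and the use of plain Python lists for the projected embeddings are assumptions made for illustration, and the summary above does not confirm the exact loss variant or hyperparameters used by the authors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings.

    views_a[i] and views_b[i] are assumed to be the projected embeddings
    of two synthetic images generated from the same segmentation map i.
    The temperature default is an illustrative assumption.
    """
    n = len(views_a)
    z = list(views_a) + list(views_b)  # 2n embeddings in total
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same map
        # Similarities to every other embedding (the anchor itself is excluded).
        logits = [
            cosine_similarity(z[i], z[k]) / temperature
            for k in range(2 * n)
            if k != i
        ]
        positive_logit = cosine_similarity(z[i], z[positive]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - positive_logit
    return total / (2 * n)


if __name__ == "__main__":
    # Two hypothetical segmentation maps, two views each.
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[1.0, 0.1], [0.1, 1.0]]
    print("paired views:    ", nt_xent_loss(a, b))
    print("mismatched views:", nt_xent_loss(a, b[::-1]))
```

When the two views of each segmentation map have similar embeddings, the loss is lower than when the pairing is mismatched, which is the behaviour the caption describes: same-segmentation embeddings are pulled together and embeddings from different segmentation maps are pushed apart.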