TractoEmbed: Modular Multi-level Embedding framework for white matter tract segmentation

Anoushkrit Goel, Bipanjit Singh, Ankita Joshi, Ranjeet Ranjan Jha, Chirag Ahuja, Aditya Nigam, Arnav Bhavsar

TL;DR

This work tackles white matter tract segmentation in diffusion MRI by introducing TractoEmbed, a modular framework that learns representations from three hierarchical data views: streamline, cluster, and patch. Each view is processed by a task-specific encoder (streamline data by a Fiber Descriptor followed by CNN blocks, cluster data by PointNet on local or hyperlocal point clouds, and patch data by a mini-PointNet plus dVAE) to produce embeddings that are concatenated at the Multi Embedding Concat Layer (MECL) and fed to a classifier. The key contributions are a novel multi-level data representation, a modular embedding architecture that can accommodate additional encoders, and reported improvements over state-of-the-art methods across diverse datasets and age groups, particularly for structurally similar and minor projection fibers. The approach reduces reliance on global references and atlas-based parcellations, offering robust, scalable tract segmentation with potential clinical applicability in ROI-focused and time-sensitive settings.
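The fusion step can be illustrated with a minimal NumPy sketch. It assumes the embedding widths given in the Figure 2 caption (256 for streamline, 1024 for cluster, 1024 for patch); the function name `mecl_concat`, the batch size, and the random inputs are hypothetical and stand in for the real encoder outputs. It is not the authors' implementation, only a shape check showing how extra encoders could be appended.

```python
import numpy as np

# Embedding widths as stated in the Figure 2 caption; all other values are illustrative.
STREAMLINE_DIM, CLUSTER_DIM, PATCH_DIM = 256, 1024, 1024


def mecl_concat(streamline_emb, cluster_emb, patch_emb, *extra_embs):
    """Concatenate per-view embeddings along the feature axis.

    `extra_embs` is a hypothetical hook showing how further encoders
    could be added without changing the existing ones.
    """
    parts = [streamline_emb, cluster_emb, patch_emb, *extra_embs]
    return np.concatenate(parts, axis=-1)


rng = np.random.default_rng(0)
batch = 4  # arbitrary batch size for the example
fused = mecl_concat(
    rng.standard_normal((batch, STREAMLINE_DIM)),
    rng.standard_normal((batch, CLUSTER_DIM)),
    rng.standard_normal((batch, PATCH_DIM)),
)
print(fused.shape)  # (4, 2304)
```

Because the fused width is just the sum of the input widths, adding a fourth embedding only changes the input size of the downstream classifier.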

Abstract

White matter tract segmentation is crucial for studying brain structural connectivity and for neurosurgical planning. However, segmentation remains challenging due to issues such as class imbalance between major and minor tracts, structural similarity, subject variability, and symmetric streamlines between hemispheres. To address these challenges, we propose TractoEmbed, a modular multi-level embedding framework that encodes localized representations through learning tasks in respective encoders. In this paper, TractoEmbed introduces a novel hierarchical streamline data representation that captures maximum spatial information at each level, i.e., individual streamlines, clusters, and patches. Experiments show that TractoEmbed outperforms state-of-the-art methods in white matter tract segmentation across different datasets and various age groups. The modular framework directly allows the integration of additional embeddings in future work.

Paper Structure

This paper contains 14 sections, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Data Representations: Streamline, Patch, and Cluster. For (C) Streamline Data, the (A) input streamline of shape (15,3) is (v) converted to a point cloud. For (B) Hyperlocal Streamlines (refer to Section \ref{sec:pcd}), the input streamline undergoes (i) bicubic interpolation to produce a streamline of shape (40,3), for which $k_{local}$ neighboring streamlines are sampled using (ii) MDF Distance. Within the $k_{local}$ search space, (ii) Fast Streamline Search (FSS) \cite{st2022fast} is used to get 5 (B) hyperlocal streamlines. (iii) get PCD converts the hyperlocal streamlines to (D) Cluster Data with ($n_c$,3) points. For (E) Patch Data, (iv) $p$ farthest points are sampled using FPS (refer to Section \ref{subsec:patch-point-cloud}), and kNN finds the neighboring points to give $n_p$ points in each patch.
  • Figure 2: Streamline, Patch, and Cluster data obtained through the processes illustrated in Fig. \ref{fig:data-representation} are sent to their respective encoders to generate embeddings. (A) Streamline Data of dimensions ($n_s$, 3) serves as input to the Fiber Descriptor \cite{zhang2020deep}, producing an output of dimensions ($2n_s$, $2n_s$, 3), where $n_s$ is the number of points per streamline; this is fed to 4 CNN blocks (refer to Table \ref{tab:sencoder}) to obtain a final embedding of dimensions (256, 1) for the MECL (Multi Embedding Concat Layer). (B) Cluster Data $(n_c,3)$ from either the local or hyperlocal point cloud (refer to Section \ref{sec:pcd}) is fed to the PointNet Encoder to give a cluster embedding of 1024 dimensions. Patches on Cluster Data are created using Farthest Point Sampling to fetch 64 patches with 16 points each, resulting in (64,16,3) dimensions. (C) Patch Data is fed to a mini PointNet, which produces a (64,256) output that is further input to the dVAE, resulting in 64 patches each of dimension 256. This is flattened and fed to [4096, 1024] dense layers to give an output patch embedding of 1024 dimensions. (D) These multiple embeddings are concatenated at the MECL to make a (256 + 1024 + 1024 = 2304) dimensional embedding. This is input to the Classifier MLP, resulting in a 512-dimensional classification embedding.
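
The patch construction and the MDF distance named in the captions can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, the cluster size `n_c = 240` is a made-up value, the MDF helper uses the standard minimum average direct-flip definition for streamlines with equal point counts, and brute-force kNN replaces whatever neighbor search the authors use. Only the patch count (64), the points per patch (16), and the streamline shape (40,3) come from the captions.

```python
import numpy as np


def mdf_distance(s1, s2):
    """Minimum average direct-flip distance between two equal-length streamlines."""
    direct = np.linalg.norm(s1 - s2, axis=1).mean()
    flipped = np.linalg.norm(s1 - s2[::-1], axis=1).mean()
    return min(direct, flipped)


def farthest_point_sampling(points, n_samples, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    chosen = np.empty(n_samples, dtype=np.int64)
    chosen[0] = start
    # Distance from every point to its nearest chosen centre so far.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, n_samples):
        chosen[i] = int(np.argmax(min_dist))
        dist_to_new = np.linalg.norm(points - points[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen


def make_patches(points, n_patches=64, pts_per_patch=16):
    """Group each FPS centre with its nearest neighbours (brute-force kNN)."""
    centres = points[farthest_point_sampling(points, n_patches)]
    # Pairwise distances, shape (n_patches, n_points).
    dists = np.linalg.norm(centres[:, None, :] - points[None, :, :], axis=-1)
    knn_idx = np.argsort(dists, axis=1)[:, :pts_per_patch]
    return points[knn_idx]  # shape (n_patches, pts_per_patch, 3)


rng = np.random.default_rng(0)
cluster = rng.standard_normal((240, 3))  # n_c = 240 is an assumed, illustrative size
patches = make_patches(cluster)
print(patches.shape)  # (64, 16, 3)
```

Since each centre is itself a point of the cluster, it appears as the first entry of its own patch; patches may overlap when centres are close together.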