
Accelerating Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection

Ling Zhan, Zhen Li, Junjie Huang, Tao Jia

TL;DR

This work tackles the computational bottleneck of benchmarking hundreds of FC operators by reframing it as ranking-preserving core-set selection. It introduces Structure-aware Contrastive Learning for Core-set Selection (SCLCS), which combines three components: an adaptive multi-head Transformer encoder that learns sample-specific FC structures, the Structural Perturbation Score (SPS) that identifies structurally stable samples, and a density-balanced sampling strategy that ensures diversity. The authors prove a universal approximation property for the adaptive attention mechanism and demonstrate improved ranking preservation on the REST-meta-MDD dataset, achieving near ground-truth SPI rankings with only 10% of the data. This approach makes large-scale FC operator benchmarking practical and reproducible, potentially accelerating pre-analysis model selection in computational neuroscience.

Abstract

Benchmarking the hundreds of functional connectivity (FC) modeling methods on large-scale fMRI datasets is critical for reproducible neuroscience. However, the combinatorial explosion of model-data pairings makes exhaustive evaluation computationally prohibitive, preventing such assessments from becoming a routine pre-analysis step. To break this bottleneck, we reframe the challenge of FC benchmarking as the selection of a small, representative core-set whose sole purpose is to preserve the relative performance ranking of FC operators. We formalize this as a ranking-preserving subset selection problem and propose Structure-aware Contrastive Learning for Core-set Selection (SCLCS), a self-supervised framework to select these core-sets. SCLCS first uses an adaptive Transformer to learn each sample's unique FC structure. It then introduces a novel Structural Perturbation Score (SPS) to quantify the stability of these learned structures during training, identifying samples that represent foundational connectivity archetypes. Finally, while SCLCS identifies stable samples via a top-k ranking, we further introduce a density-balanced sampling strategy as a necessary correction to promote diversity, ensuring the final core-set is both structurally robust and distributionally representative. On the large-scale REST-meta-MDD dataset, SCLCS preserves the ground-truth model ranking with just 10% of the data, outperforming state-of-the-art (SOTA) core-set selection methods by up to 23.2% in ranking consistency (nDCG@k). To our knowledge, this is the first work to formalize core-set selection for FC operator benchmarking, thereby making large-scale operator comparisons a feasible and integral part of computational neuroscience. Code is publicly available at https://github.com/lzhan94swu/SCLCS
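
The abstract reports ranking consistency as nDCG@k. As a rough illustration of what that metric measures, the sketch below computes the standard nDCG@k for an operator ranking estimated on a core-set against ground-truth scores from the full dataset. The operator names, the scores, and the choice of using raw scores as relevance values are assumptions for illustration only; the paper's exact relevance definition is not given in this section.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance values."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_scores, predicted_ranking, k):
    """Standard nDCG@k.

    true_scores: dict mapping operator name -> ground-truth relevance
                 (higher is better), e.g. measured on the full dataset.
    predicted_ranking: operator names ordered best-first, e.g. as
                       estimated on a core-set.
    """
    gains = [true_scores[op] for op in predicted_ranking]
    ideal = sorted(true_scores.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical operators and relevance values (not taken from the paper).
true_scores = {"op_a": 3.0, "op_b": 2.0, "op_c": 1.0, "op_d": 0.0}

perfect = ndcg_at_k(true_scores, ["op_a", "op_b", "op_c", "op_d"], k=3)
swapped = ndcg_at_k(true_scores, ["op_b", "op_a", "op_c", "op_d"], k=3)
print(perfect, swapped)
```

A ranking that matches the ground truth scores exactly 1, and swapping the top two operators lowers the score, which is the sense in which a core-set "preserves" the ranking when its nDCG@k stays close to 1.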

Paper Structure (63 sections, 8 theorems, 47 equations, 8 figures, 14 tables)

Key Result

Theorem 1

Let $\{\mathbf{A}_h\}_{h=1}^H$ be row-stochastic attention matrices. Assume disjoint structural masks: for each row $i$ there exist pairwise-disjoint sets $\{S_h^{(i)}\}_{h=1}^H$ such that $\mathbf{A}_h(i,j)=0$ for all $j\notin S_h^{(i)}$. Let $\bar{\mathbf{A}}:=\tfrac{1}{H}\sum_{h=1}^H \mathbf{A}_h$. … In particular, if $H\ge2$, naive averaging expands support beyond any single head's mask and inflat…
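
The support-expansion part of this statement can be checked numerically. The sketch below is only an illustration under assumed settings (two heads, four nodes, and hand-picked column masks, none of which come from the paper): it builds row-stochastic matrices with disjoint masks via a masked softmax, averages them, and compares the support of each row before and after averaging.

```python
import numpy as np

H, n = 2, 4  # assumed toy sizes: 2 heads, 4 nodes

# Assumed disjoint masks: head 0 may attend to columns {0, 1},
# head 1 to columns {2, 3}, for every row i.
masks = np.zeros((H, n, n), dtype=bool)
masks[0, :, :2] = True
masks[1, :, 2:] = True

rng = np.random.default_rng(0)
logits = rng.normal(size=(H, n, n))

# Masked softmax: entries outside the mask get weight exactly 0,
# so each A[h] is row-stochastic with support inside its mask.
masked_logits = np.where(masks, logits, -np.inf)
weights = np.exp(masked_logits - masked_logits.max(axis=-1, keepdims=True))
A = weights / weights.sum(axis=-1, keepdims=True)

# Naive averaging across heads.
A_bar = A.mean(axis=0)

support_per_head = (A > 0).sum(axis=-1)  # shape (H, n)
support_avg = (A_bar > 0).sum(axis=-1)   # shape (n,)
print(support_per_head)
print(support_avg)
print(A_bar.max())
```

In this toy setting each head's rows are supported on 2 columns, while every row of the average is supported on all 4. Because the masks are disjoint, each nonzero entry of the average is a single head's weight divided by $H$, so no entry of $\bar{\mathbf{A}}$ can exceed $1/H$.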

Figures (8)

  • Figure 1: Overview of the SCLCS framework for ranking-preserving core-set selection. Contrasting with selection for single-model classification (top left), our task is to preserve the performance ranking of SPIs (top right). Our method (bottom) achieves this using a Transformer to learn structures, our novel SPS metric to ensure stability, and a density-aware strategy to promote diversity.
  • Figure 2: Sample coverage balance on subjects and MDD/HC of baselines.
  • Figure 3: The evolution of the learned attention map $\textbf{A}_{(\textbf{X})}^e$ across training epochs.
  • Figure A1: Time consumption of different SPIs on a single sample.
  • Figure A2: Rank comparison on brain fingerprinting using rank/density-based sampling strategies.
  • ...and 3 more figures

Theorems & Definitions (15)

  • Theorem 1: Interference of Averaged Attention
  • Theorem 2: Universal Approximation of Continuous Stochastic SPIs
  • Proposition 1: Mixture-driven perturbation magnitude
  • Theorem 3: Persistent bias of top-$k$ selection
  • Theorem: Interference of Averaged Attention (full version)
  • proof
  • proof
  • proof
  • Lemma 1: Consistency of SPS
  • proof
  • ...and 5 more