Table of Contents
Fetching ...

A Multicollinearity-Aware Signal-Processing Framework for Cross-$β$ Identification via X-ray Scattering of Alzheimer's Tissue

Abdullah Al Bashit, Prakash Nepal, Lee Makowski

TL;DR

This work addresses the challenge of detecting cross-$\beta$ fibrillar ordering in in situ brain tissue from X-ray scattering data, where substrate signals and high feature correlations impede automated classification. It introduces a three-stage framework: (i) Bayes-optimal separation of mica from tissue, (ii) class-conditional multicollinearity-aware pruning to reduce redundancy while preserving discriminative structure, and (iii) a compact 1D-CNN trained on the pruned features to identify cross-$\beta$ signatures. The authors provide theoretical guarantees for pruning (Bayes-risk collapse and a gradient-based irreducibility bound under Hölder smoothness) and validate them experimentally on a dataset of 1,351 profiles, achieving a top test F1 of 84.30% with 11 features and 174 parameters. The methodology yields an interpretable, statistically grounded approach for data-limited, high-dimensional, correlated measurements and demonstrates potential applicability to other diffraction-based analyses beyond Alzheimer's tissue.

Abstract

X-ray scattering measurements of in situ human brain tissue encode structural signatures of pathological cross-$β$ inclusions, yet systematic exploitation of these data for automated detection remains challenging due to substrate contamination, strong inter-feature correlations, and limited sample sizes. This work develops a three-stage classification framework for identifying cross-$β$ structural inclusions-a hallmark of Alzheimer's disease-in X-ray scattering profiles of post-mortem human brain. Stage 1 employs a Bayes-optimal classifier to separate mica substrate from tissue regions on the basis of their distinct scattering signatures. Stage 2 introduces a multicollinearityaware, class-conditional correlation pruning scheme with formal guarantees on the induced Bayes risk and approximation error, thereby reducing redundancy while retaining class-discriminative information. Stage 3 trains a compact neural network on the pruned feature set to detect the presence or absence of cross-$β$ fibrillar ordering. The top-performing model, optimized with a composite loss combining Focal and Dice objectives, attains a test F1-score of 84.30% using 11 of 211 candidate features and 174 trainable parameters. The overall framework yields an interpretable, theory-grounded strategy for data-limited classification problems involving correlated, high-dimensional experimental measurements, exemplified here by X-ray scattering profiles of neurodegenerative tissue.

A Multicollinearity-Aware Signal-Processing Framework for Cross-$β$ Identification via X-ray Scattering of Alzheimer's Tissue

TL;DR

This work addresses the challenge of detecting cross- fibrillar ordering in in situ brain tissue from X-ray scattering data, where substrate signals and high feature correlations impede automated classification. It introduces a three-stage framework: (i) Bayes-optimal separation of mica from tissue, (ii) class-conditional multicollinearity-aware pruning to reduce redundancy while preserving discriminative structure, and (iii) a compact 1D-CNN trained on the pruned features to identify cross- signatures. The authors provide theoretical guarantees for pruning (Bayes-risk collapse and a gradient-based irreducibility bound under Hölder smoothness) and validate them experimentally on a dataset of 1,351 profiles, achieving a top test F1 of 84.30% with 11 features and 174 parameters. The methodology yields an interpretable, statistically grounded approach for data-limited, high-dimensional, correlated measurements and demonstrates potential applicability to other diffraction-based analyses beyond Alzheimer's tissue.

Abstract

X-ray scattering measurements of in situ human brain tissue encode structural signatures of pathological cross- inclusions, yet systematic exploitation of these data for automated detection remains challenging due to substrate contamination, strong inter-feature correlations, and limited sample sizes. This work develops a three-stage classification framework for identifying cross- structural inclusions-a hallmark of Alzheimer's disease-in X-ray scattering profiles of post-mortem human brain. Stage 1 employs a Bayes-optimal classifier to separate mica substrate from tissue regions on the basis of their distinct scattering signatures. Stage 2 introduces a multicollinearityaware, class-conditional correlation pruning scheme with formal guarantees on the induced Bayes risk and approximation error, thereby reducing redundancy while retaining class-discriminative information. Stage 3 trains a compact neural network on the pruned feature set to detect the presence or absence of cross- fibrillar ordering. The top-performing model, optimized with a composite loss combining Focal and Dice objectives, attains a test F1-score of 84.30% using 11 of 211 candidate features and 174 trainable parameters. The overall framework yields an interpretable, theory-grounded strategy for data-limited classification problems involving correlated, high-dimensional experimental measurements, exemplified here by X-ray scattering profiles of neurodegenerative tissue.

Paper Structure

This paper contains 30 sections, 4 theorems, 74 equations, 4 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, where $X : \Omega \to \mathbb{R}^d$ is a random vector and $Y : \Omega \to [C] := \{1, \dots, C\}$ is a discrete random variable. Assume there exists a unique index set $\mathcal{Q}_\tau^\star \subseteq [d] := \{1, 2, \dots, d\}$ such t

Figures (4)

  • Figure 1: Small- and wide-angle X-ray scattering (SAXS/WAXS) schematic of fixed human brain tissue containing cross-$\beta$ fibrils, mounted on a mica substrate. A microfocused X-ray beam scans the region of interest (ROI), and scattering at angle $2\theta$ is recorded by 2D SAXS and WAXS detectors, with the transmitted beam shown for reference. Rotated views of a cross-$\beta$ fibril and a molecular cartoon illustrate $\beta$-sheet stacking, linking interstrand ($q\!\approx\!1.34\ \text{\AA}^{-1}$) and intersheet ($q\!\approx\!0.6\ \text{\AA}^{-1}$) periodicities. The SAXS and WAXS detector patterns, together with the mapped scattering profile $X(q)$, highlight an interstrand peak near $q\approx 1.34\ \text{\AA}^{-1}$ and a diffuse intersheet reflection near $q\approx 0.6\ \text{\AA}^{-1}$. The shaded band ($0.4$–$1.45\ \text{\AA}^{-1}$) marks the analysis range for cross-$\beta$ detection.
  • Figure 2: Hierarchical pipeline for classifying X-ray scattering profiles. Stage 1: Bayes mica–tissue separation—red curves are tissue profiles, blue curves are mica, and the yellow surface visualizes the learned decision boundary (a subset of profiles is shown for clarity). Stage 2: Multicollinearity-aware pruning on detected tissue scattering; circles mark the selected features retained after correlation screening. Stage 3: Binary neural classifier for cross-$\beta$ detection that assigns cross-$\beta$ ($\beta$) vs. non-cross-$\beta$ ($\neg\beta$) labels.
  • Figure 3: Class-conditional densities for the summary statistic $\bar{\mu}$: \ref{['fig:gmm_sixpanel:a']}, \ref{['fig:gmm_sixpanel:c']}, and \ref{['fig:gmm_sixpanel:e']} show $p_{\text{mica}}(\bar{\mu})$ compared with $p_{\beta}(\bar{\mu})$, $p_{\neg\beta}(\bar{\mu})$, and $p_{\text{tissue}}(\bar{\mu})$, respectively; solid lines represent two-component Gaussian mixture model fits, dashed lines denote individual component densities, and histograms display empirical distributions. Shaded regions indicate false positives (blue) and false negatives (orange). The corresponding BIC curves in \ref{['fig:gmm_sixpanel:b']}, \ref{['fig:gmm_sixpanel:d']}, and \ref{['fig:gmm_sixpanel:f']} attain minima at $K=2$, supporting a two-component specification for both classes. In \ref{['fig:gmm_sixpanel:e']}, the red dashed vertical line marks the Bayes-optimal threshold $\xi^\star=0.01873$ separating mica from tissue.
  • Figure 4: Multicollinearity-aware feature selection over correlation thresholds ${\tau=\tau_{\beta}=\tau_{\neg\beta}}$ showing class-conditional mean scattering-intensity profiles for non–cross-$\beta$ ($\neg\beta$) (blue) and cross-$\beta$ ($\beta$) (red) samples, with $\pm1$ standard deviation bands (shaded), spanning 211 spectral features. Red circles mark features selected from $\beta$-specific pruning ($\mathcal{Q}_\tau^{(\beta)}$), blue circles mark features from $\neg\beta$-specific pruning ($\mathcal{Q}_\tau^{(\neg\beta)}$), and green circles indicate the combined feature subset $\widehat{\mathcal{Q}}_\tau=\mathcal{Q}_\tau^{(\neg\beta)}\cup\mathcal{Q}_\tau^{(\beta)}$. As $\tau$ increases from (a) 0.987 to (i) 0.995, more stringent correlation thresholds yield progressively larger feature subsets, growing from $|\widehat{\mathcal{Q}}_\tau|=6$ to $|\widehat{\mathcal{Q}}_\tau|=37$, demonstrating the tradeoff between multicollinearity mitigation and feature retention.

Theorems & Definitions (4)

  • Proposition 1: Information-loss risk
  • Theorem 1: Gradient-based lower bound on functional irreducibility
  • Corollary 1: Hölder-gradient curvature penalty
  • Proposition 2: Tightness in the affine–segment case