A Multicollinearity-Aware Signal-Processing Framework for Cross-$β$ Identification via X-ray Scattering of Alzheimer's Tissue
Abdullah Al Bashit, Prakash Nepal, Lee Makowski
TL;DR
This work addresses the challenge of detecting cross-$\beta$ fibrillar ordering in brain tissue measured in situ by X-ray scattering, where substrate signals and strong feature correlations impede automated classification. It introduces a three-stage framework: (i) Bayes-optimal separation of mica substrate from tissue, (ii) class-conditional, multicollinearity-aware pruning that reduces redundancy while preserving discriminative structure, and (iii) a compact 1D-CNN trained on the pruned features to identify cross-$\beta$ signatures. The authors provide theoretical guarantees for the pruning step (Bayes-risk collapse and a gradient-based irreducibility bound under Hölder smoothness) and validate them experimentally on a dataset of 1,351 profiles, achieving a best test F1-score of 84.30% with 11 features and 174 trainable parameters. The methodology yields an interpretable, statistically grounded approach for data-limited, high-dimensional, correlated measurements and suggests applicability to other diffraction-based analyses beyond Alzheimer's tissue.
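The summary does not state the exact pruning rule, so the following is only a minimal sketch of one way class-conditional correlation pruning can work, not the authors' algorithm. It assumes a Fisher-style separability score for ranking features and a hypothetical rule: a feature is dropped when its within-class Pearson correlation with an already-kept feature reaches a threshold in every class. The names `fisher_score` and `prune_class_conditional` and the 0.9 default threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fisher_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """Between-class over within-class scatter (higher = more discriminative)."""
    overall = sum(values) / len(values)
    between = within = 0.0
    for c in set(labels):
        vc = [v for v, lab in zip(values, labels) if lab == c]
        mc = sum(vc) / len(vc)
        between += len(vc) * (mc - overall) ** 2
        within += sum((v - mc) ** 2 for v in vc)
    if within == 0.0:
        return math.inf if between > 0.0 else 0.0
    return between / within


def prune_class_conditional(X, y, threshold: float = 0.9) -> List[int]:
    """Hypothetical rule: return sorted indices of the features that are kept."""
    n_features = len(X[0])
    classes = sorted(set(y))
    # cols[c][j]: feature j restricted to the samples of class c.
    cols = {
        c: [[row[j] for row, lab in zip(X, y) if lab == c] for j in range(n_features)]
        for c in classes
    }
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    # Visit features from most to least discriminative (stable sort breaks ties by index).
    order = sorted(range(n_features), key=lambda j: -scores[j])
    kept: List[int] = []
    for j in order:
        # Redundant only if highly correlated with some kept feature in EVERY class.
        redundant = any(
            all(abs(pearson(cols[c][j], cols[c][k])) >= threshold for c in classes)
            for k in kept
        )
        if not redundant:
            kept.append(j)
    return sorted(kept)


X = [[1, 3, 5], [2, 5, 1], [3, 7, 4], [4, 9, 2],
     [11, 23, 3], [12, 25, 7], [13, 27, 1], [14, 29, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(prune_class_conditional(X, y))  # [0, 2]: feature 1 is an affine copy of feature 0 within each class
```

This sketch measures correlation within each class rather than on pooled data. Pooled correlation can be high simply because two features both shift between classes, even when they carry independent information inside each class.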
Abstract
X-ray scattering measurements of in situ human brain tissue encode structural signatures of pathological cross-$β$ inclusions, yet systematically exploiting these data for automated detection remains challenging because of substrate contamination, strong inter-feature correlations, and limited sample sizes. This work develops a three-stage classification framework for identifying cross-$β$ structural inclusions, a hallmark of Alzheimer's disease, in X-ray scattering profiles of post-mortem human brain. Stage 1 employs a Bayes-optimal classifier to separate mica substrate from tissue regions on the basis of their distinct scattering signatures. Stage 2 introduces a multicollinearity-aware, class-conditional correlation pruning scheme with formal guarantees on the induced Bayes risk and approximation error, reducing redundancy while retaining class-discriminative information. Stage 3 trains a compact neural network on the pruned feature set to detect the presence or absence of cross-$β$ fibrillar ordering. The top-performing model, optimized with a composite loss combining Focal and Dice objectives, attains a test F1-score of 84.30% using 11 of 211 candidate features and 174 trainable parameters. The overall framework yields an interpretable, theory-grounded strategy for data-limited classification problems involving correlated, high-dimensional experimental measurements, exemplified here by X-ray scattering profiles of neurodegenerative tissue.
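The abstract names a composite Focal and Dice objective but not its weighting. The following is a minimal pure-Python sketch, assuming the standard binary focal loss $-\alpha_t (1-p_t)^{\gamma}\log p_t$ and a soft Dice loss combined by a convex weight. The defaults `alpha=0.25`, `gamma=2.0`, `weight=0.5`, and `smooth=1.0` are hypothetical choices, not values reported by the authors.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # clip so log() stays finite
        p_t = p if t == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def soft_dice_loss(probs: Sequence[float], targets: Sequence[int],
                   smooth: float = 1.0) -> float:
    """1 - (2 * sum(p * t) + smooth) / (sum(p) + sum(t) + smooth)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def focal_dice_loss(probs: Sequence[float], targets: Sequence[int],
                    weight: float = 0.5, **focal_kwargs) -> float:
    """Hypothetical convex combination of focal and soft Dice losses."""
    return (weight * focal_loss(probs, targets, **focal_kwargs)
            + (1.0 - weight) * soft_dice_loss(probs, targets))


targets = [1, 0, 1, 0]
print(focal_dice_loss([0.9, 0.1, 0.8, 0.2], targets))  # confident and correct: small loss
print(focal_dice_loss([0.1, 0.9, 0.2, 0.8], targets))  # confident and wrong: larger loss
```

The focal term down-weights examples the model already classifies confidently. The Dice term scores overlap with the positive class directly, which is one common motivation for pairing them when positives are scarce.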
