
Bit-Identical Medical Deep Learning via Structured Orthogonal Initialization

Yakov Pyotr Shkolnikov

Abstract

Deep learning training is non-deterministic: identical code with different random seeds produces models that agree on aggregate metrics but disagree on individual predictions, with per-class AUC swings exceeding 20 percentage points on rare clinical classes. We present a framework for verified bit-identical training that eliminates three sources of randomness: weight initialization (via structured orthogonal basis functions), batch ordering (via golden ratio scheduling), and non-deterministic GPU operations (via architecture selection and custom autograd). The pipeline produces MD5-verified identical trained weights across independent runs. On PTB-XL ECG rhythm classification, structured initialization significantly exceeds Kaiming across two architectures (n=20; Conformer p = 0.016, Baseline p < 0.001), reducing aggregate variance by 2-3x and reducing per-class variability on rare rhythms by up to 7.5x (TRIGU range: 4.1pp vs 30.9pp under Kaiming, independently confirmed by 3-fold CV). A four-basis comparison at n=20 shows all structured orthogonal bases produce equivalent performance (Friedman p=0.48), establishing that the contribution is deterministic structured initialization itself, not any particular basis function. Cross-domain validation on seven MedMNIST benchmarks (n=20, all p > 0.14) confirms no performance penalty on standard tasks; per-class analysis on imbalanced tasks (ChestMNIST, RetinaMNIST) shows the same variance reduction on rare classes observed in ECG. Cross-dataset evaluation on three external ECG databases confirms zero-shot generalization (>0.93 AFIB AUC).
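The golden ratio batch scheduling mentioned above can be sketched as a deterministic low-discrepancy permutation. This is a hypothetical illustration, not the paper's implementation: batch indices are ranked by the fractional part of $i \cdot \phi^{-1}$ (the golden ratio conjugate), which is equidistributed, so the ordering interleaves the dataset without any RNG state and is bit-identical across runs.

```python
import math

def golden_ratio_order(n_batches: int) -> list[int]:
    """Deterministic low-discrepancy batch ordering (illustrative sketch).

    Ranks indices by frac(i * phi_conj), where phi_conj is the golden
    ratio conjugate. Because phi_conj is irrational, the fractional
    parts are equidistributed, yielding a well-mixed, fully
    reproducible permutation with no random state.
    """
    phi_conj = (math.sqrt(5.0) - 1.0) / 2.0  # ≈ 0.6180339887...
    keys = [(i * phi_conj) % 1.0 for i in range(n_batches)]
    return sorted(range(n_batches), key=lambda i: keys[i])

order = golden_ratio_order(8)
# → [0, 5, 2, 7, 4, 1, 6, 3]: early and late indices interleaved.
```

Two runs of this scheduler are identical by construction, which removes batch ordering as a source of run-to-run variance; how the paper's pipeline composes this with epoch-level reshuffling is not specified here.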


Paper Structure

This paper contains 46 sections, 9 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Per-class performance variability (Conformer, $n\!=\!20$ seeds). (a) Kaiming initialization: SARRH ($n_\text{pos}\!=\!77$) ranges 4.2pp across seeds despite being moderately represented. SVARR ($n_\text{pos}\!=\!14$) ranges 20.2pp; TRIGU ($n_\text{pos}\!=\!2$) ranges 30.9pp. (b) Mixed-basis initialization: systematically tighter ranges. SARRH reduced from 4.2 to 2.5pp, SVARR from 20.2 to 11.5pp, TRIGU from 30.9 to 4.1pp ($7.5{\times}$). Green stars show the fully deterministic golden ratio run (zero variance by construction); golden falls within or above the seeded range for all classes. Thick horizontal lines: mean; shaded regions: $\pm$1 std; thin whiskers: full min--max range. Test-set positive counts (fold 10): SR 1674, AFIB 152, STACH 82, SARRH 77, SBRAD 64, PACE 28, SVARR 14, BIGU 8, AFLT 7, SVTAC 3, PSVT 2, TRIGU 2.
  • Figure 2: ECG Conformer architecture (1.83M parameters). The network processes 12-lead ECG input (1000 samples at 100 Hz) through a convolutional stem (3$\times$Conv1d + AvgPool, outputting 160 channels), three bottleneck stages with progressive stride-2 downsampling (3, 4, and 23 blocks respectively), a stride-2 transition convolution, and three Conformer blocks. Each Conformer block follows a macaron structure: FFN$_1$ (feed-forward network, $\times$0.5 residual) $\to$ MHSA (multi-head self-attention, 4 heads, $d\!=\!40$) $\to$ depthwise convolution ($k\!=\!5$) $\to$ FFN$_2$ ($\times$0.5 residual) $\to$ LN (layer normalization), with residual connections around each sub-layer. GAP (global average pooling) and GMP (global max pooling) are concatenated (320-d) and projected through a BN ($K\!+\!2$ information bottleneck, 14-d) before 12 independent sigmoid heads. Pill badges above each stage indicate the orthogonal basis used for weight initialization in mixed-basis mode. Color coding: light cyan = stem/transition layers, light blue = convolutional bottleneck stages, light orange = Conformer blocks, light teal = information bottleneck, light green = classification heads.
  • Figure 3: Structured orthogonal initialization. (a) DCT-II basis matrix for a Conv1d(64, 64, 5) layer: all 64 cosine basis vectors spanning from DC ($k\!=\!0$, top) to near-Nyquist ($k\!=\!63$, bottom). Each row is one deterministic filter; the structured frequency progression replaces random Kaiming weights. Alternative bases (Hadamard, Hartley, sinusoidal) use analogous constructions. (b) Convergence comparison (Conformer, $n\!=\!20$ seeds): mean $\pm$1 std bands. Mixed-basis initialization maintains higher validation AUC with lower variance from epoch 20 onward. Test-set results in Table~\ref{tab:main}.
  • Figure 4: Per-class AUC variability on ChestMNIST (ResNet-18, single-basis 2D-DCT, $n\!=\!20$ seeds). (a) Kaiming initialization: Pneumothorax ($n_\text{pos}\!=\!1{,}089$) ranges 0.130, Emphysema ($n_\text{pos}\!=\!509$) ranges 0.141. (b) DCT initialization: systematically tighter ranges in 11 of 14 classes (mean range 0.056 vs 0.079). Pneumothorax tightens $3.7{\times}$ (0.035 vs 0.130), Emphysema $2.4{\times}$. Green stars: DCT golden deterministic run. Thick horizontal lines: mean; shaded regions: $\pm$1 std; thin whiskers: full min--max range. Compare with Fig.~\ref{fig:perclass} for the analogous ECG pattern.
  • Figure 5: Generalization validation. (a) Three-fold cross-validation macro AUC for four configurations; Conformer combined ($0.955 \pm 0.012$) reaches the highest mean with seeded batch ordering. (b) Cross-dataset AFIB detection AUC on three external databases (AFDB, CPSC2018, Chapman-Shaoxing): both architectures exceed 0.93 transfer AUC.
  • ...and 1 more figure
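The DCT-II construction described in the Figure 3 caption can be sketched as follows. This is a minimal illustration of the orthonormal DCT-II basis matrix only; the paper's mapping of basis rows onto Conv1d(64, 64, 5) filter tensors (and the analogous Hadamard, Hartley, and sinusoidal constructions) is not reproduced here.

```python
import numpy as np

def dct2_basis(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of shape (n, n).

    Row k is the k-th cosine basis vector, progressing from DC (k=0)
    to near-Nyquist (k=n-1). Every entry is a closed-form function of
    its indices, so the matrix is fully deterministic.
    """
    k = np.arange(n)[:, None]   # frequency index (rows)
    t = np.arange(n)[None, :]   # sample index (columns)
    B = np.cos(np.pi / n * (t + 0.5) * k)
    B[0] *= 1.0 / np.sqrt(2.0)  # rescale the DC row for orthonormality
    return B * np.sqrt(2.0 / n)

B = dct2_basis(64)
# Orthonormality: B @ B.T equals the identity up to float rounding,
# so each row can serve as one deterministic filter direction.
```

Because the basis is orthonormal, substituting its rows for random weight vectors preserves the norm-scaling that variance-based schemes like Kaiming target, while removing seed dependence entirely.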