Feature-level Site Leakage Reduction for Cross-Hospital Chest X-ray Transfer via Self-Supervised Learning

Ayoub Louaye Bouaziz, Lokmane Chebouba

Abstract

Cross-hospital failure in chest X-ray models is often attributed to domain shift, yet most work assumes invariance without measuring it. This paper studies how to measure site leakage directly and how that measurement changes conclusions about transfer methods. We examine multi-site self-supervised learning (SSL) and feature-level adversarial site confusion for cross-hospital transfer. We pretrain a ResNet-18 on NIH and CheXpert without pathology labels. We then freeze the encoder and train a linear pneumonia classifier on NIH only, evaluating transfer to RSNA. We quantify site leakage using a post hoc linear probe that predicts acquisition site from frozen backbone features $f$ and projection features $z$. Across 3 random seeds, multi-site SSL improves RSNA AUC from 0.6736 $\pm$ 0.0148 (ImageNet initialization) to 0.7804 $\pm$ 0.0197. Adding adversarial site confusion on $f$ (CanonicalF) reduces measured leakage but does not reliably improve AUC and increases variance. On $f$, site probe accuracy drops from 0.9890 $\pm$ 0.0021 (SSL-only) to 0.8504 $\pm$ 0.0051 (CanonicalF), where chance is 0.50. On $z$, probe accuracy drops from 0.8912 $\pm$ 0.0092 to 0.7810 $\pm$ 0.0250. These results show that measuring leakage changes how transfer methods should be interpreted: multi-site SSL drives transfer, while adversarial confusion exposes the limits of invariance assumptions.

Paper Structure

This paper contains 26 sections, 4 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview. (A) Multi-site SSL pretraining on NIH and CheXpert with balanced batches. (B) CanonicalF adds an optional site-adversarial head acting on backbone features $f$. (C) Downstream pneumonia detection trains a linear head on NIH with a frozen encoder. Site leakage is measured post hoc with logistic-regression probes on frozen $f$ and $z$ (chance = 0.50 for two sites).
  • Figure 2: Transfer performance and site leakage (mean $\pm$ std over 3 seeds). Left: AUC with a frozen encoder and a linear pneumonia head trained on NIH and selected by NIH validation. Right: site leakage probe accuracy (logistic regression on balanced NIH vs. CheXpert embeddings) for backbone features $f$ and projection features $z$; chance = 0.50. Multi-site SSL improves transfer AUC, while CanonicalF reduces measured leakage but does not guarantee higher transfer AUC.
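
The site leakage probe described in the captions above can be illustrated with a minimal sketch. This is not the authors' code: the function names (`balance_sites`, `fit_logistic_probe`, `probe_accuracy`), the hyperparameters, the 70/30 split, and the synthetic embeddings are all assumptions made for illustration. The sketch fits a logistic-regression probe on site-balanced frozen embeddings and reports held-out site accuracy, with 0.50 as chance for two sites.

```python
import numpy as np


def balance_sites(feats, sites, rng):
    """Subsample so both sites contribute the same number of embeddings."""
    idx0 = np.flatnonzero(sites == 0)
    idx1 = np.flatnonzero(sites == 1)
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([
        rng.choice(idx0, n, replace=False),
        rng.choice(idx1, n, replace=False),
    ])
    rng.shuffle(keep)  # mix the two sites before splitting
    return feats[keep], sites[keep]


def fit_logistic_probe(x, y, lr=0.1, steps=500, l2=1e-3):
    """Fit a linear logistic-regression probe by full-batch gradient descent.

    lr, steps, and l2 are illustrative defaults, not values from the paper.
    """
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(site = 1)
        w -= lr * (x.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def probe_accuracy(feats, sites, seed=0, test_frac=0.3):
    """Held-out accuracy of a linear site probe on frozen embeddings."""
    rng = np.random.default_rng(seed)
    x, y = balance_sites(feats, sites, rng)
    n_test = int(len(y) * test_frac)
    x_te, y_te = x[:n_test], y[:n_test]
    x_tr, y_tr = x[n_test:], y[n_test:]
    # Standardize with training statistics only, to avoid test leakage.
    mu, sd = x_tr.mean(axis=0), x_tr.std(axis=0) + 1e-8
    w, b = fit_logistic_probe((x_tr - mu) / sd, y_tr.astype(float))
    pred = ((((x_te - mu) / sd) @ w + b) > 0).astype(int)
    return float(np.mean(pred == y_te))


# Synthetic stand-ins for frozen embeddings (hypothetical data, not paper results).
rng = np.random.default_rng(0)
n_per_site, dim = 400, 16
sites = np.repeat([0, 1], n_per_site)

leaky = rng.normal(size=(2 * n_per_site, dim))
leaky[sites == 1, 0] += 4.0  # one dimension encodes the site strongly
invariant = rng.normal(size=(2 * n_per_site, dim))  # no site information

acc_leaky = probe_accuracy(leaky, sites)
acc_invariant = probe_accuracy(invariant, sites)
print(f"leaky features:     probe accuracy = {acc_leaky:.3f}")
print(f"invariant features: probe accuracy = {acc_invariant:.3f}")
```

In this sketch, accuracy near 0.50 means the probe cannot linearly recover the site, while accuracy near 1.0 means the site is linearly decodable. A linear probe only bounds linearly accessible leakage, so a low score does not rule out site information that a nonlinear probe could recover.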