TriDeNT: Triple Deep Network Training for Privileged Knowledge Distillation in Histopathology

Lucas Farndale; Robert Insall; Ke Yuan

TL;DR

The paper addresses the limitation that computational pathology models often cannot leverage privileged data, that is, data available during training but not at inference. It introduces TriDeNT, a three-branch self-supervised framework that distills information from privileged modalities such as immunohistochemistry, spatial transcriptomics, and expert nuclei annotations into representations learned from a primary histopathology input. Across diverse datasets and tasks, TriDeNT improves on unprivileged Siamese baselines and, in many cases, on supervised baselines, with observed improvements of up to 101% on downstream tasks, and it is robust to harmful privileged data and to domain shifts. The approach allows models for routine inputs to learn from scarce or costly data, with the potential to uncover biologically meaningful patterns and improve generalization in computational pathology.

Abstract

Computational pathology models rarely utilise data that will not be available for inference. This means most models cannot learn from highly informative data such as additional immunohistochemical (IHC) stains and spatial transcriptomics. We present TriDeNT, a novel self-supervised method for utilising privileged data that is not available during inference to improve performance. We demonstrate the efficacy of this method for a range of different paired data including immunohistochemistry, spatial transcriptomics and expert nuclei annotations. In all settings, TriDeNT outperforms other state-of-the-art methods in downstream tasks, with observed improvements of up to 101%. Furthermore, we provide qualitative and quantitative measurements of the features learned by these models and how they differ from baselines. TriDeNT offers a novel method to distil knowledge from scarce or costly data during training, to create significantly better models for routine inputs.

Paper Structure (46 sections, 10 equations, 31 figures, 22 tables)

Introduction
Methodology
Self-Supervised Learning
Knowledge Distillation
Privileged Information
TriDeNT
Objective Function
VICReg
InfoNCE
Primary and Privileged Features
Datasets and Tasks
Results
Embedding Knowledge From Privileged Image Modalities
TriDeNT Pretrained Models Outperform Supervised Models on Small Datasets
Embedding Knowledge from Additional Brightfield Images
...and 31 more sections

Figures (31)

Figure 1: A: TriDeNT architecture. TriDeNT incorporates information from privileged input data to complement a primary data source. There are two encoder/projector pairs, one for the primary input (e.g. H&E patches), and one for the secondary input (e.g. transcriptomics). The primary patches are augmented and passed to the primary encoder, followed by the projector, to output a representation. The privileged data are similarly passed to the privileged data encoder and projector. All representations are then used to calculate the self-supervised loss, which enforces invariance between representations. B: Classifier head training. Following this pre-training, the primary encoder is then used as a backbone for a downstream task, with a small classifier head appended. This is then trained in a supervised manner, requiring only a small amount of data. C: Use for downstream tasks. Finally, this trained model with a classifier head can be rolled out for use.
Figure 2: (a) Abstract description of the features learned by different types of self-supervised models. The colour of the lines reflects the information being leveraged by privileged and unprivileged primary models. Features are either strongly present, weakly present, or absent in the primary and privileged data. Unprivileged Siamese models learn only features strongly present in the primary input, and are unlikely to learn any features which are only weakly present. Privileged models are likely to learn only features that are strongly or weakly present in both the primary and privileged inputs. TriDeNT combines the benefits of both methods to learn all features strongly present in the primary data, even those absent in the privileged data, while also learning features weakly present in the primary data that are strongly present in the privileged data. (b) Schematic for the learning process of these models. Black arrows indicate the forward flow of information through the network, and dashed lines indicate the signals which are received during backpropagation. Each branch effectively acts as a supervisory signal for the other branches, backpropagating feedback on the best features to learn. The primary model in the unprivileged Siamese setting only receives supervisory feedback from the primary data, so it only learns primary features. Primary models in the privileged Siamese setting only receive supervisory feedback from the privileged data, so they neglect many primary features. With TriDeNT, primary models receive feedback from both data types, leading to features from both inputs being learned.
Figure 3: (a) Difference in accuracy between TriDeNT and privileged/unprivileged Siamese training on SegPath. Values greater than zero (above the dashed line) indicate a higher accuracy for TriDeNT. (b) Results for ten evaluation tasks averaged across all eight stains. A supervised baseline is provided for comparison; bold indicates the best performance for the given self-supervised loss function. Supervised comparisons are only given for patch-level tasks, as train-time patch aggregation for slide-level tasks cannot be comparably achieved. Higher values indicate better performance. For full results see Table \ref{tab:segpath}. Value marked $\dagger$ from \cite{farndale2024synthetic}. (c) Classification performance as a function of classifier training dataset size. Models were all pretrained on SegPath and evaluated using the full test set of each dataset. Training was carried out on 100%, 50%, 20%, 10%, 5%, 1%, and 0.2% of each classifier training dataset, and results were averaged over all SegPath stains. Supervised comparisons are trained in the same fashion, but not averaged.
Figure 4: (a) Correlation histograms between representations and gene count arrays for mouse and human ALS-ST data. Bins are chosen using the maximum of the Sturges \cite{sturges1926choice} and Freedman-Diaconis \cite{freedman1981histogram} estimators. In the third histogram, zero-shot models are evaluated on genes which were not seen during training, while other models which did see those genes in training are evaluated on the same genes for comparison (unprivileged models never see any genes). (b) Spatial transcriptomics results for white/grey matter classification with both VICReg and InfoNCE losses. Baselines provided are 'Direct Gene Prediction', where a supervised model is trained to predict the gene counts for that patch directly and the representation is then fine-tuned on the white/grey classification task, and a standard supervised model. (c) Greater correlation strengths between gene counts and representations of TriDeNT models than unprivileged Siamese models. For each gene, the maximum absolute correlation between the TriDeNT representations for each patch and the corresponding gene counts is plotted against that for unprivileged Siamese representations, with TriDeNT almost always achieving greater correlation strength. The dashed line is the identity. Appended histograms show the distribution of the data. Mouse data only; see Figure \ref{fig:counts-correlation-plot-human} for human data, which shows a similar pattern, and Figure \ref{fig:counts-correlation-plot-mouse} for extended comparisons of mouse data, also including privileged Siamese and supervised results.
Figure 5: (a) Sample UMAP projections from 2048 dimensions into 2 for models trained on the SegPath CD3CD20 and $\alpha$SMA subsets, evaluated on the NCT test dataset. Points are coloured by tissue type. Accuracies for these tasks were: TriDeNT: CD3CD20 0.8982, $\alpha$SMA 0.9273; Siamese (Privileged): CD3CD20 0.6625, $\alpha$SMA 0.9186; Siamese (Unprivileged): CD3CD20 0.8694, $\alpha$SMA 0.8570. (b) GradCAM heatmaps for selected images from the SegPath dataset, evaluated with VICReg loss. Brighter colours represent greater activation strengths. For a larger selection, including heatmaps for InfoNCE models, see Figures \ref{fig:supp-grad-mnda-vicreg} to \ref{fig:supp-grad-panCK-infonce}.
...and 26 more figures
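
The correlation analysis described in the Figure 4 caption can be illustrated with a short sketch. This is not the authors' code: the function name `max_abs_correlation`, the array shapes, and the synthetic data are all assumptions made for illustration. It computes, for each gene, the largest absolute Pearson correlation between that gene's counts and any single representation dimension, then picks histogram bins with NumPy's `bins="auto"` rule, which NumPy documents as combining the Sturges and Freedman-Diaconis estimators (exact behaviour may vary between NumPy versions).

```python
import numpy as np


def max_abs_correlation(representations, gene_counts):
    """Per-gene maximum |Pearson r| over representation dimensions.

    representations: array of shape (n_patches, n_features)
    gene_counts:     array of shape (n_patches, n_genes)
    Returns an array of shape (n_genes,).
    """
    reps = np.asarray(representations, dtype=float)
    counts = np.asarray(gene_counts, dtype=float)

    # Centre every column so dot products become (unnormalised) covariances.
    reps_c = reps - reps.mean(axis=0)
    counts_c = counts - counts.mean(axis=0)

    cov = reps_c.T @ counts_c  # shape (n_features, n_genes)
    denom = np.outer(np.linalg.norm(reps_c, axis=0),
                     np.linalg.norm(counts_c, axis=0))

    # Constant columns have zero norm; report zero correlation for them.
    corr = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return np.abs(corr).max(axis=0)


# Synthetic stand-ins (hypothetical): 200 patches, 8 features, 50 genes.
rng = np.random.default_rng(0)
reps = rng.normal(size=(200, 8))
weights = rng.normal(size=(8, 50))
genes = reps @ weights + rng.normal(scale=2.0, size=(200, 50))
genes[:, 0] = 2.0 * reps[:, 3] + 1.0  # gene 0 is exactly linear in feature 3

strengths = max_abs_correlation(reps, genes)

# Bin edges for a histogram of correlation strengths.
edges = np.histogram_bin_edges(strengths, bins="auto")
```

Running the same function on the representations of two models and plotting one set of strengths against the other would give a scatter of the kind described for panel (c), with points above the identity line marking genes that correlate more strongly with the first model.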
