Private and interpretable clinical prediction with quantum-inspired tensor train models

José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko, Mohammad Kohandel

TL;DR

This work tackles the tension between predictive accuracy, interpretability, and privacy in clinical prediction. It introduces a quantum-inspired, post-training defense that converts a discretized model into a tensor train (TT), obfuscating the model's parameters while preserving performance. The defense applies to both logistic regression (LR) and neural networks (NNs). On LORIS, a publicly available LR model, and on shallow NNs trained for the same task, TT-based obfuscation reduces white-box attacks to random guessing and degrades black-box attacks to a degree comparable to differential privacy, with a privacy-utility trade-off that can be controlled via output discretization. TT models also preserve and extend interpretability through efficient marginal and conditional analyses, including cancer-type-conditioned insights, providing a practical pathway for private, interpretable clinical prediction.

Abstract

Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer transparency, while neural networks (NNs) provide greater predictive power; yet both remain vulnerable to privacy attacks. We empirically assess these risks by designing attacks that identify which public datasets were used to train a model under varying levels of adversarial access, applying them to LORIS, a publicly available LR model for immunotherapy response prediction, as well as to additional shallow NN models trained for the same task. Our results show that both models leak significant training-set information, with LRs proving particularly vulnerable in white-box scenarios. Moreover, we observe that common practices such as cross-validation in LRs exacerbate these risks. To mitigate these vulnerabilities, we propose a quantum-inspired defense based on tensorizing discretized models into tensor trains (TTs), which fully obfuscates parameters while preserving accuracy, reducing white-box attacks to random guessing and degrading black-box attacks comparably to Differential Privacy. TT models retain LR interpretability and extend it through efficient computation of marginal and conditional distributions, while also enabling this higher level of interpretability for NNs. Our results demonstrate that tensorization is widely applicable and establishes a practical foundation for private, interpretable, and effective clinical prediction.
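
The tensorization and marginalization ideas mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it builds a small dense grid from a hypothetical three-feature LR model (the weights, bias, and midpoint binning are all assumptions), decomposes it with a plain TT-SVD (a standard technique swapped in for illustration), and averages out features with uniform weights over bins. The bin count `b = 6` only mirrors the value quoted in the figure captions.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Split a dense tensor into tensor-train cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))
        # Core k has shape (left rank, number of bins, right rank).
        cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
        # Push the remaining factor to the right and expose the next feature.
        mat = (s[:new_rank, None] * vt[:new_rank]).reshape(new_rank * dims[k + 1], -1)
        rank = new_rank
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_evaluate(cores, index):
    """Evaluate the TT at one multi-index by multiplying the selected core slices."""
    vec = np.ones(1)
    for core, i in zip(cores, index):
        vec = vec @ core[:, i, :]
    return vec[0]

def tt_marginal(cores, keep):
    """Average out every feature except `keep` (uniform weights over bins)."""
    left = np.ones(1)
    for core in cores[:keep]:
        left = left @ core.mean(axis=1)
    right = np.ones(1)
    for core in reversed(cores[keep + 1:]):
        right = core.mean(axis=1) @ right
    return np.einsum("a,aib,b->i", left, cores[keep], right)

# Hypothetical discretized LR model: 3 features, b bins each, bin midpoints in [0, 1].
b = 6
grid = (np.arange(b) + 0.5) / b
weights = np.array([1.5, -2.0, 0.7])  # illustrative values, not taken from LORIS
bias = -0.2
x0, x1, x2 = np.meshgrid(grid, grid, grid, indexing="ij")
dense = 1.0 / (1.0 + np.exp(-(weights[0] * x0 + weights[1] * x1 + weights[2] * x2 + bias)))

cores = tt_svd(dense, max_rank=b)   # the cores no longer expose `weights` directly
marginal_1 = tt_marginal(cores, 1)  # mean predicted response per bin of feature 1
```

Each marginal costs one pass over the cores, so its cost grows linearly with the number of features, whereas summing over the dense grid grows exponentially. In this toy setting, `max_rank=b` is large enough for the decomposition to reproduce the grid up to floating-point error; a smaller rank would trade accuracy for compression.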

Paper Structure (29 sections, 19 equations, 5 figures, 9 tables)

This paper contains 29 sections, 19 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Feature sensitivity scores from LR and TT models (TT-LR and TT-NN, with $b=6$). LR scores are coefficients, while TT scores are obtained via marginalization. All values are normalized by the maximum absolute score.
  • Figure 2: Feature sensitivities from conditioned TT-LR models ($b=6$). The legend indicates cancer type and balanced accuracy of each conditioned TT on the corresponding data.
  • Figure 3: Monotonicity plots of LR and TT-LR models with different bin sizes. Shaded regions indicate participants with unlikely (gray, response probability $<10\%$) or likely (green, response probability $>50\%$) treatment response. From left to right, the limits of these regions are (0.25, 0.71), (0.17, 0.77), and (0.22, 0.70).
  • Figure 4: Balanced accuracy distributions of vanilla models trained on Cho1, evaluated on all samples from all datasets.
  • Figure 5: Median balanced accuracies of models evaluated across all datasets. Left: models trained on a single public dataset. Right: models trained on datasets containing each given public dataset.
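
Two quantities recur in the captions above: scores normalized by the maximum absolute score (Figure 1) and balanced accuracy (Figures 2, 4, and 5). The sketch below shows one common way to compute each, assuming balanced accuracy is taken in its standard sense as the mean of per-class recalls. The function names are hypothetical and the inputs are made-up illustrations, not data from the paper.

```python
import numpy as np

def normalize_scores(scores):
    """Divide scores by the largest absolute value so they fall in [-1, 1]."""
    scores = np.asarray(scores, dtype=float)
    peak = np.max(np.abs(scores))
    return scores / peak if peak > 0 else scores

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, which discounts class imbalance."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Made-up inputs for illustration only.
example_scores = normalize_scores([2.0, -4.0, 1.0])
example_bacc = balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0])
```

Normalizing by the maximum absolute score keeps the sign of each score, so positive and negative feature effects remain distinguishable when models with different score scales are plotted together.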