Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

Petr Philonenko, Vladimir Kokh, Pavel Blinov

TL;DR

The paper tackles scalable, low-cost population-wide cancer screening by introducing Can-SAVE, a lightweight AI system that feeds survival-analysis-derived features into a gradient-boosting model using only routine EHR and medical service codes. Adding survival signals improves ranking power: on a large Russian dataset, Can-SAVE reaches an Average Precision (AP) of 0.228 versus 0.193 for the best of 17 baselines, along with a high ROC-AUC. In the retrospective study it achieved a 4-10x higher detection rate at identical screening volumes, and in the year-long prospective pilot it raised the cancer detection rate by 91% and population coverage by 36% over the national screening protocol, without extra hardware or specialized data. The work provides a practical, reproducible framework for deploying population-scale cancer screening in real-world healthcare settings.

Abstract

Conventional medical cancer screening methods are costly, labor-intensive, and extremely difficult to scale. Although AI can improve cancer detection, most systems rely on complex or specialized medical data, making them impractical for large-scale screening. We introduce Can-SAVE, a lightweight AI system that ranks population-wide cancer risks solely based on medical history events. By integrating survival model outputs into a gradient-boosting framework, our approach detects subtle, long-term patient risk patterns - often well before clinical symptoms manifest. Can-SAVE was rigorously evaluated on a real-world dataset of 2.5 million adults spanning five Russian regions, marking the study as one of the largest and most comprehensive deployments of AI-driven cancer risk assessment. In a retrospective oncologist-supervised study over 1.9M patients, Can-SAVE achieves a 4-10x higher detection rate at identical screening volumes and an Average Precision (AP) of 0.228 vs. 0.193 for the best baseline (LoRA-tuned Qwen3-Embeddings via DeepSeek-R1 summarization). In a year-long prospective pilot (426K patients), our method almost doubled the cancer detection rate (+91%) and increased population coverage by 36% over the national screening protocol. The system demonstrates practical scalability: a city-wide population of 1 million patients can be processed in under three hours using standard hardware, enabling seamless clinical integration. This work proves that Can-SAVE achieves nationally significant cancer detection improvements while adhering to real-world public healthcare constraints, offering immediate clinical utility and a replicable framework for population-wide screening. Code for training and feature engineering is available at https://github.com/sb-ai-lab/Can-SAVE.
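
The two ranking metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = cancer case, 0 = no case) and one model risk score per patient. The function names `average_precision` and `detection_rate_at_k` are illustrative and are not taken from the Can-SAVE repository, and ties in scores are broken by input order, which is a simplification.

```python
def ranked_indices(scores):
    """Patient indices sorted from highest to lowest risk score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(labels, scores):
    """Mean of precision@k taken over the ranks k where a true case appears."""
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranked_indices(scores), start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def detection_rate_at_k(labels, scores, k):
    """Share of true cases among the k highest-risk patients."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_indices(scores)[:k]
    return sum(labels[i] for i in top) / len(top)
```

Under this reading, comparing two methods "at identical screening volumes" means fixing the same k for both and comparing `detection_rate_at_k`, while AP summarizes ranking quality across all possible cut-offs.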

Paper Structure

This paper contains 22 sections, 1 equation, 3 figures, and 12 tables.

Figures (3)

  • Figure 1: Overview of Can-SAVE
  • Figure 2: Backend architecture of the Can-SAVE deployment
  • Figure 3: The fitted Kaplan-Meier estimators for males (blue), females (red), and all patients (green)
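
Figure 3 refers to Kaplan-Meier estimators. As a reminder of what such a curve contains, here is a minimal pure-Python sketch of the product-limit estimator, where the survival probability is multiplied by (1 - d/n) at each event time, with d events among n patients still at risk. The inputs, the function names, and the idea of reading S(t) off the curve as a candidate model feature are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    durations: observed follow-up time per patient
    events:    1 if the event was observed at that time, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # events plus censorings per time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


def survival_at(curve, t):
    """Evaluate the step function S(t); S is 1.0 before the first event."""
    value = 1.0
    for time, s in curve:
        if time > t:
            break
        value = s
    return value
```

Fitting separate curves for subgroups, as in the figure's male, female, and all-patient curves, amounts to calling `kaplan_meier` on each subgroup's durations and events.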