Deep Neural Networks for Predicting Recurrence and Survival in Patients with Esophageal Cancer After Surgery
Yuhan Zheng, Jessie A Elliott, John V Reynolds, Sheraz R Markar, Bartłomiej W. Papież, ENSURE study group
TL;DR
The study addresses prognosis in esophageal cancer after curative-intent surgery by comparing the traditional CoxPH model with two deep learning approaches (DeepSurv and DeepHit) on the large, multicenter ENSURE dataset. Prognostic factor identification via CoxPH revealed pathologic features as strong predictors. In the predictive tasks, DeepSurv yielded the best discrimination (C-index around 0.735–0.740), with performance comparable to CoxPH, while DeepHit offered limited calibration. The findings suggest that DNNs can match conventional methods on tabular survival data, but CoxPH remains a robust, interpretable baseline. Future work could boost performance by incorporating imaging-derived features, advanced hyperparameter optimization, and multi-task or graph-based methods to improve calibration and personalized risk stratification.
Abstract
Esophageal cancer is a major cause of cancer-related mortality internationally, with high recurrence rates and poor survival even among patients treated with curative-intent surgery. Investigating relevant prognostic factors and predicting prognosis can enhance post-operative clinical decision-making and potentially improve patients' outcomes. In this work, we assessed prognostic factor identification and the discriminative performance of three models for Disease-Free Survival (DFS) and Overall Survival (OS) using a large multicenter international dataset from the ENSURE study. We first employed the Cox Proportional Hazards (CoxPH) model to assess the impact of each feature on outcomes. Subsequently, we utilised CoxPH and two deep neural network (DNN)-based models, DeepSurv and DeepHit, to predict DFS and OS. The significant prognostic factors identified by our models were consistent with the clinical literature, with post-operative pathologic features showing higher significance than clinical stage features. DeepSurv and DeepHit demonstrated discriminative accuracy comparable to CoxPH, with DeepSurv slightly outperforming the other models in both the DFS and OS prediction tasks, achieving C-indices of 0.735 and 0.74, respectively. While these results suggest the potential of DNNs as prognostic tools for improving predictive accuracy and providing personalised guidance on risk stratification, CoxPH remains an adequate prediction model on the data used in this study.
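For readers unfamiliar with the C-index reported above, the following is a minimal sketch of how a concordance index can be computed for right-censored survival data. It assumes Harrell's pairwise definition; the abstract does not state which estimator was used, and the function name `concordance_index` and the toy data are hypothetical illustrations, not the study's code or data.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index for right-censored data (illustrative sketch).

    times       -- follow-up time for each patient
    events      -- 1 if the event (recurrence or death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event correctly given the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five patients, two of them censored.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.7, 0.3, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)  # 0.875: 7 of the 8 comparable pairs are ordered correctly
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, so the reported values of roughly 0.74 indicate that about three in four comparable patient pairs are ranked correctly under this definition.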
