A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials
Pratik Worah
TL;DR
This work addresses the problem of identifying homogeneous patient subcohorts by designing sparse, interpretable tests. It casts subcohort discovery as a MAX-CUT-like optimization problem and solves it with a semidefinite programming (SDP) relaxation followed by a Goemans-Williamson-style rounding, which achieves an approximation factor of roughly $0.82$. The method is applied to the METABRIC breast cancer dataset. There it reveals subcohorts with meaningful metastatic enrichment and uncovers associations between methylation changes and nuclear receptor expression, including one subcohort that suggests LXRB as a therapeutic target. Compared with PRIM (the Patient Rule Induction Method), the method achieves favorable trade-offs among sensitivity, specificity, and sparsity, and it points to potential clinical pathways for targeted interventions in breast cancer. These findings still require clinical validation, and the method's limitations are noted.
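For readers unfamiliar with the rounding step, the following is a minimal sketch of the *classic* Goemans-Williamson random-hyperplane rounding for MAX-CUT, not the paper's own variant. It assumes the SDP has already been solved and factored into one unit vector per vertex. In the classic setting, edge $(i, j)$ is cut with probability $\theta_{ij}/\pi$, where $\theta_{ij}$ is the angle between the two vectors; this is what yields the $\approx 0.878$ guarantee of Goemans and Williamson (1995). The $0.82$ factor quoted above belongs to the paper's modified rounding and is not reproduced here. The function names and the toy input are illustrative.

```python
import math
import random


def hyperplane_round(vectors, rng):
    """Classic Goemans-Williamson random-hyperplane rounding (illustrative).

    vectors: one unit vector (list of floats) per vertex, e.g. the rows of a
             factorization X = V V^T of an already-solved SDP matrix X.
    Returns one label in {+1, -1} per vertex.
    """
    dim = len(vectors[0])
    # A vector of i.i.d. standard Gaussians has a uniformly random direction,
    # so the hyperplane orthogonal to it is uniformly random.
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Each vertex is assigned to the side of the hyperplane its vector lies on.
    return [1 if sum(vi * ri for vi, ri in zip(v, r)) >= 0 else -1
            for v in vectors]


def cut_weight(edges, labels):
    """Total weight of edges (i, j, w) whose endpoints get opposite labels."""
    return sum(w for i, j, w in edges if labels[i] != labels[j])


def best_of_rounds(vectors, edges, rounds=50, seed=0):
    """Repeat the randomized rounding and keep the heaviest cut found."""
    rng = random.Random(seed)
    best_labels, best_weight = None, -math.inf
    for _ in range(rounds):
        labels = hyperplane_round(vectors, rng)
        weight = cut_weight(edges, labels)
        if weight > best_weight:
            best_labels, best_weight = labels, weight
    return best_labels, best_weight


# Toy input (hypothetical): vertices 0 and 1 share a vector, and vertices
# 2 and 3 share the antipodal one, so every hyperplane separates the pairs.
vectors = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
edges = [(0, 2, 1.0), (1, 3, 1.0), (0, 1, 1.0)]
labels, weight = best_of_rounds(vectors, edges)
# weight is 2.0: edges (0, 2) and (1, 3) are cut, (0, 1) never is.
```

Repeating the rounding and keeping the best cut is a common practical choice; the approximation guarantee itself is a statement about the expected weight of a single rounding.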
Abstract
We design an efficient algorithm that outputs tests for identifying predominantly homogeneous subcohorts of patients from large inhomogeneous datasets. Our theoretical contribution is a rounding technique, similar to that of Goemans and Williamson (1995), that approximates the optimal solution within a factor of $0.82$. As an application, we use our algorithm to trade off sensitivity for specificity and thereby systematically identify clinically interesting homogeneous subcohorts of patients in the breast cancer RNA microarray dataset of Curtis et al. (2012). One such subcohort suggests a link, among the patients in that subcohort, between LXR over-expression and the methylation levels of BRCA2 and MSH6.
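To make the sensitivity-specificity trade-off concrete, the sketch below scores a candidate test against known labels. Purely for illustration, it assumes that a test is a conjunction of single-feature thresholds. The box-shaped rules produced by PRIM have this form, but the paper's tests may differ. The helpers `passes_test` and `sensitivity_specificity`, the feature name `"g"`, and the toy data are all hypothetical.

```python
import math


def passes_test(patient, rules):
    """Hypothetical sparse test: a conjunction of single-feature thresholds.

    rules: list of (feature, threshold, direction), direction ">=" or "<".
    """
    for feature, threshold, direction in rules:
        value = patient[feature]
        if direction == ">=" and value < threshold:
            return False
        if direction == "<" and value >= threshold:
            return False
    return True


def sensitivity_specificity(patients, is_positive, rules):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for patient, positive in zip(patients, is_positive):
        flagged = passes_test(patient, rules)
        if positive and flagged:
            tp += 1
        elif positive:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    return sensitivity, specificity


# Made-up data with a single expression feature "g".
patients = [{"g": 3.0}, {"g": 2.0}, {"g": 1.2}, {"g": 1.5}, {"g": 0.4}, {"g": 0.2}]
is_positive = [True, True, True, False, False, False]
loose = sensitivity_specificity(patients, is_positive, [("g", 1.0, ">=")])
strict = sensitivity_specificity(patients, is_positive, [("g", 1.8, ">=")])
# loose:  sensitivity 1.0, specificity 2/3 (one false positive)
# strict: sensitivity 2/3, specificity 1.0 (one positive missed)
```

Raising the threshold in this toy example gives up one true positive to eliminate the false positive. This is the kind of trade-off the abstract refers to, shown here on a single feature rather than on the paper's SDP-derived tests.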
