A Fast and Practical Column Generation Approach for Identifying Carcinogenic Multi-Hit Gene Combinations

Rick S. H. Willemsen, Tenindra Abeywickrama, Ramu Anandakrishnan

TL;DR

This paper presents constraint programming and mixed integer programming formulations of the Multi-Hit Cancer Driver Set Cover Problem (MHCDSCP). The results suggest that solving the MHCDSCP is less computationally intensive than previously believed.

Abstract

Cancer is often driven by specific combinations of an estimated two to nine gene mutations, known as multi-hit combinations. Identifying these combinations is critical for understanding carcinogenesis and designing targeted therapies. We formalise this challenge as the Multi-Hit Cancer Driver Set Cover Problem (MHCDSCP), a binary classification problem that selects gene combinations to maximise coverage of tumor samples while minimising coverage of normal samples. Existing approaches typically rely on exhaustive search and supercomputing infrastructure. In this paper, we present constraint programming and mixed integer programming formulations of the MHCDSCP. Evaluated on real-world cancer genomics data, our methods achieve performance comparable to state-of-the-art methods while running on a single commodity CPU in under a minute. Furthermore, we introduce a column generation heuristic capable of solving small instances to optimality. These results suggest that solving the MHCDSCP is less computationally intensive than previously believed, thereby opening research directions for exploring modelling assumptions.

Paper Structure

This paper contains 21 sections, 7 equations, 1 figure, 6 tables, 3 algorithms.

Figures (1)

  • Figure 1: An example with seven genes and five samples represented as a binary matrix. Two gene combinations with a hit size of two are selected, namely $c_1$ in blue and $c_2$ in green. Together, they cover tumor samples $t_1$ and $t_2$ (true positives) and normal sample $n_1$ (false positive).
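
The coverage notion in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' formulation: it assumes that a gene combination covers a sample when every gene in the combination is mutated in that sample, and that the five samples split into three tumor and two normal samples. The mutation data, the gene names `g1` to `g7`, the sample `t3`, the sample `n2`, and the helper `covered` are all hypothetical, chosen only so that the outcome matches the caption.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical mutation data (NOT taken from the paper): each sample maps to
# the set of genes mutated in it. This is equivalent to a 7-gene x 5-sample
# binary matrix.
mutations: Dict[str, Set[str]] = {
    "t1": {"g1", "g2", "g5"},
    "t2": {"g3", "g4"},
    "t3": {"g6"},
    "n1": {"g3", "g4", "g7"},
    "n2": {"g2"},
}
tumor_samples: List[str] = ["t1", "t2", "t3"]   # assumed split: 3 tumor
normal_samples: List[str] = ["n1", "n2"]        # assumed split: 2 normal

# Two gene combinations with a hit size of two (gene choices are hypothetical).
c1: FrozenSet[str] = frozenset({"g1", "g2"})
c2: FrozenSet[str] = frozenset({"g3", "g4"})


def covered(samples: Iterable[str],
            combos: Iterable[FrozenSet[str]],
            data: Dict[str, Set[str]]) -> Set[str]:
    """Return the samples covered by at least one combination.

    Assumption: a combination covers a sample when all of its genes
    are mutated in that sample (subset test).
    """
    combos = list(combos)
    return {s for s in samples if any(c <= data[s] for c in combos)}


true_positives = covered(tumor_samples, [c1, c2], mutations)
false_positives = covered(normal_samples, [c1, c2], mutations)

print("true positives:", sorted(true_positives))
print("false positives:", sorted(false_positives))
```

With this toy data, `c1` covers `t1` and `c2` covers both `t2` and `n1`, which reproduces the two true positives and one false positive described in the caption. Selecting combinations that raise the first count while keeping the second low is the trade-off the MHCDSCP formalises.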