Poisson process factorization for mutational signature analysis with genomic covariates
Alessandro Zito, Giovanni Parmigiani, Jeffrey W. Miller
TL;DR
We introduce Poisson process factorization (PPF), which uses an inhomogeneous Poisson point process model to infer mutational signatures and how their activities vary across the genome. This addresses a limitation of the usual NMF-based approach to mutational signature analysis, which ignores heterogeneity in mutation rates along the genome.
Abstract
Mutational signatures are powerful summaries of the mutational processes altering the DNA of cancer cells. The usual approach to mutational signature analysis consists of decomposing the matrix of mutation counts from a sample of patients using non-negative matrix factorization (NMF). However, this approach ignores the heterogeneous patterns of mutation rates along the genome. In this paper, we introduce Poisson process factorization (PPF), which addresses this limitation by employing an inhomogeneous Poisson point process model to infer mutational signatures and their activities as they vary across the genome. PPF generalizes the baseline NMF model by representing a patient's exposure to each signature as a locus-specific function that depends on genomic covariates and patient-specific copy numbers via a log-linear model. This quantifies the relationships between genomic features and mutational signatures, and enables attribution of individual mutations to signatures. We develop tractable algorithms for maximum a posteriori estimation and posterior inference via Markov chain Monte Carlo. We demonstrate the method on simulated data and real data from breast cancer, using genomic covariates representing histone modifications, cell replication timing, nucleosome positioning, and DNA methylation.
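
As an illustration of the idea of a locus-specific, log-linear exposure, the following is a minimal sketch and not the paper's implementation. It assumes the genome is discretized into bins and that a patient's exposure to signature k in bin b takes the form baseline_k * copy_number_b * exp(x_b . beta_k); this exact functional form, the bin discretization, and all names and sizes (`local_exposure`, `intensity`, `attribution`, `n_bins`, and so on) are assumptions made for the example. All data are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes, chosen only for illustration.
n_types, n_sigs, n_bins, n_covs = 96, 3, 50, 2

# Signatures: each column is a probability distribution over mutation types.
signatures = rng.dirichlet(np.ones(n_types), size=n_sigs).T  # (n_types, n_sigs)

# Baseline (NMF-style) exposure of a single patient to each signature.
baseline = rng.gamma(2.0, 1.0, size=n_sigs)                  # (n_sigs,)

# Placeholder genomic covariates and patient copy number for each bin.
covariates = rng.normal(size=(n_bins, n_covs))               # (n_bins, n_covs)
copy_number = rng.choice([1.0, 2.0, 3.0], size=n_bins)       # (n_bins,)

# Log-linear coefficients: one coefficient vector per signature.
beta = rng.normal(scale=0.5, size=(n_sigs, n_covs))          # (n_sigs, n_covs)


def local_exposure(baseline, copy_number, covariates, beta):
    """Assumed form: baseline_k * copy_number_b * exp(x_b . beta_k)."""
    log_linear = covariates @ beta.T                         # (n_bins, n_sigs)
    return baseline[None, :] * copy_number[:, None] * np.exp(log_linear)


def intensity(signatures, exposure):
    """Poisson rate per mutation type and bin, summed over signatures."""
    return signatures @ exposure.T                           # (n_types, n_bins)


def attribution(signatures, exposure, mut_type, bin_idx):
    """Probability that one mutation (type, bin) arose from each signature."""
    contrib = signatures[mut_type, :] * exposure[bin_idx, :]
    return contrib / contrib.sum()


exposure = local_exposure(baseline, copy_number, covariates, beta)
rates = intensity(signatures, exposure)
counts = rng.poisson(rates)          # simulated mutation counts per type and bin
probs = attribution(signatures, exposure, mut_type=0, bin_idx=0)
```

Under these assumptions, setting `beta` to zero and the copy number to one makes the exposure constant across bins, so the sketch reduces to an ordinary Poisson NMF model with a single exposure per signature. The `attribution` function illustrates how a single mutation can be assigned to signatures in proportion to each signature's contribution to the local rate.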
