Wave-LSTM: Multi-scale analysis of somatic whole genome copy number profiles
Charles Gadd, Christopher Yau
TL;DR
The paper addresses the challenge of interpreting somatic copy number alterations that manifest across multiple genomic scales. It introduces Wave-LSTM, which uses Haar wavelet-based source separation to decompose copy number profiles into scale-specific signals, learns scale embeddings with a convolutional-LSTM, and fuses them via self-attention into a multi-scale representation. The approach yields improved insight into subclonal structure in single-cell CNA data and enhances survival prediction on simulated data and TCGA cohorts, outperforming several baselines. This work offers a generalizable framework for multi-scale genomic signal analysis with potential applicability beyond copy number profiling.
Abstract
Changes in the number of copies of certain parts of the genome, known as copy number alterations (CNAs), due to somatic mutation processes are a hallmark of many cancers. This genomic complexity is known to be associated with poorer outcomes for patients, but describing its contribution in detail has been difficult. Copy number alterations can affect large regions spanning whole chromosomes or the entire genome itself, but can also be localised to only small segments of the genome, and no methods exist that allow this multi-scale nature to be quantified. In this paper, we address this using Wave-LSTM, a signal decomposition approach designed to capture the multi-scale structure of complex whole genome copy number profiles using wavelet-based source separation in combination with deep learning-based attention mechanisms. We show that Wave-LSTM can be used to derive multi-scale representations from copy number profiles, which can be used to decipher sub-clonal structures from single-cell copy number data and to improve survival prediction performance from patient tumour profiles.
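To make the idea of wavelet-based source separation concrete, the following is a minimal sketch of an additive Haar multi-resolution decomposition applied to a toy copy number profile. It is an illustration under stated assumptions, not the Wave-LSTM implementation: the function names (`block_means`, `haar_scale_components`) and the eight-bin profile are hypothetical, the input length is assumed to be a power of two, and the convolutional-LSTM and attention stages are not shown.

```python
def block_means(signal, width):
    """Replace each non-overlapping block of `width` samples by its mean.

    For the Haar wavelet, this is the approximation of the signal at
    block size `width`.
    """
    out = []
    for start in range(0, len(signal), width):
        block = signal[start:start + width]
        mean = sum(block) / len(block)
        out.extend([mean] * len(block))
    return out


def haar_scale_components(signal):
    """Split a signal of length 2**J into J detail components plus a coarse mean.

    Components are ordered from finest to coarsest scale, and they sum
    bin by bin to the original signal.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    components = []
    approx = [float(x) for x in signal]
    width = 2
    while width <= n:
        coarser = block_means(signal, width)
        # Detail at this scale: what is lost when moving to the coarser view.
        components.append([a - c for a, c in zip(approx, coarser)])
        approx = coarser
        width *= 2
    components.append(approx)  # coarsest level: the profile-wide mean
    return components


# Hypothetical toy profile: diploid baseline, a broad gain over the second
# half, and a focal amplification in the last bin.
profile = [2, 2, 2, 2, 3, 3, 3, 7]
components = haar_scale_components(profile)
reconstructed = [sum(values) for values in zip(*components)]
```

In this toy case the finest component is non-zero only around the focal amplification, while the coarsest detail component carries the contrast between the two halves of the profile. This is the sense in which alterations at different genomic scales end up in different scale-specific signals.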
