
Wave-LSTM: Multi-scale analysis of somatic whole genome copy number profiles

Charles Gadd, Christopher Yau

TL;DR

The paper addresses the challenge of interpreting somatic copy number alterations that manifest across multiple genomic scales. It introduces Wave-LSTM, which uses Haar wavelet-based source separation to decompose copy number profiles into scale-specific signals, learns scale embeddings with a convolutional-LSTM, and fuses them via self-attention into a multi-scale representation. The approach yields improved insight into subclonal structure in single-cell CNA data and enhances survival prediction on simulated data and TCGA cohorts, outperforming several baselines. This work offers a generalizable framework for multi-scale genomic signal analysis with potential applicability beyond copy number profiling.
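As a rough illustration of the Haar wavelet-based source separation step described above, the following minimal sketch splits a toy profile into scale-specific signals by zero-masking all but one band of wavelet coefficients before inverting the transform. The function names (`haar_dwt`, `haar_idwt`, `separate_sources`), the toy profile, and the power-of-two length restriction are assumptions made for illustration; they are not taken from the paper, whose implementation may differ.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """Full Haar decomposition: [approx, coarsest detail, ..., finest detail]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])  # local differences
        approx = [(a + b) / SQRT2 for a, b in pairs]         # local averages
    return [approx] + details[::-1]


def haar_idwt(bands):
    """Invert haar_dwt, rebuilding the signal from coarse to fine."""
    approx = list(bands[0])
    for detail in bands[1:]:
        finer = []
        for a, d in zip(approx, detail):
            finer.append((a + d) / SQRT2)
            finer.append((a - d) / SQRT2)
        approx = finer
    return approx


def separate_sources(signal):
    """Zero-mask every band but one, then invert: one signal per scale."""
    bands = haar_dwt(signal)
    sources = []
    for keep in range(len(bands)):
        masked = [band if i == keep else [0.0] * len(band)
                  for i, band in enumerate(bands)]
        sources.append(haar_idwt(masked))
    return sources


# Hypothetical toy copy number profile with one focal gain (illustrative only).
profile = [2, 2, 2, 2, 3, 3, 2, 2]
sources = separate_sources(profile)
for index, source in enumerate(sources):
    print(index, [round(value, 3) for value in source])
```

Because the transform is linear, the separated signals sum back to the input profile. In this sketch the first source holds the profile mean and the later ones hold progressively finer detail.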

Abstract

Changes in the number of copies of certain parts of the genome, known as copy number alterations (CNAs), due to somatic mutation processes are a hallmark of many cancers. This genomic complexity is known to be associated with poorer outcomes for patients, but describing its contribution in detail has been difficult. Copy number alterations can affect large regions spanning whole chromosomes or the entire genome itself, but can also be localised to only small segments of the genome, and no methods exist that allow this multi-scale nature to be quantified. In this paper, we address this using Wave-LSTM, a signal decomposition approach designed to capture the multi-scale structure of complex whole genome copy number profiles. Using wavelet-based source separation in combination with deep learning-based attention mechanisms, we show that Wave-LSTM can be used to derive multi-scale representations from copy number profiles, which can be used to decipher sub-clonal structures from single-cell copy number data and to improve survival prediction performance from patient tumour profiles.

Paper Structure (4 sections, 5 equations, 7 figures, 1 table)


Figures (7)

  • Figure 1: An example of a somatic copy number alteration. (a) A normal cell. (b) A cancerous cell which has undergone a deletion on the paternal strand of chromosome (chr) $1$, and insertions on the maternal strand of chromosome $2$ and the paternal strand of chromosome $21$. (c) These cancerous mutations are quantified by the copy number.
  • Figure 2: Illustrative Example. Source-separated signals are adaptively filtered to obtain scale embeddings. These scale embeddings are then combined to obtain a single scale-attentive representation. (a) Simulated noise-free signals. As the scale increases, more pairs of classes become distinguishable (highlighted). Only the first pair of classes is distinguishable at low scale (top row), the second pair at medium scale (middle row), whilst the final pair becomes distinguishable only at high scales (bottom row). (b) t-SNE projection of our multi-scale embedding ($\mathbf{M}$). (c) t-SNE projection of our adaptively learnt scale-specific embeddings ($\mathbf{m}_j$).
  • Figure 3: The Wave-LSTM Encoder. Source separation is achieved through zero-masking of the multi-resolution wavelet cascading filter bank. The source-separated signals are then adaptively filtered and combined to output a scale-attentive encoding.
  • Figure 4: Self-attention of the Illustrative Example ($\mathbf{A}\in\mathbb{R}^{1\times J}$), for each increasing scale. Samples are ordered by label, then permuted via spectral bi-clustering. We observe that attention is given to higher scales in the presence of transient signals such as finer-scale, focal aberrations.
  • Figure 5: Single-cell copy number scale embeddings. (a) t-SNE plots of scale embeddings ($\mathbf{m}_j\in\mathbb{R}^3$). (b) Attention ($\mathbf{A}\in\mathbb{R}^{1\times J}$), for each class and scale. (c) t-SNE projection of multi-scale embedding ($\mathbf{M}\in\mathbb{R}^{1\times 3}$).
  • ...and 2 more figures
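
The shapes quoted in the captions for Figures 4 and 5 ($\mathbf{m}_j\in\mathbb{R}^3$, $\mathbf{A}\in\mathbb{R}^{1\times J}$, $\mathbf{M}\in\mathbb{R}^{1\times 3}$) are consistent with an attention-weighted sum over the $J$ scale embeddings. The sketch below shows one simple way such a fusion could work. The scoring rule (a dot product between a parameter vector `w` and `tanh` of each embedding), the function names, and the example values are all assumptions for illustration, not the paper's stated formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(scale_embeddings, w):
    """Return (A, M): attention over J scales and the fused embedding.

    scale_embeddings: J vectors m_j, each of the same dimension.
    w: hypothetical scoring vector (would be learnt in a real model).
    """
    scores = [sum(wi * math.tanh(mi) for wi, mi in zip(w, m))
              for m in scale_embeddings]
    attention = softmax(scores)  # A, one weight per scale
    dim = len(scale_embeddings[0])
    fused = [sum(a * m[k] for a, m in zip(attention, scale_embeddings))
             for k in range(dim)]  # M = sum_j A_j * m_j
    return attention, fused


# Illustrative values only: J = 4 scale embeddings in R^3.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
attention, fused = fuse_scales(embeddings, w=[1.0, 0.0, 0.0])
print([round(a, 3) for a in attention])
print([round(v, 3) for v in fused])
```

With `w` set to all zeros, every scale receives equal weight and the fused embedding reduces to the mean of the scale embeddings, which is a useful sanity check.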