Masked adversarial neural network for cell type deconvolution in spatial transcriptomics

Lin Huang, Xiaofei Liu, Shunfang Wang, Wenwen Min

TL;DR

The paper addresses accurate cell-type deconvolution in spatial transcriptomics (ST), where spots lack single-cell resolution and scRNA-seq and ST data differ across modalities. The authors introduce MACD, a masked adversarial neural network that learns features of real ST data with a masked autoencoder, aligns real and simulated ST data in a unified latent space through adversarial learning, and uses supervised training on simulated labels to infer cell-type proportions. MACD achieves state-of-the-art performance on 32 simulated datasets and gives robust results on two real tissues, including for rare cell types; ablation studies support the need for both the masking and the adversarial components. Together with its public code and datasets, the approach offers a practical route to more accurate spatial cell-type mapping in disease-relevant tissues.
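To make the adversarial alignment step more concrete, here is a minimal sketch of a gradient reversal layer (GRL), the mechanism named in the Figure 1 caption. It is not the authors' implementation: the scalar encoder weight `w`, the discriminator weight `v`, the scaling factor `lam`, and the function name are all hypothetical, chosen so the gradients can be written by hand without a deep learning library. The GRL acts as the identity in the forward pass and multiplies the gradient by `-lam` in the backward pass, so the discriminator learns to separate the domains while the encoder is pushed to make them indistinguishable.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def domain_loss_grads(w, v, x, d, lam):
    """Toy one-dimensional model (all names are illustrative assumptions).

    Encoder:        z = w * x
    GRL forward:    identity
    Discriminator:  p = sigmoid(v * z), probability that the spot is real
    Loss:           binary cross-entropy against domain label d (1 real, 0 simulated)
    """
    z = w * x
    z_rev = z                      # GRL forward pass leaves the features unchanged
    p = sigmoid(v * z_rev)
    loss = -(d * math.log(p) + (1 - d) * math.log(1 - p))

    d_logit = p - d                # derivative of the loss w.r.t. the logit
    grad_v = d_logit * z_rev       # discriminator gets the ordinary gradient
    grad_z = d_logit * v
    grad_z_enc = -lam * grad_z     # GRL backward pass flips and scales the gradient
    grad_w = grad_z_enc * x        # encoder gets the reversed gradient
    return loss, grad_v, grad_w


loss, grad_v, grad_w = domain_loss_grads(w=0.5, v=2.0, x=1.5, d=1, lam=1.0)
```

With `lam=1.0`, a gradient-descent step on `grad_w` moves the encoder in the direction that increases the discriminator's loss, which is the intended adversarial effect.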

Abstract

Accurately determining cell type composition in disease-relevant tissues is crucial for identifying disease targets. Most existing spatial transcriptomics (ST) technologies cannot achieve single-cell resolution, making it challenging to accurately determine cell types. To address this issue, various deconvolution methods have been developed. Most of these methods use single-cell RNA sequencing (scRNA-seq) data from the same tissue as a reference to infer cell types in ST data spots. However, they often overlook the differences between scRNA-seq and ST data. To overcome this limitation, we propose a Masked Adversarial Neural Network (MACD). MACD employs adversarial learning to align real ST data with simulated ST data generated from scRNA-seq data. By mapping them into a unified latent space, it can minimize the differences between the two types of data. Additionally, MACD uses masking techniques to effectively learn the features of real ST data and mitigate noise. We evaluated MACD on 32 simulated datasets and 2 real datasets, demonstrating its accuracy in performing cell type deconvolution. All code and public datasets used in this paper are available at https://github.com/wenwenmin/MACD and https://zenodo.org/records/12804822.
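The two data-preparation ideas in the abstract, simulating ST spots from scRNA-seq data and masking ST data, can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the paper: the function names, the choice to sum the expression of uniformly sampled cells, the zero-fill masking, and the toy count matrix are all hypothetical. A simulated spot carries a known cell-type proportion vector, which is what makes supervised training on simulated labels possible.

```python
import random
from collections import Counter


def simulate_spot(cells, labels, cell_types, n_cells, rng):
    """Build one pseudo-spot by summing the expression of randomly drawn cells.

    Returns the summed expression vector and the cell-type proportions,
    which serve as the ground-truth label for the simulated spot.
    """
    idx = [rng.randrange(len(cells)) for _ in range(n_cells)]
    n_genes = len(cells[0])
    expr = [sum(cells[i][g] for i in idx) for g in range(n_genes)]
    counts = Counter(labels[i] for i in idx)
    props = [counts[t] / n_cells for t in cell_types]
    return expr, props


def mask_genes(expr, mask_rate, rng):
    """Zero out a random subset of genes (assumed masking scheme).

    Returns the masked vector and a boolean mask marking the hidden genes,
    which a masked autoencoder would be trained to reconstruct.
    """
    n_masked = int(round(mask_rate * len(expr)))
    masked_idx = set(rng.sample(range(len(expr)), n_masked))
    masked = [0.0 if g in masked_idx else v for g, v in enumerate(expr)]
    mask = [g in masked_idx for g in range(len(expr))]
    return masked, mask


rng = random.Random(0)
# Toy reference: 4 cells x 4 genes, two hypothetical cell types "B" and "T".
cells = [[5, 0, 1, 0], [4, 1, 0, 0], [0, 6, 0, 2], [0, 5, 1, 3]]
labels = ["B", "B", "T", "T"]
expr, props = simulate_spot(cells, labels, ["B", "T"], n_cells=8, rng=rng)
masked, mask = mask_genes(expr, mask_rate=0.5, rng=rng)
```

Real pipelines typically also vary the number of cells per spot and normalize the counts; those steps are left out here to keep the sketch short.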

Paper Structure (18 sections, 17 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: The network architecture of MACD. The MACD training phase consists of two stages, (A) and (B). A shared encoder, trained as a masked autoencoder, maps the simulated and masked ST data to latent variables. Adversarial learning is performed on these latent variables: a classifier distinguishes between real and simulated ST data, and a discriminator, using a Gradient Reversal Layer (GRL), is trained to obscure the differences between them. (C) The trained model is used to infer the cell types of the real ST data.
  • Figure 2: Performance evaluation of MACD on 32 simulated datasets, with higher PCC, SSIM, and AS values, and lower RMSE and JS values indicating better performance. AS (Accuracy Score) is a composite metric that combines PCC, SSIM, RMSE, and JS. Green triangles represent the mean values, and the middle line represents the median.
  • Figure 3: MACD effectively analyzes both major and rare cell types in the Murine Lymph Node (MLN) dataset. (A) The first panel displays the expression levels of the marker gene Cd79a, followed by the proportions of Mature B cells (a rare cell type) estimated by MACD and other methods. (B) The first panel shows the expression levels of the marker gene Klk8, followed by the proportions of CD8 T cells (a major cell type) estimated by MACD and other methods. (C) Compares the PCC and JS between MACD and other methods for the marker gene Cd79a and the proportion of Mature B cells. (D) Compares the PCC and JS between MACD and other methods for the marker gene Klk8 and the proportion of CD8 T cells.
  • Figure 4: MACD effectively analyzes both major and rare cell types in the Human Developing Heart (HDH) dataset. (A) The first panel shows the expression levels of the marker gene MYH6, followed by the proportions of Atrial cardiomyocytes (a rare cell type) estimated by MACD and other methods. (B) The first panel displays the expression levels of the marker gene MYH7, followed by the proportions of Ventricular cardiomyocytes (a major cell type) estimated by MACD and other methods. (C) Compares the PCC and JS between MACD and other methods for the marker gene MYH6 and the proportion of Atrial cardiomyocytes. (D) Compares the PCC and JS between MACD and other methods for the marker gene MYH7 and the proportion of Ventricular cardiomyocytes.
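Three of the metrics named in the figure captions (PCC, RMSE, and JS) can be computed for a pair of proportion vectors as in the sketch below. The exact definitions used in the paper are not given in this summary, so the details here are assumptions: JS is computed as the Jensen-Shannon divergence with base-2 logarithms, which bounds it between 0 and 1, and inputs are assumed to be non-negative vectors that sum to 1. SSIM and the composite AS score are omitted because their formulas are not stated here.

```python
import math


def pcc(x, y):
    """Pearson correlation coefficient (undefined if either vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root mean squared error between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors (base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical predicted and reference proportions for one spot.
predicted = [0.6, 0.3, 0.1]
reference = [0.5, 0.4, 0.1]
scores = {
    "PCC": pcc(predicted, reference),
    "RMSE": rmse(predicted, reference),
    "JS": js_divergence(predicted, reference),
}
```

Higher PCC and lower RMSE and JS indicate closer agreement, matching the direction of the metrics described for Figure 2.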