
Systematic evaluation of the isolated effect of tissue environment on the transcriptome using a single-cell RNA-seq atlas dataset

Daigo Okada, Jianshen Zhu, Kan Shota, Yuuki Nishimura, Kazuya Haraguchi

TL;DR

This study introduces a novel data analysis framework, the Combinatorial Sub-dataset Extraction for Confounding Reduction (COSER), which addresses statistical confounding in single-cell data by using graph theory to enumerate appropriate sub-datasets.

Abstract

Background: Understanding cellular diversity throughout the body is essential for elucidating the complex functions of biological systems. Recently, large-scale single-cell omics datasets, known as omics atlases, have become available. These atlases encompass data from diverse tissues and cell types, providing insights into the landscape of cell-type-specific gene expression. However, the isolated effect of the tissue environment has not been thoroughly investigated. Evaluating this isolated effect is challenging because it is statistically confounded with cell-type effects: tissues and cell types occur in strongly biased combinations within the body.

Results: This study introduces a novel data analysis framework, the Combinatorial Sub-dataset Extraction for Confounding Reduction (COSER), which addresses statistical confounding by using graph theory to enumerate appropriate sub-datasets. COSER enables the assessment of the isolated effects of discrete variables in single cells. Applying COSER to the Tabula Muris Senis single-cell transcriptome atlas, we characterized the isolated impact of tissue environments. Our findings demonstrate that some genes are markedly affected by the tissue environment, particularly in modulating intercellular diversity in immune responses and in its age-related changes.

Conclusion: COSER provides a robust, general-purpose framework for evaluating the isolated effects of discrete variables in large-scale data mining. This approach reveals critical insights into the interplay between tissue environments and gene expression.
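To make the idea of a sub-dataset "appropriate" for isolating effects concrete, the Python sketch below checks whether a candidate solution (one subset of values per discrete variable) is complete, meaning every combination in the Cartesian product of the subsets is observed, as in the Figure 2B example. If it is, the sketch extracts the cells belonging to those combinations. The cell records, field names and function names are hypothetical illustrations, not the authors' implementation.

```python
from itertools import product


def is_complete(observed, solution):
    """True if every combination of the chosen values occurs in the data.

    observed: set of tuples, one per observed combination of variable values.
    solution: list of value subsets, one subset per discrete variable.
    """
    return all(combo in observed for combo in product(*solution))


def extract_subdataset(cells, keys, solution):
    """Keep only cells whose value for every variable lies in the chosen subset."""
    allowed = [set(values) for values in solution]
    return [
        cell for cell in cells
        if all(cell[key] in ok for key, ok in zip(keys, allowed))
    ]


# Hypothetical toy cells: all 8 combinations of Figure 2B, plus one extra cell.
keys = ("sex", "tissue", "cell_type")
sexes = ["male", "female"]
tissues = ["liver", "spleen"]
cell_types = ["T cell", "B cell"]
cells = [
    {"sex": s, "tissue": t, "cell_type": c}
    for s, t, c in product(sexes, tissues, cell_types)
]
cells.append({"sex": "male", "tissue": "lung", "cell_type": "T cell"})

observed = {tuple(cell[k] for k in keys) for cell in cells}
solution = [sexes, tissues, cell_types]

print(is_complete(observed, solution))                                  # True
print(is_complete(observed, [sexes, tissues + ["lung"], cell_types]))   # False
print(len(extract_subdataset(cells, keys, solution)))                   # 8
```

Within the extracted sub-dataset every tissue co-occurs with every cell type (and sex). Tissue and cell type are therefore no longer aliased, and a model containing both factors can estimate their effects separately. Adding "lung" would break completeness because only one lung combination is observed.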

Paper Structure

This paper contains 25 sections, 3 theorems, 5 figures, 1 algorithm.

Key Result

Lemma 1

For an instance ${\mathcal{I}}=({\mathsf H}=(V,{\mathcal{E}});\theta_1,\theta_2,\dots,\theta_k)$, if $S\subseteq V$ is a maximal solution to ${\mathcal{I}}$, then $S_k\cup\hat{{\mathcal{E}}}(S_k)$ is a maximal solution to ${\mathcal{J}}=({\mathsf B}_{\mathsf H};\theta_k,1)$.
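Lemma 1 reasons about maximal solutions. As a purely illustrative reference point, the sketch below enumerates them by brute force. It relies on assumptions not stated in this section: an instance consists of $k$ vertex classes, a set of observed $k$-tuples as hyperedges, and thresholds $\theta_i$ giving the minimum number of vertices to select from class $i$. A solution picks one subset per class such that every tuple of the Cartesian product is a hyperedge. A solution is maximal if no other solution contains it class by class. The search is exponential in the class sizes and is not the paper's enumeration algorithm. For $k = 2$ it reduces to maximal biclique enumeration with size thresholds (Figure 2A).

```python
from itertools import combinations, product


def subsets_at_least(values, min_size):
    """Yield every subset of `values` with at least `min_size` elements."""
    ordered = sorted(values)
    for r in range(min_size, len(ordered) + 1):
        for combo in combinations(ordered, r):
            yield frozenset(combo)


def is_solution(edges, parts):
    """True if every tuple in the Cartesian product of `parts` is a hyperedge."""
    return all(t in edges for t in product(*parts))


def maximal_solutions(vertex_classes, edges, thetas):
    """Brute-force enumeration of maximal solutions (toy-sized inputs only)."""
    # Candidates: one subset per class, each meeting its threshold,
    # whose full Cartesian product is observed.
    candidates = [
        parts
        for parts in product(
            *(subsets_at_least(v, th) for v, th in zip(vertex_classes, thetas))
        )
        if is_solution(edges, parts)
    ]

    def strictly_extends(big, small):
        # `big` contains `small` class by class and differs in at least one class.
        return big != small and all(s <= b for s, b in zip(small, big))

    # Keep candidates that no other candidate strictly extends.
    return [
        s for s in candidates
        if not any(strictly_extends(b, s) for b in candidates)
    ]


# Toy bipartite instance (k = 2): tissues x cell types (hypothetical data).
vertex_classes = [{"liver", "spleen", "lung"}, {"T cell", "B cell", "macrophage"}]
edges = {
    ("liver", "T cell"), ("liver", "B cell"), ("liver", "macrophage"),
    ("spleen", "T cell"), ("spleen", "B cell"),
    ("lung", "macrophage"),
}

for parts in maximal_solutions(vertex_classes, edges, thetas=[1, 1]):
    print([sorted(p) for p in parts])
```

With `thetas=[1, 1]`, this toy instance has three maximal solutions: {liver, spleen} × {B cell, T cell}, {liver} × {B cell, T cell, macrophage}, and {liver, lung} × {macrophage}. Raising the tissue threshold to 2 leaves only the first and last.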

Figures (5)

  • Figure 1: Bipartite graphs of tissue and cell-type combinations in the Tabula Muris Senis (TMS) dataset. The graphs show strong biases in which combinations of tissues and cell types are represented.
  • Figure 2: Extension of the maximal biclique enumeration problem to $k$-partite hypergraphs and the COSER framework. (A) Illustration of a maximal biclique. (B) An example of extending bicliques to $k$-partite hypergraphs, where the solution [[male, female], [liver, spleen], [T cell, B cell]] ensures that all eight combinations shown in the tree diagram are present in the dataset. (C) Graphical overview of the COSER framework. The combinations of discrete variables in the dataset are represented as a $k$-partite hypergraph. Subgraphs that contain all possible combinations are identified as solutions. For each solution, a sub-dataset is created that contains only the cells corresponding to the included combinations. Independent statistical analyses are conducted on the sub-datasets, and their results are integrated into a consensus, yielding robust insights into the variables' isolated effects. (D) Examples of maximal bicliques in the bipartite graph of the FACS dataset shown in Figure 1.
  • Figure 3: Isolated effects of the tissue environment observed throughout the body. (A) All maximal solutions for the combinations of individuals, tissues, and cell types. BAT: brown adipose tissue, GAT: gonadal adipose tissue, MAT: mesenteric adipose tissue, SCAT: subcutaneous adipose tissue, LM: limb muscle, MG: mammary gland, MSC: mesenchymal stem cell/mesenchymal stem cell of adipose, MC: myeloid cell, BC: B cell, TC: T cell, SC: skeletal muscle satellite cell, MAC: macrophage. (B) QQ plot of P-values for the effects of tissue and cell type in 24 sub-datasets from the FACS dataset. Each line corresponds to a sub-dataset. Zero P-values were replaced with the minimum non-zero P-value before log transformation. (C) GO terms with the top ten enrichment scores for genes affected by the tissue environment in the FACS dataset. (D) QQ plot of P-values for the effects of tissue and cell type in one sub-dataset from the FACS dataset. (E) GO terms with the top ten enrichment scores for genes affected by the tissue environment in the Droplet dataset.
  • Figure 4: DAGs representing the relative magnitude of the isolated effect of tissue environments on the expression of four transcription factors (Fosb, Klf4, Tbx15, Wt1). The inequalities of the isolated tissue effects derived from the graphs are also included.
  • Figure 5: Comparison of age-related changes between different tissue environments. (A) The four maximal solutions found for combinations of tissue, cell type, sex, and age. (B) Scatter plot of the regression coefficients of "Young" for gene expression levels. Genes exhibiting opposite effects of aging are colored red (positive on the X-axis) or blue (positive on the Y-axis). (C) Scatter plot of the "Young" regression coefficient versus the gene expression level (MAT vs. SCAT). (D) GO terms with the top ten enrichment scores for genes exhibiting opposite effects of aging (BAT vs. SCAT).
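The Figure 3 caption states that zero P-values were replaced with the minimum non-zero P-value before log transformation for the QQ plots. A minimal Python sketch of that preprocessing for one QQ-plot line follows. It makes two assumptions not stated in the caption. First, it uses the uniform plotting position (rank - 0.5)/n. Second, it takes the minimum within each call, that is, per sub-dataset; whether the authors took it per sub-dataset or across all sub-datasets is not specified.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 P-values for one QQ-plot line.

    Zero P-values are replaced by the smallest non-zero P-value in the same
    input before the log transform (per-call minimum: an assumption).
    """
    nonzero = [p for p in p_values if p > 0]
    if not nonzero:
        raise ValueError("need at least one non-zero P-value")
    floor = min(nonzero)
    cleaned = sorted(p if p > 0 else floor for p in p_values)
    n = len(cleaned)
    # Expected quantiles under the uniform null, plotting position (rank - 0.5) / n.
    expected = [-math.log10((rank - 0.5) / n) for rank in range(1, n + 1)]
    observed = [-math.log10(p) for p in cleaned]
    return expected, observed


# Hypothetical P-values from one sub-dataset (not data from the paper).
expected, observed = qq_points([0.0, 1e-5, 0.03, 0.5])
for e, o in zip(expected, observed):
    print(f"{e:.3f}\t{o:.3f}")
```

Calling the function once per sub-dataset gives one line of a multi-line QQ plot. Without the replacement, a zero P-value would raise an error in `math.log10`.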

Theorems & Definitions (3)

  • Lemma 1
  • Lemma 2
  • Theorem 1