Re-analysis of the Human Transcription Factor Atlas Recovers TF-Specific Signatures from Pooled Single-Cell Screens with Missing Controls

Arka Jain, Umesh Sharma

Abstract

Public pooled single-cell perturbation atlases are valuable resources for studying transcription factor (TF) function, but downstream re-analysis can be limited by incomplete deposited metadata and missing internal controls. Here we re-analyze the human TF Atlas dataset (GSE216481), a MORF-based pooled overexpression screen spanning 3,550 TF open reading frames and 254,519 cells, with a reproducible pipeline for quality control, MORF barcode demultiplexing, per-TF differential expression, and functional enrichment. From 77,018 cells in the pooled screen, we assign 60,997 (79.2%) to 87 TF identities. Because the deposited barcode mapping lacks the GFP and mCherry negative controls present in the original library, we use embryoid body (EB) cells as an external baseline and remove shared batch/transduction artifacts by background subtraction. This strategy recovers TF-specific signatures for 59 of 61 testable TFs, compared with 27 detected by one-vs-rest alone, showing that robust TF-level signal can be rescued despite missing intra-pool controls. HOPX, MAZ, PAX6, FOS, and FEZF2 emerge as the strongest transcriptional remodelers, while per-TF enrichment links FEZF2 to regulation of differentiation, EGR1 to Hippo and cardiac programs, FOS to focal adhesion, and NFIC to collagen biosynthesis. Condition-level analyses reveal convergent Wnt, neurogenic, EMT, and Hippo signatures, and Harmony indicates minimal confounding batch effects across pooled replicates. Our per-TF effect sizes significantly agree with Joung et al.'s published rankings (Spearman $\rho = -0.316$, $p = 0.013$; negative because lower rank indicates stronger effect). Together, these results show that the deposited TF Atlas data can support validated TF-specific transcriptional and pathway analyses when paired with principled external controls, artifact removal, and reproducible computation.
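The background-subtraction idea can be illustrated with a small sketch. This is not the authors' pipeline. It assumes, purely for illustration, that the shared batch/transduction artifact for each gene is estimated as the median log2 fold change (vs EB) across all TFs, and that a gene counts as TF-specific when its residual after subtracting that median passes the $|\log_2\mathrm{FC}| > 0.5$, FDR $< 0.05$ thresholds quoted in the figure captions. The function name `tf_specific_degs`, the input layout, and the toy values are hypothetical.

```python
from statistics import median

LFC_CUTOFF = 0.5   # |log2FC| threshold quoted in the figure captions
FDR_CUTOFF = 0.05  # FDR threshold quoted in the figure captions


def tf_specific_degs(results):
    """Return TF-specific genes after subtracting a shared background.

    `results` maps TF -> {gene: (log2fc_vs_eb, fdr)}. This layout and the
    median-based background are assumptions made for this sketch only.
    """
    genes = set().union(*(r.keys() for r in results.values()))
    # Assumed background: per-gene median log2FC across all TFs, standing in
    # for effects shared by every transduced cell regardless of TF identity.
    background = {
        g: median(r[g][0] for r in results.values() if g in r) for g in genes
    }
    specific = {}
    for tf, r in results.items():
        # Keep a gene only if it is significant and its residual effect
        # (TF effect minus shared background) is still large.
        specific[tf] = {
            g: lfc - background[g]
            for g, (lfc, fdr) in r.items()
            if fdr < FDR_CUTOFF and abs(lfc - background[g]) > LFC_CUTOFF
        }
    return specific


# Toy input (made-up numbers): ARTIFACT is up in every TF vs EB,
# TARGET is up only in TF_A.
toy = {
    "TF_A": {"ARTIFACT": (2.0, 0.001), "TARGET": (3.0, 0.001)},
    "TF_B": {"ARTIFACT": (2.1, 0.001), "TARGET": (0.1, 0.5)},
    "TF_C": {"ARTIFACT": (1.9, 0.001), "TARGET": (0.0, 0.9)},
}
toy_specific = tf_specific_degs(toy)
```

In this toy input, the gene that is elevated in every TF relative to EB is removed as shared background, while the gene elevated in only one TF is retained for that TF. A plain one-vs-rest comparison removes shared effects differently, by contrasting each TF against the other pooled cells rather than against an external baseline.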

Paper Structure

This paper contains 18 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: UMAP of the combined TF Atlas dataset. (A) 254,519 cells colored by sample identity, showing separation between experimental conditions and overlap among pooled screen replicates. (B) Leiden clustering identifies 83 transcriptionally distinct populations.
  • Figure 2: MORF barcode demultiplexing of pooled screen samples. (A) UMAP of 77,018 cells colored by demultiplexing status: assigned (blue), ambiguous (orange), undetected (green). (B) Distribution of assigned cells across TFs.
  • Figure 3: TF-specific differentially expressed genes per TF (vs EB control with background subtraction, $|\log_2\mathrm{FC}| > 0.5$, FDR $< 0.05$). Red: upregulated; blue: downregulated. Top 30 TFs shown.
  • Figure 4: Functional enrichment of condition-level DEGs. (A) Most frequently enriched GO/KEGG terms across perturbations, ranked by minimum adjusted $p$-value. (B) Dotplot showing term enrichment across perturbations (dot size = gene count, color = $-\log_{10}$ adjusted $p$-value).
  • Figure 5: Functional enrichment of per-TF DEGs from pooled screen. (A) Most frequently enriched GO/KEGG terms across 13 TFs with significant ORA results, ranked by minimum adjusted $p$-value. (B) Dotplot showing per-TF enrichment landscape (dot size = gene count, color = $-\log_{10}$ adjusted $p$-value). Only TFs with $\geq 5$ significant DEGs ($|\log_2\mathrm{FC}| > 0.5$, FDR $< 0.05$) were tested.
  • ...and 3 more figures