Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks

Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi

TL;DR

This work addresses the challenge of identifying high-potential SMEs by predicting SBIR Phase I-to-II progression using a publicly constructed heterogeneous graph over firms, research topics, and government agencies. It introduces SME-HGT, a Heterogeneous Graph Transformer that applies type-specific attention across three edge types, achieving superior AUPRC and AUROC compared with baselines while preserving temporal integrity and reproducibility. The key contributions include a public-data graph framework, a temporal evaluation protocol, and evidence that relational context signals SME potential beyond tabular features. The findings have practical implications for policymakers and early-stage investors by enabling efficient screening and targeted support within a transparent, reproducible data ecosystem.

Abstract

Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which SBIR Phase I awardees will advance to Phase II funding using exclusively public data. We construct a heterogeneous graph with 32,268 company nodes, 124 research topic nodes, and 13 government agency nodes connected by approximately 99,000 edges across three semantic relation types. SME-HGT achieves an AUPRC of 0.621 ± 0.003 on a temporally-split test set, outperforming an MLP baseline (0.590 ± 0.002) and R-GCN (0.608 ± 0.013) across five random seeds. At a screening depth of 100 companies, SME-HGT attains 89.6% precision with a 2.14× lift over random selection. Our temporal evaluation protocol prevents information leakage, and our reliance on public data ensures reproducibility. These results demonstrate that relational structure among firms, research topics, and funding agencies provides meaningful signal for SME potential assessment, with implications for policymakers and early-stage investors.
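
The screening metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes that precision at depth k is the fraction of true Phase II advancers among the k highest-scored companies, and that lift is that precision divided by the overall positive rate in the evaluated set. The abstract does not spell out these definitions, so the function names `precision_at_k` and `lift_at_k` and the toy scores below are hypothetical, not the authors' code or data.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of positives among the k highest-scored items."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Rank item indices by predicted score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Precision@k divided by the base rate (assumed definition of lift)."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    return precision_at_k(scores, labels, k) / base_rate


# Toy example (hypothetical data): 8 companies, 3 of which advanced to Phase II.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

p_at_4 = precision_at_k(toy_scores, toy_labels, 4)  # 3 of the top 4 are positive
l_at_4 = lift_at_k(toy_scores, toy_labels, 4)       # 0.75 / 0.375
```

Under this assumed definition, the reported 89.6% precision and 2.14× lift would correspond to a test-set positive rate of roughly 0.896 / 2.14 ≈ 0.42. That figure is back-calculated here and is not stated in the abstract.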

Paper Structure (21 sections, 4 equations, 1 figure, 3 tables)


Figures (1)

  • Figure 1: SME-HGT architecture. Input features are projected per node type to a shared dimension, then refined through three HGT layers with residual connections. Only company node embeddings are passed to the classifier.