Predicting Startup-VC Fund Matches with Structural Embeddings and Temporal Investment Data
Koutarou Tamura
TL;DR
The paper predicts whether a VC fund will add a given startup to its portfolio, framing the problem as a fund-specific binary classification task. Each startup is represented by a multi-modal embedding that fuses textual, numerical, categorical, and structural signals; the structural signal comes from a Node2Vec embedding of a bipartite investment graph. Fund behavior is modeled as a sequence of past investments encoded by an LSTM. A binary compatibility scorer, trained end-to-end on historical fund–startup pairs, predicts inclusion. Ablations show the value of structural information, and imputation keeps the model robust to unseen startups. On Japanese startup data, the approach yields a notable improvement in F1 score over a baseline and shows potential for broader applicability; interpretability enhancements are planned for future work.
Abstract
This study proposes a method for predicting startup inclusion, that is, for estimating the probability that a venture capital fund will invest in a given startup. Unlike general recommendation systems, which typically rank multiple candidates, our approach formulates the problem as a binary classification task over individual fund-startup pairs. Each startup is represented by integrating textual, numerical, and structural features: Node2Vec captures network context, and multi-head attention fuses the features. Each fund's investment history is encoded by an LSTM as a sequence of its past investees. Experiments on Japanese startup data demonstrate that the proposed method achieves higher accuracy than a static baseline. The results indicate that incorporating structural features and modeling temporal investment dynamics are both effective in capturing fund-startup compatibility.
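To make the fund-specific binary formulation concrete, the following minimal sketch shows one way a dated investment log could be turned into labeled fund-startup examples, each carrying the fund's chronological history of earlier investees. It is an illustration under stated assumptions, not the paper's pipeline: the log entries, the function name `build_examples`, and the negative-sampling rule (every startup a fund never backed counts as a negative, paired with the fund's full history) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical investment log: (fund_id, startup_id, year). Not data from the paper.
INVESTMENTS = [
    ("fund_a", "s1", 2018),
    ("fund_a", "s2", 2019),
    ("fund_a", "s3", 2021),
    ("fund_b", "s2", 2019),
    ("fund_b", "s4", 2020),
]


def build_examples(investments):
    """Return (fund, history, candidate, label) rows for binary classification.

    For a positive row, history lists the fund's investees from strictly
    earlier years in chronological order, so no future deal leaks into the input.
    """
    by_fund = defaultdict(list)
    for fund, startup, year in investments:
        by_fund[fund].append((year, startup))
    all_startups = {startup for _, startup, _ in investments}

    examples = []
    for fund, deals in sorted(by_fund.items()):
        deals.sort()  # chronological order (ties broken by startup id)
        portfolio = {startup for _, startup in deals}

        # Positives: each actual investment, seen against the earlier history.
        for year, startup in deals:
            history = [s for y, s in deals if y < year]
            examples.append((fund, history, startup, 1))

        # Negatives (assumption): startups the fund never backed,
        # paired with the fund's complete history.
        full_history = [s for _, s in deals]
        for startup in sorted(all_startups - portfolio):
            examples.append((fund, full_history, startup, 0))
    return examples


examples = build_examples(INVESTMENTS)
for row in examples:
    print(row)
```

In a model of the kind described above, each `history` would be mapped to a sequence of startup embeddings and fed to the sequence encoder, while `candidate` would be embedded and scored against the resulting fund representation.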
