Financial Bond Similarity Search Using Representation Learning
Amin Haeri, Mahdi Ghelichi, Nishant Agrawal, David Li, Catalina Gomez Sanchez
TL;DR
This paper addresses bond similarity search in fixed-income analytics. It shows that learned embeddings of high-cardinality categorical attributes yield semantically meaningful bond neighbors and improve spread-curve reconstruction when issuer data are sparse. The authors train per-feature embeddings on six categorical attributes, retrieve neighbors by cosine similarity, and apply post-filtering before fitting Nelson–Siegel curves. Evaluated via sparse-issuer augmentation, the method outperforms one-hot baselines and is competitive with supervised metric learners in sparse regimes. The result is a practical framework for risk management and peer selection in fixed income that makes curve construction and risk assessment more robust and interpretable. The authors also suggest hybrid and multimodal representations as avenues for further exploiting domain structure.
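To make the retrieval and curve steps summarized above concrete, here is a minimal sketch, not the authors' implementation. It assumes that per-feature embeddings are concatenated into one vector per bond, which the summary does not state, and it uses random vectors as stand-ins for trained embeddings. The vocabularies, the bond records, and the names `make_embedding_tables`, `embed_bond`, `top_k_neighbors`, and `nelson_siegel` are all hypothetical. The post-filtering step is omitted because its rules are not described here, and the curve function uses the standard Nelson–Siegel form with illustrative parameters rather than fitted ones.

```python
import math
import random

# Hypothetical vocabularies: the paper uses six categorical attributes;
# only two are shown here to keep the sketch short.
VOCABS = {
    "sector": ["banking", "energy", "utilities"],
    "domicile": ["US", "CA", "DE"],
}
EMBED_DIM = 4  # assumed embedding width per feature


def make_embedding_tables(vocabs, dim, seed=0):
    """Random stand-ins for trained per-feature embedding tables."""
    rng = random.Random(seed)
    return {
        feature: {cat: [rng.gauss(0.0, 1.0) for _ in range(dim)] for cat in cats}
        for feature, cats in vocabs.items()
    }


def embed_bond(bond, tables):
    """Concatenate one bond's per-feature embeddings (an assumed design)."""
    vec = []
    for feature in sorted(tables):
        vec.extend(tables[feature][bond[feature]])
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_neighbors(query, candidates, tables, k=2):
    """Return the k (similarity, bond id) pairs most similar to the query."""
    q = embed_bond(query, tables)
    scored = [
        (cosine_similarity(q, embed_bond(c, tables)), c["id"])
        for c in candidates
        if c["id"] != query["id"]
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Standard Nelson-Siegel curve value at maturity tau > 0, decay lam > 0."""
    x = tau / lam
    loading = (1.0 - math.exp(-x)) / x
    return beta0 + beta1 * loading + beta2 * (loading - math.exp(-x))


tables = make_embedding_tables(VOCABS, EMBED_DIM)
bonds = [
    {"id": "B1", "sector": "banking", "domicile": "US"},
    {"id": "B2", "sector": "banking", "domicile": "US"},
    {"id": "B3", "sector": "energy", "domicile": "US"},
    {"id": "B4", "sector": "utilities", "domicile": "DE"},
]
neighbors = top_k_neighbors(bonds[0], bonds, tables, k=2)
print(neighbors)
print(nelson_siegel(5.0, beta0=0.02, beta1=-0.01, beta2=0.005, lam=2.0))
```

In this toy setup, a bond that shares every categorical value with the query gets the same vector and therefore ranks first. With trained embeddings, related but non-identical categories could also score highly, which is the behavior a one-hot encoding cannot express.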
Abstract
Finding similar bonds remains challenging in fixed-income analytics, as numerical financial attributes often overshadow categorical non-financial ones such as issuer sector and domicile. This paper shows that these categorical attributes dominate the predictability of spread curves and proposes embedding models to capture their semantic similarities, outperforming one-hot and many other baselines. Evaluated via sparse-issuer augmentation, the approach improves risk modeling and curve construction.
