Cross-Sectional Asset Retrieval via Future-Aligned Soft Contrastive Learning

Hyeongmin Lee, Chanyeol Choi, Jihoon Kwon, Yoon Kim, Alejandro Lopez-Lira, Wonbin Ahn, Yongjae Lee

TL;DR

This work reframes asset retrieval as a future-aligned task and proposes Future-Aligned Soft Contrastive Learning (FASCL), which shapes embeddings with a soft contrastive loss whose targets are pairwise future return correlations. A patch-based Transformer encoder maps historical windows to embeddings, and retrieval is performed by cosine similarity in the learned space. The authors define an evaluation protocol with four metrics (Trend Consistency, Future Return Correlation, Information Coefficient, Sector Precision) and report state-of-the-art performance on 4,229 US equities against 13 baselines, with the gains carrying over to a downstream spread-trading task. The work offers a scalable, explainable approach to asset retrieval and a standardized benchmark for future research on future-aligned cross-sectional similarity.
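As a rough illustration of the retrieval step and two of the metrics, the sketch ranks assets by cosine similarity of their embeddings and then scores the top-k neighbors. It assumes embeddings are already computed (the patch-based Transformer is not reproduced). The metric definitions used here are plausible readings of the metric names, not the paper's exact formulas: Sector Precision is taken as the fraction of neighbors sharing the query's GICS sector, and Future Return Correlation as the mean Pearson correlation between the query's and each neighbor's future return series. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers illustrating cosine-similarity retrieval; not the authors' code.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def retrieve_top_k(query: str, embeddings: Dict[str, Sequence[float]],
                   k: int) -> List[Tuple[str, float]]:
    """Rank all other assets by cosine similarity to the query and keep the k best."""
    q = embeddings[query]
    scored = [(t, cosine(q, e)) for t, e in embeddings.items() if t != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def sector_precision(query: str, neighbors: List[Tuple[str, float]],
                     sector: Dict[str, str]) -> float:
    """Assumed definition: share of retrieved neighbors in the query's sector."""
    if not neighbors:
        return 0.0
    return sum(sector[t] == sector[query] for t, _ in neighbors) / len(neighbors)


def future_return_correlation(query: str, neighbors: List[Tuple[str, float]],
                              future: Dict[str, Sequence[float]]) -> float:
    """Assumed definition: mean Pearson correlation of future returns, query vs. each neighbor."""
    if not neighbors:
        return 0.0
    return sum(pearson(future[query], future[t]) for t, _ in neighbors) / len(neighbors)
```

Only the embeddings (computed from historical windows) are used for ranking; future returns enter solely at evaluation time, which is what lets such metrics test whether the retrieved neighbors actually move together afterwards.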

Abstract

Asset retrieval, the task of finding similar assets in a financial universe, is central to quantitative investment decision-making. Existing approaches define similarity through historical price patterns or sector classifications, but such backward-looking criteria provide no guarantee about future behavior. We argue that effective asset retrieval should be future-aligned: the retrieved assets should be those most likely to exhibit correlated future returns. To this end, we propose Future-Aligned Soft Contrastive Learning (FASCL), a representation learning framework whose soft contrastive loss uses pairwise future return correlations as continuous supervision targets. We further introduce an evaluation protocol designed to directly assess whether retrieved assets share similar future trajectories. Experiments on 4,229 US equities demonstrate that FASCL consistently outperforms 13 baselines across all future-behavior metrics. The source code will be available soon.
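One way to read "pairwise future return correlations as continuous supervision targets" is as a soft cross-entropy between two distributions per anchor asset. The target distribution is a temperature-scaled softmax over the anchor's future return correlations with the other assets in a batch, and the predicted distribution is a softmax over cosine similarities of the embeddings. The sketch implements that reading in plain Python. The softmax target construction, the temperatures `tau` and `tau_target`, and the function names are assumptions for illustration, not the paper's stated loss.

```python
import math
from typing import List, Sequence

# Hypothetical soft contrastive loss; the exact FASCL formulation may differ.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_contrastive_loss(embeddings: List[Sequence[float]],
                          future_returns: List[Sequence[float]],
                          tau: float = 0.1, tau_target: float = 0.1) -> float:
    """Mean over anchors of the cross-entropy between correlation-based soft
    targets and the softmax of scaled cosine similarities (self-pairs excluded)."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Soft targets: more future-correlated assets receive more probability mass.
        target = softmax([pearson(future_returns[i], future_returns[j]) / tau_target
                          for j in others])
        # Predicted log-probabilities via log-sum-exp over scaled cosine similarities.
        logits = [cosine(embeddings[i], embeddings[j]) / tau for j in others]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -sum(p * (l - log_z) for p, l in zip(target, logits))
    return total / n
```

Unlike a hard InfoNCE loss with a single positive, every other asset in this construction contributes in proportion to how correlated its future returns are with the anchor's, so strongly anti-correlated assets are pushed away while moderately correlated ones receive partial credit.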

Paper Structure

This paper contains 28 sections, 14 equations, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: t-SNE visualization of FASCL embeddings on the test set. (a, b) Colored by 5-day future return; (c) by GICS sector; (d) by thematic ETF membership. No sector, industry, or ETF labels are used during training.
  • Figure 2: t-SNE visualization of FASCL test-set embeddings colored by future cumulative return at (a) 1-day, (b) 5-day, (c) 20-day, and (d) 60-day horizons. The spatial gradient from negative (blue) to positive (red) returns is consistent across all horizons, demonstrating that the learned representations capture multi-scale future behavioral structure.
  • Figure 3: t-SNE visualization colored by (a) GICS sector and (b) thematic ETF membership. Same-sector assets cluster together despite no sector labels being used during training. Thematic ETFs such as Semiconductor (SOXX), Banking (KBE), and Magnificent 7 form tight sub-clusters that cross traditional sector boundaries, confirming that FASCL captures fine-grained behavioral groupings beyond static classifications.
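Figure 2 colors each test-set embedding by its future cumulative return at 1-, 5-, 20-, and 60-day horizons. A minimal sketch of how such labels could be computed from a daily price series follows. It assumes simple (not log) returns measured from the last observed close of the window, which the captions do not specify, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical label computation for horizon-colored plots; return convention is assumed.


def future_cumulative_return(prices: Sequence[float], t: int, h: int) -> float:
    """Simple cumulative return from day t to day t + h: P[t + h] / P[t] - 1."""
    if h <= 0 or t + h >= len(prices):
        raise ValueError("horizon must be positive and stay within the price series")
    return prices[t + h] / prices[t] - 1.0


def multi_horizon_returns(prices: Sequence[float], t: int,
                          horizons: Tuple[int, ...] = (1, 5, 20, 60)) -> Dict[int, float]:
    """Labels for each horizon that fits inside the series; longer horizons are skipped."""
    return {h: future_cumulative_return(prices, t, h)
            for h in horizons if t + h < len(prices)}
```

Horizons that run past the end of the data are dropped rather than padded, so assets observed near the end of the test period simply lack the longer-horizon labels.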