Contrastive Similarity Learning for Market Forecasting: The ContraSim Framework
Nicholas Vinden, Raeid Saqur, Zining Zhu, Frank Rudzicz
TL;DR
ContraSim is a self-supervised framework that learns a semantically structured embedding space for daily financial headlines by generating augmented headlines with a continuous similarity score and training with Weighted Self-Supervised Contrastive Learning. The approach enables inter-day comparisons to find historical analogs and improves market-movement forecasting when combined with LLM-based representations. Empirical results show gains on the NIFTY-SFT and IMDB datasets, along with improved information-density metrics indicating that the embedding space captures market-dynamics signals without using ground-truth labels for clustering. The work advances interpretable, semantically grounded text representations for financial forecasting, with potential applicability to other domains and to real-time decision support for analysts.
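The inter-day comparison described above can be illustrated with a small nearest-neighbor lookup. This is a minimal sketch, not the authors' implementation: it assumes each day has already been reduced to a single non-zero embedding vector, uses cosine similarity as the comparison metric (an assumption), and the names `cosine` and `top_k_analogs` and the date keys are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_analogs(past_days, query, k=3):
    """Return the k (date, similarity) pairs most similar to the query day.

    past_days: dict mapping a date label to that day's embedding (assumed format).
    query: embedding of the current day.
    """
    scored = [(date, cosine(vec, query)) for date, vec in past_days.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy two-dimensional embeddings, made up purely for illustration.
past = {
    "day-1": [1.0, 0.0],
    "day-2": [0.0, 1.0],
    "day-3": [1.0, 1.0],
    "day-4": [-1.0, 0.0],
}
analogs = top_k_analogs(past, [1.0, 0.1], k=2)
```

In this toy setup the two closest past days are the ones whose vectors point in nearly the same direction as the query; in practice the vectors would come from the learned embedding space.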
Abstract
We introduce the Contrastive Similarity Space Embedding Algorithm (ContraSim), a novel framework for uncovering the global semantic relationships between daily financial headlines and market movements. ContraSim operates in two key stages: (I) Weighted Headline Augmentation, which generates augmented financial headlines along with a fine-grained semantic similarity score, and (II) Weighted Self-Supervised Contrastive Learning (WSSCL), an extended version of classical self-supervised contrastive learning that uses the similarity score to create a refined weighted embedding space. This embedding space clusters semantically similar headlines together, facilitating deeper market insights. Empirical results demonstrate that integrating ContraSim features into financial forecasting tasks improves classification accuracy from WSJ headlines by 7%. Moreover, using an information density analysis, we find that the similarity spaces constructed by ContraSim intrinsically cluster days with homogeneous market movement directions, indicating that ContraSim captures market dynamics independently of ground-truth labels. Additionally, ContraSim enables the identification of historical news days that closely resemble the headlines of the current day, providing analysts with actionable insights to predict market trends by referencing analogous past events.
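To make the idea of a similarity-weighted contrastive objective concrete, the following is a minimal sketch of one possible formulation. It is not the WSSCL loss from the paper, whose exact form is not given in this abstract. It assumes a soft-weighted InfoNCE-style loss in which each pair (i, j) contributes in proportion to a similarity score in [0, 1]. The function name `weighted_contrastive_loss`, the `temperature` parameter, and the weight-matrix layout are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def weighted_contrastive_loss(embeddings, weights, temperature=0.1):
    """Soft-weighted InfoNCE-style loss (illustrative assumption, not the paper's loss).

    embeddings: list of non-zero vectors, one per headline set.
    weights[i][j]: similarity score in [0, 1] between items i and j;
                   the diagonal is ignored.
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two embeddings")
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarities from anchor i to every other item.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        w_sum = sum(weights[i][j] for j in logits)
        if w_sum == 0:
            continue  # anchor has no weighted partners, so it is skipped
        # Weighted average of negative log-probabilities over partners.
        total += -sum(weights[i][j] * (logits[j] - log_denom) for j in logits) / w_sum
        counted += 1
    return total / counted if counted else 0.0


# Toy data: items 0 and 1 are close, items 2 and 3 are close.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
misaligned = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
loss_aligned = weighted_contrastive_loss(emb, aligned)
loss_misaligned = weighted_contrastive_loss(emb, misaligned)
```

Under this formulation the loss is small when highly weighted pairs are already close in the embedding space and large when they are far apart, which is the behavior a weighted contrastive objective is meant to encourage. With binary weights it reduces to a standard multi-positive contrastive loss.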
