Cross-Modal Temporal Fusion for Financial Market Forecasting
Yunhua Pei, John Cartlidge, Anandadeep Mandal, Daniel Gold, Enrique Marcilio, Riccardo Mazzon
TL;DR
Cross-Modal Temporal Fusion (CMTF) is a transformer-based framework that unifies structured market data with unstructured news and financial reports through a tensor representation and encoding scheme. It includes a sparsity-driven tensor interpretation module for feature selection and an Optuna-based auto-training pipeline that converges quickly on effective hyperparameters. Empirical results on FTSE 100 data show that CMTF outperforms classical and deep learning baselines in next-day price direction classification, with the largest gains in recall and F1. The framework emphasizes interpretability and rapid adaptation to evolving market conditions, making it a scalable option for real-world cross-modal financial forecasting.
Abstract
Accurate forecasting in financial markets requires integrating diverse data sources, from historical prices to macroeconomic indicators and financial news. However, existing models often fail to align these modalities effectively, limiting their practical use. In this paper, we introduce a transformer-based deep learning framework, Cross-Modal Temporal Fusion (CMTF), that fuses structured and unstructured financial data for improved market prediction. The model incorporates a tensor interpretation module for feature selection and an auto-training pipeline for efficient hyperparameter tuning. Experimental results using FTSE 100 stock data demonstrate that CMTF achieves superior performance in price direction classification compared to classical and deep learning baselines. These findings suggest that our framework is an effective and scalable solution for real-world cross-modal financial forecasting tasks.
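To make the evaluation task concrete, the following is a minimal sketch of how next-day price direction labels and the recall and F1 metrics can be computed. It is not the authors' code: the labelling rule (a strictly higher next-day close counts as "up"), the helper names `direction_labels` and `precision_recall_f1`, and the price and prediction values are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def direction_labels(closes: Sequence[float]) -> List[int]:
    """Label each day 1 if the next day's close is strictly higher, else 0.

    Assumption: this labelling rule is illustrative; the paper may define
    price direction differently.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up closing prices and hypothetical model predictions.
closes = [100.0, 101.5, 101.0, 102.0, 102.0, 103.2]
y_true = direction_labels(closes)
y_pred = [1, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Recall measures how many true upward moves the model catches, and F1 balances that against precision, which is why both are informative when up and down days are not evenly distributed.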
