Chinese Stock Prediction Based on a Multi-Modal Transformer Framework: Macro-Micro Information Fusion

Shihao Ji, Zihui Song, Fucheng Zhong, Jisen Jia, Zhaobo Wu, Zheyi Cao, Xu Tianhao (Lumen AI; Tengzhou No. 1 Middle School)

TL;DR

We address the problem of predicting the Chinese stock market by integrating multi-modal information with a Multi-Modal Transformer framework (MMF-Trans). The framework combines four encoders (technical indicators, financial text, macro data, and event knowledge) through a dynamic gated fusion mechanism, and uses a time-aligned Transformer with a three-stage position encoding to fuse data sampled at different frequencies. An event-quantification component built on an event knowledge graph (Event2Vec) assesses event impact dynamically and is supported by a convergence guarantee under Lipschitz continuity. On CSI 300 constituents, MMF-Trans reduces RMSE by 23.7% relative to the baseline model, improves event-response prediction accuracy by 41.2%, and improves the Sharpe ratio by 32.6%, with robust performance and potential for practical deployment, including policy-impact analysis. The work offers a theoretically grounded, practical approach to macro-micro information fusion in financial forecasting and opens avenues for sentiment and cross-market extensions.

Abstract

This paper proposes a Multi-Modal Transformer framework (MMF-Trans) designed to improve prediction accuracy for the Chinese stock market by integrating multi-source heterogeneous information: macroeconomic data, micro-market data, financial text, and event knowledge. The framework consists of four core modules: (1) a four-channel parallel encoder that extracts features independently from technical indicators, financial text, macro data, and the event knowledge graph; (2) a dynamic gated cross-modal fusion mechanism that learns the importance of each modality adaptively through differentiable weight allocation; (3) a time-aligned mixed-frequency processing layer that uses a new position encoding method to fuse data of different time frequencies, addressing the time alignment problem of heterogeneous data; (4) a graph attention-based event impact quantification module that captures the dynamic impact of events on the market through the event knowledge graph and quantifies an event impact coefficient. We introduce a hybrid-frequency Transformer and the Event2Vec algorithm to fuse data of different frequencies and to quantify event impact. Experimental results show that, in the prediction task on CSI 300 constituent stocks, MMF-Trans reduces the root mean square error (RMSE) by 23.7% compared to the baseline model, improves event-response prediction accuracy by 41.2%, and improves the Sharpe ratio by 32.6%.
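
As a rough illustration of the dynamic gated cross-modal fusion idea in module (2), the sketch below shows one common way to weight modality embeddings: a softmax gate computed from the concatenated features. This is not the authors' implementation. The function name `gated_fusion`, the single linear gate, the embedding size, and the random inputs are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def gated_fusion(features, gate_weights, gate_bias):
    """Fuse M modality embeddings with learned, input-dependent weights.

    features:     list of M arrays, each of shape (batch, d)
    gate_weights: array of shape (M * d, M)  (hypothetical linear gate)
    gate_bias:    array of shape (M,)
    Returns the fused embedding (batch, d) and the gate values (batch, M).
    """
    stacked = np.stack(features, axis=1)             # (batch, M, d)
    concat = stacked.reshape(stacked.shape[0], -1)   # (batch, M * d)
    logits = concat @ gate_weights + gate_bias       # (batch, M)
    gates = softmax(logits, axis=-1)                 # each row sums to 1
    fused = (gates[:, :, None] * stacked).sum(axis=1)  # weighted sum over modalities
    return fused, gates


# Toy inputs standing in for the four encoder outputs
# (technical indicators, financial text, macro data, event knowledge).
rng = np.random.default_rng(0)
batch, d, num_modalities = 2, 8, 4
features = [rng.normal(size=(batch, d)) for _ in range(num_modalities)]
gate_weights = rng.normal(scale=0.1, size=(num_modalities * d, num_modalities))
gate_bias = np.zeros(num_modalities)

fused, gates = gated_fusion(features, gate_weights, gate_bias)
print(fused.shape, gates.shape)
```

Because the gate is a softmax, the modality weights are differentiable and sum to one for each sample, so in a trained model they could be read as per-sample modality importances. With all-zero gate parameters the fusion reduces to a plain average of the four embeddings.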

Paper Structure

This paper contains 25 sections, 10 equations, 5 tables.