Temporal Data Meets LLM -- Explainable Financial Time Series Forecasting
Xinli Yu, Zheng Chen, Yuan Ling, Shujing Dong, Zongyi Liu, Yanbin Lu
TL;DR
This work investigates using large language models to achieve explainable financial time series forecasting by unifying cross-sequence reasoning and multi-modal signals from stock prices, company profiles, and news. It evaluates GPT-4 in zero-shot/few-shot settings and Open LLaMA with instruction-based fine-tuning on NASDAQ-100 data, showing that LLMs can surpass traditional baselines and that chain-of-thought prompting enhances performance. The study demonstrates the feasibility of fine-tuning publicly available LLMs to provide coherent forecasts and human-readable explanations, albeit with some gaps relative to GPT-4. Overall, the results suggest a promising direction for interpretable, cross-modal financial forecasting using LLMs, with future work extending to more indices, richer data types, and larger models.
Abstract
This paper presents a novel study on harnessing the outstanding knowledge and reasoning abilities of Large Language Models (LLMs) for explainable financial time series forecasting. Applying machine learning models to financial time series poses several challenges: cross-sequence reasoning and inference are difficult, multi-modal signals from historical news, financial knowledge graphs, and similar sources are hard to incorporate, and model results are hard to interpret and explain. In this paper, we focus on NASDAQ-100 stocks, making use of publicly accessible historical stock price data, company metadata, and historical economic/financial news. We conduct experiments to illustrate the potential of LLMs to offer a unified solution to these challenges. Our experiments include zero-shot/few-shot inference with GPT-4 and instruction-based fine-tuning of a public LLM, Open LLaMA. We demonstrate that our approach outperforms a few baselines, including the widely applied classic ARMA-GARCH model and a gradient-boosting tree model. Through the performance comparison and a few examples, we find that LLMs can make well-reasoned decisions by reasoning over information from both textual news and price time series, extracting insights, leveraging cross-sequence information, and utilizing the knowledge embedded within the LLM. Additionally, we show that a publicly available LLM such as Open LLaMA, after fine-tuning, can comprehend the instruction to generate explainable forecasts and achieve reasonable performance, albeit inferior to GPT-4.
