Temporal Data Meets LLM -- Explainable Financial Time Series Forecasting

Xinli Yu, Zheng Chen, Yuan Ling, Shujing Dong, Zongyi Liu, Yanbin Lu

TL;DR

This work investigates using large language models to achieve explainable financial time series forecasting by unifying cross-sequence reasoning and multi-modal signals from stock prices, company profiles, and news. It evaluates GPT-4 in zero-shot/few-shot settings and OpenLLaMA in instruction-based fine-tuning on NASDAQ-100 data, showing that LLMs can surpass traditional baselines and that chain-of-thought prompting enhances performance. The study demonstrates the feasibility of fine-tuning publicly available LLMs to provide coherent forecasts and human-readable explanations, albeit with some gaps relative to GPT-4. Overall, the results suggest a promising direction for interpretable, cross-modal financial forecasting using LLMs, with future work extending to more indices, richer data types, and larger models.

Abstract

This paper presents a novel study on harnessing the outstanding knowledge and reasoning abilities of Large Language Models (LLMs) for explainable financial time series forecasting. Applying machine learning models to financial time series poses several challenges: the difficulty of cross-sequence reasoning and inference, the hurdle of incorporating multi-modal signals from historical news, financial knowledge graphs, and other sources, and the problem of interpreting and explaining model results. In this paper, we focus on NASDAQ-100 stocks, using publicly accessible historical stock price data, company metadata, and historical economic/financial news. We conduct experiments to illustrate the potential of LLMs to offer a unified solution to these challenges. Our experiments include zero-shot/few-shot inference with GPT-4 and instruction-based fine-tuning of a publicly available LLM, OpenLLaMA. We demonstrate that our approach outperforms several baselines, including the widely applied classic ARMA-GARCH model and a gradient-boosting tree model. Through the performance comparison results and a few examples, we find that LLMs can make well-founded decisions by reasoning over both textual news and price time series to extract insights, leveraging cross-sequence information, and utilizing the knowledge embedded within the LLM. Additionally, we show that a publicly available LLM such as OpenLLaMA, after fine-tuning, can comprehend the instruction to generate explainable forecasts and achieve reasonable performance, albeit inferior to GPT-4.

Paper Structure

This paper contains 21 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: An example of a stock's company profile, consisting of the company description and the general positive/negative factors affecting the company's stock price.
  • Figure 2: An example of a news summary and keywords extracted by GPT-4 from one news article about a stock's company (AAPL in this case). The original news is at https://sports.yahoo.com/apple-joins-cost-cut-bandwagon-145845685.html. The prompt in this example is a template: text inside [] is commentary that is not included in the prompt we submit to the LLM, and we fill the stock symbol and the news into the placeholders enclosed by {}.
  • Figure 3: An example of one week's meta summary and keywords, condensed from all of the company's news summaries and keywords from that week.
  • Figure 4: The prompt structure for the experiments in this paper with LLMs, together with an example GPT-4 response to a concrete prompt constructed from information on and before 04/30/2023. Cross-sequence and macroeconomic information are clearly considered in the LLM's reasoning. The stock return forecast is U1 for the next week, from 05/01/2023 to 05/07/2023, while AAPL's actual market performance was U3. Apple's 2023 Q2 earnings call on May 04 beat expectations, which may be the major contributor to the higher-than-forecast gain that week.
  • Figure 5: GPT-4 outputs its detailed reasoning steps if we simply add the instruction "Can you reason step by step before the finalized output?" to the end of the prompt in Figure 4. With detailed reasoning steps, GPT-4 captures a previously missed point, "Wall Street anticipates a strong earnings report, boosting stock morale", and amends its stock return forecast to U2.
  • ...and 1 more figure
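The Figure 2 caption describes a prompt template where [] marks commentary that is stripped before submission and {} marks placeholders that are filled in, and the Figure 5 caption describes appending one sentence to the end of a prompt to elicit step-by-step reasoning. The sketch below shows one way those two mechanics could be implemented. It is an illustration under assumptions, not the authors' code: the template text, the helper name `build_prompt`, and the `chain_of_thought` flag are all hypothetical; only the appended sentence is quoted from the caption.

```python
import re

# Hypothetical template in the spirit of Figure 2: text inside [] is a comment
# that must not reach the LLM, and {name} marks a placeholder to fill in.
TEMPLATE = (
    "[Instruction for the model] "
    "Summarize the following news about {symbol} and extract keywords.\n"
    "News: {news}"
)

# Sentence quoted in the Figure 5 caption, appended to elicit step-by-step reasoning.
COT_SUFFIX = "Can you reason step by step before the finalized output?"


def build_prompt(template: str, chain_of_thought: bool = False, **fields: str) -> str:
    """Strip [...] comments, fill {...} placeholders, optionally append the CoT request."""
    # Remove each bracketed comment together with the whitespace that follows it.
    # Comments are stripped before filling, so brackets inside the news text survive.
    text = re.sub(r"\[[^\]]*\]\s*", "", template)
    # str.format fills {symbol}, {news}, ...; it raises KeyError if a field is missing.
    text = text.format(**fields)
    if chain_of_thought:
        text = text.rstrip() + "\n" + COT_SUFFIX
    return text


if __name__ == "__main__":
    print(build_prompt(TEMPLATE, symbol="AAPL", news="Example news text."))
```

Stripping comments before calling `str.format` is a deliberate choice in this sketch: values substituted into the placeholders are never re-scanned, so a news article that happens to contain brackets or braces is passed through unchanged. The simple regex does not handle nested brackets.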
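The Figure 3 caption describes condensing all of a company's per-article summaries and keywords from one week into a single meta summary. The condensing itself is done by the LLM and is not reproduced here; the sketch below only illustrates a plausible preparatory step of bucketing per-article results by calendar week and joining them into one text block. Grouping by ISO week (Monday to Sunday, which matches the 05/01/2023 to 05/07/2023 week in Figure 4) and the helper names `group_by_week` and `weekly_digest_input` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# One per-article result: (publication date, summary text, keyword list).
Article = Tuple[date, str, List[str]]
Entry = Tuple[str, List[str]]


def group_by_week(items: Iterable[Article]) -> Dict[Tuple[int, int], List[Entry]]:
    """Bucket per-article summaries by ISO (year, week); ISO weeks run Monday to Sunday."""
    weeks: Dict[Tuple[int, int], List[Entry]] = defaultdict(list)
    for day, summary, keywords in items:
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)].append((summary, keywords))
    return dict(weeks)


def weekly_digest_input(entries: List[Entry]) -> str:
    """Join one week's summaries and keywords into a single block for an LLM to condense."""
    blocks = [f"Summary: {s}\nKeywords: {', '.join(k)}" for s, k in entries]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    sample = [
        (date(2023, 5, 1), "placeholder summary A", ["kw1"]),
        (date(2023, 5, 7), "placeholder summary B", ["kw2", "kw3"]),
    ]
    for week, entries in group_by_week(sample).items():
        print(week, weekly_digest_input(entries), sep="\n")
```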
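The Figure 4 and Figure 5 captions report forecasts and outcomes as bucket labels (U1, U2, U3) rather than raw returns, but the captions themselves do not define the buckets. The sketch below assumes one plausible reading, labeled hypothetical throughout: U{k} covers a weekly gain of up to k% (above (k-1)%), D{k} mirrors this for losses, and a "+" bucket collects moves beyond 5%. The helper names and the week-over-week return definition are also assumptions.

```python
import math


def weekly_return_pct(prev_close: float, close: float) -> float:
    """Hypothetical definition: percent change from the prior week's close to this week's."""
    return (close / prev_close - 1.0) * 100.0


def return_bin(ret_pct: float, max_bin: int = 5) -> str:
    """Map a weekly return (in percent) to an ASSUMED U/D bucket label.

    Assumed scheme: U1 = [0%, 1%], U2 = (1%, 2%], ..., U5 = (4%, 5%], U5+ = above 5%;
    D1..D5 and D5+ mirror this for negative returns.
    """
    direction = "U" if ret_pct >= 0 else "D"
    # Round away float noise (e.g. 2.0000000000000018) so it cannot spill into the next bucket.
    magnitude = round(abs(ret_pct), 6)
    if magnitude > max_bin:
        return f"{direction}{max_bin}+"
    # ceil puts 0.3 in bucket 1 and 2.5 in bucket 3; max(1, ...) keeps an exact 0% in bucket 1.
    return f"{direction}{max(1, math.ceil(magnitude))}"


if __name__ == "__main__":
    for r in (0.3, 2.5, -0.4, 7.2):
        print(r, return_bin(r))
```

Bucketing turns the forecast into a small set of labels that an LLM can emit and explain in text, at the cost of discarding precision within each 1% band.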