Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting

Alvaro Paredes Amorin, Andre Python, Christoph Weisser

TL;DR

This study generates monthly sentiment scores from English and Chinese news headlines and integrates them with traditional tabular data, including base metal indices, exchange rates, inflation rates, and energy prices, to demonstrate the predictive performance and economic utility of lightweight, finetuned large language models in aluminum price forecasting.

Abstract

By capturing the prevailing sentiment and market mood, textual data has become increasingly vital for forecasting commodity prices, particularly in metal markets. However, the effectiveness of lightweight, finetuned large language models (LLMs) in extracting predictive signals for aluminum prices, and the specific market conditions under which these signals are most informative, remain under-explored. This study generates monthly sentiment scores from English and Chinese news headlines (Reuters, Dow Jones Newswires, and China News Service) and integrates them with traditional tabular data, including base metal indices, exchange rates, inflation rates, and energy prices. We evaluate the predictive performance and economic utility of these models through long-short simulations on the Shanghai Metal Exchange from 2007 to 2024. Our results demonstrate that during periods of high volatility, Long Short-Term Memory (LSTM) models incorporating sentiment data from a finetuned Qwen3 model (Sharpe ratio 1.04) significantly outperform baseline models using tabular data alone (Sharpe ratio 0.23). Subsequent analysis elucidates the nuanced roles of news sources, topics, and event types in aluminum price forecasting.
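As a rough illustration of how a long-short simulation can be summarized by a Sharpe ratio, the sketch below takes a position each month from the sign of a predicted return and annualizes the resulting monthly Sharpe ratio. The function names, the sign-based trading rule, the zero risk-free rate, the factor of 12 for annualization, and the toy numbers are all assumptions made for illustration; the abstract does not specify how the study's simulation is implemented.

```python
import math
from statistics import mean, stdev


def long_short_returns(predicted, realized):
    """Hypothetical rule: long (+1) if the predicted monthly return is
    positive, short (-1) otherwise; the position earns the realized return."""
    return [(1.0 if p > 0 else -1.0) * r for p, r in zip(predicted, realized)]


def annualized_sharpe(monthly_returns, periods_per_year=12, risk_free=0.0):
    """Mean excess return over its sample standard deviation,
    scaled by sqrt(periods_per_year). Assumes a constant risk-free rate."""
    excess = [r - risk_free for r in monthly_returns]
    sd = stdev(excess)  # sample standard deviation; needs >= 2 observations
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean(excess) / sd * math.sqrt(periods_per_year)


# Toy data (not from the paper): predicted and realized monthly returns.
predicted = [0.02, -0.01, 0.03, -0.02]
realized = [0.01, -0.03, -0.02, -0.01]

strategy = long_short_returns(predicted, realized)
sharpe = annualized_sharpe(strategy)
print(strategy, round(sharpe, 2))
```

Comparing this statistic across models on the same months is one way to express the kind of gap the abstract reports between the sentiment-augmented and tabular-only models.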

Paper Structure (28 sections, 5 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: Workflow. Data 1: financial data from the WIND terminal, comprising tabular data extracted from March 2007 to April 2024 (4,152 rows). Data 2: textual data comprising headlines from two news sources in English (Reuters (N=4,963) and Dow Jones Newswires (N=11,581)) and one news source in Chinese (China News Service (N=8,970)), collected from March 2007 to April 2024. The data processing and sentiment analysis stage (green) includes data scaling, normalization, and the treatment of missing values for the tabular data, and the use of language models to generate new sentiment variables from Data 2. Sentiment is classified as "positive", "negative", or "neutral". Monthly sentiment scores are combined with other numerical data (yellow box) to train and test time series models that predict monthly aluminum prices.
  • Figure 2: Evaluation of portfolio's performance by strategy and volatility scenarios. For each strategy (tabular-only, tabular+sentiment, sentiment only (qwen), sentiment only (reuters)) the portfolio's performance is represented by the Sharpe ratio (y-axis) across three volatility scenarios (panels), with: A high volatility scenario (n=28 months), B medium volatility (n=106 months), and C low volatility (n=66 months). Error bars represent ±1 standard error. The highlighted bars (orange) indicate the best performing strategy within each scenario.
  • Figure 3: Evaluation of topic and event type in predicting aluminum prices. A Comparison of Sharpe ratio across the top five most covered topics (Price Movement, Company News, Production Output, Inventory Stocks, Supply Disruption) and the best topic combination, all calculated using Reuters headlines with the finetuned Qwen sentiment. The benchmark (gray) represents all headlines aggregated. Here $n$ shows the number of months in which each topic is present. B Comparison of forward-looking versus past event news types. C Comparison of the top-performing topic (Price Movement) by event type. Error bars represent ±1 standard error. The green bar indicates the best-performing topic combination, while the orange bars indicate the best-performing individual topic or event type.
  • Figure 4: Predictive performance by topic coverage. A Percentage of headlines allocated to each topic by source. B Global Sharpe ratio by topic, with gold rings highlighting the top three performers. Topics are sorted by global Sharpe ratio (descending).
  • Figure 5: Cluster plot of the Reuters and Dow Jones Newswires aluminum datasets before (left) and after (right) filtering using LLaMA3. The x and y axes are the two dimensions obtained by reducing the dimensionality of the embeddings. The plots on the left show fewer outliers outside the main clusters.
  • ...and 8 more figures
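
The Figure 1 caption describes headlines being classified as "positive", "negative", or "neutral" and then turned into monthly sentiment scores. A minimal sketch of one possible aggregation follows. The numeric mapping (+1, 0, -1), the use of a simple monthly mean, the function name, and the sample headlines are assumptions for illustration; the caption does not state how the study computes its scores.

```python
from collections import defaultdict

# Assumed mapping from class label to numeric score (not from the paper).
LABEL_TO_SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def monthly_sentiment(headlines):
    """Average the label scores of all headlines in each calendar month.

    headlines: iterable of (iso_date, label) pairs, with iso_date formatted
    as 'YYYY-MM-DD'. Returns a dict {'YYYY-MM': mean score} in month order.
    """
    totals = defaultdict(lambda: [0, 0])  # month -> [score sum, headline count]
    for iso_date, label in headlines:
        month = iso_date[:7]  # 'YYYY-MM' prefix of the ISO date
        totals[month][0] += LABEL_TO_SCORE[label]
        totals[month][1] += 1
    return {m: s / n for m, (s, n) in sorted(totals.items())}


# Hypothetical classified headlines (dates and labels are made up).
sample = [
    ("2007-03-02", "positive"),
    ("2007-03-15", "negative"),
    ("2007-03-20", "positive"),
    ("2007-04-01", "neutral"),
]

scores = monthly_sentiment(sample)
print(scores)
```

A score series of this shape, one value per month, is what could then be joined to the tabular features by month before model training.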