Impact of News on the Commodity Market: Dataset and Results

Ankur Sinha, Tanmay Khandait

TL;DR

The paper asks whether gold-related news headlines carry information that influences commodity prices beyond sentiment alone. It introduces a multi-dimensional annotation framework and releases a public dataset of 11,412 headlines (2000–2019) annotated across nine categories. It systematically compares text representations and classifiers, finding that a financial-domain BERT model performs best while a TF-IDF+SVM baseline remains competitive. It then conducts a causality analysis using a directionality score $S = (N_{\text{Price Up}} - N_{\text{Price Down}})/(N_{\text{Price Up}} + N_{\text{Price Constant}} + N_{\text{Price Down}})$ and a linear model $P_{N,N-1} = \alpha + \beta S_{N-1,N-2} + \varepsilon$, reporting a significant $\beta$ with $p$-values of $0.0318$ and $0.00218$. This implies that news content helps predict future gold prices, with effects observable up to 24 hours later.
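To make the directionality score and the linear model in the TL;DR concrete, here is a minimal Python sketch. It assumes that each headline in an interval has already been labelled `"up"`, `"down"`, or `"constant"`, that $P_{N,N-1}$ is the price change over one interval, and that $S_{N-1,N-2}$ is the score of the preceding interval. The function names, label strings, and the plain least-squares fit are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def directionality_score(labels):
    """Compute S = (N_up - N_down) / (N_up + N_constant + N_down).

    `labels` is a list of hypothetical per-headline labels:
    "up", "down", or "constant". Returns 0.0 for an empty interval
    (an assumption; the source does not say how empty intervals are handled).
    """
    counts = Counter(labels)
    n_up = counts["up"]
    n_down = counts["down"]
    n_const = counts["constant"]
    total = n_up + n_const + n_down
    if total == 0:
        return 0.0
    return (n_up - n_down) / total


def fit_simple_ols(x, y):
    """Fit y = alpha + beta * x by ordinary least squares.

    Here x would hold the lagged scores S_{N-1,N-2} and y the price
    changes P_{N,N-1}. Returns (alpha, beta).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Two "up", one "down", one "constant" headline: S = (2 - 1) / 4
score = directionality_score(["up", "up", "down", "constant"])

# Toy data (made up for illustration) lying exactly on y = 1 + 2x
alpha, beta = fit_simple_ols([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

The score lies in $[-1, 1]$: it is $1$ when every headline in the interval signals a price rise and $-1$ when every headline signals a fall. Judging the significance of $\beta$, as the reported $p$-values do, would additionally require its standard error, which this sketch omits.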

Abstract

Over the last few years, machine learning based methods have been applied to extract information from news flow in the financial domain. However, this information has mostly been in the form of the financial sentiments contained in the news headlines, primarily for the stock prices. In our current work, we propose that various other dimensions of information can be extracted from news headlines, which will be of interest to investors, policy-makers and other practitioners. We propose a framework that extracts information such as past movements and expected directionality in prices, asset comparison and other general information that the news is referring to. We apply this framework to the commodity "Gold" and train the machine learning models using a dataset of 11,412 human-annotated news headlines (released with this study), collected from the period 2000-2019. We experiment to validate the causal effect of news flow on gold prices and observe that the information produced from our framework significantly impacts the future gold price.

Paper Structure

This paper contains 12 sections, 4 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of System Design
  • Figure 2: Preparation of Input