Impact of News on the Commodity Market: Dataset and Results
Ankur Sinha, Tanmay Khandait
TL;DR
The paper addresses whether gold-related news headlines carry information beyond sentiment that influences commodity prices. It introduces a multi-dimensional annotation framework and releases a public dataset of 11,412 headlines (2000–2019) across nine categories. It systematically compares text representations and classifiers, finding that a financial-domain BERT model performs best while a TF-IDF+SVM baseline remains competitive. It then conducts a causality analysis using a directionality score $S = (N_{\text{Price Up}} - N_{\text{Price Down}})/(N_{\text{Price Up}} + N_{\text{Price Constant}} + N_{\text{Price Down}})$ and a linear model $P_{N,N-1} = \alpha + \beta S_{N-1,N-2} + \varepsilon$, reporting significant $\beta$ estimates with $p$-values of $0.0318$ and $0.00218$. This implies that news content helps predict future gold prices, with effects observable up to 24 hours later.
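To make the directionality score and the linear model concrete, the following is a minimal Python sketch, not the authors' implementation. The label strings (`"up"`, `"down"`, `"constant"`), the helper names, and the toy numbers are all hypothetical. Reading $P_{N,N-1}$ as the price change over the period that follows the news window is also an assumption. The sketch estimates $\alpha$ and $\beta$ by ordinary least squares and does not compute $p$-values.

```python
from collections import Counter


def directionality_score(labels):
    """Return S = (N_up - N_down) / (N_up + N_constant + N_down) for one window.

    `labels` holds one hypothetical direction label per headline:
    "up", "down", or "constant".
    """
    counts = Counter(labels)
    n_up, n_down, n_const = counts["up"], counts["down"], counts["constant"]
    total = n_up + n_down + n_const
    # An empty window carries no directional signal; treat it as neutral.
    return 0.0 if total == 0 else (n_up - n_down) / total


def fit_ols(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Toy data (made up): headline direction labels for four consecutive windows.
windows = [
    ["up", "up", "down", "constant"],
    ["down", "down", "constant"],
    ["up", "up", "up"],
    ["up", "down"],
]
scores = [directionality_score(w) for w in windows]

# Made-up price changes over the period following each window.
price_changes = [0.4, -0.9, 1.1, 0.1]

alpha, beta = fit_ols(scores, price_changes)
print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

A positive fitted `beta` on real data would mean that windows dominated by "price up" headlines tend to precede price increases, which is the relationship the reported $p$-values test.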
Abstract
Over the last few years, machine-learning-based methods have been applied to extract information from news flow in the financial domain. However, this information has mostly taken the form of the financial sentiment contained in news headlines, primarily for stock prices. In this work, we propose that various other dimensions of information can be extracted from news headlines, which will be of interest to investors, policy-makers and other practitioners. We propose a framework that extracts information such as past movements and expected directionality in prices, asset comparison and other general information that the news is referring to. We apply this framework to the commodity "Gold" and train machine learning models on a dataset of 11,412 human-annotated news headlines (released with this study), collected from the period 2000–2019. We run an experiment to validate the causal effect of news flow on gold prices and observe that the information produced by our framework significantly impacts the future gold price.
