Optimal Text-Based Time-Series Indices

David Ardia, Keven Bluteau

TL;DR

The paper addresses the construction of objective text-based time-series indices from large corpora by introducing a selection-matrix framework that maps tokens to index contributions. It formalizes a selection matrix $\Omega$ of size $V \times K$ and optimizes it with a domain-aware genetic algorithm that is guided by word embeddings and uses pruning to avoid overfitting. The approach recovers established token dictionaries (e.g., the EPU dictionary of Baker et al., 2016) and yields text-based indices that outperform benchmarks for the VIX and inflation expectations in out-of-sample periods, using a WSJ corpus of $763{,}542$ articles. The work provides a practical, scalable method for constructing objective, predictive text indices for macrofinancial targets, and it can be extended to other target variables and corpora.
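
To make the selection-matrix idea concrete, here is a minimal sketch of how a binary matrix $\Omega$ could turn a corpus into a time series. It rests on several assumptions that the summary above does not confirm: $V$ is read as the vocabulary size, $K$ as the number of token groups, an article counts toward the index only if it contains at least one selected token from every group (an EPU-style rule), and the index is the share of such articles per period. The function name `text_index` and the toy data are hypothetical.

```python
import numpy as np

def text_index(doc_term, omega, periods):
    """Share of articles per period that match every token group.

    doc_term: (N, V) 0/1 array, token presence per article (assumed layout).
    omega:    (V, K) 0/1 selection matrix, one column per token group.
    periods:  (N,) integer period label for each article, starting at 0.
    """
    # hits[n, k] = number of selected tokens from group k found in article n
    hits = doc_term @ omega
    # Assumed rule: an article is selected if every group is hit at least once
    selected = (hits > 0).all(axis=1)
    n_periods = int(periods.max()) + 1
    matched = np.bincount(periods, weights=selected.astype(float), minlength=n_periods)
    totals = np.bincount(periods, minlength=n_periods)
    # Guard against empty periods to avoid division by zero
    return matched / np.maximum(totals, 1)

# Toy vocabulary (hypothetical): economy, policy, uncertain, sports
omega = np.array([
    [1, 0, 0],   # "economy"   -> group 0
    [0, 1, 0],   # "policy"    -> group 1
    [0, 0, 1],   # "uncertain" -> group 2
    [0, 0, 0],   # "sports"    -> not selected
])
doc_term = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
])
periods = np.array([0, 0, 1, 1, 1])

index = text_index(doc_term, omega, periods)
print(index)
```

In this toy example one of the two articles in period 0 and two of the three in period 1 match all groups. An optimizer such as the genetic algorithm mentioned above would then search over candidate matrices $\Omega$, scoring each resulting index against the target variable; that search is not shown here.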

Abstract

We propose an approach to construct text-based time-series indices in an optimal way--typically, indices that maximize the contemporaneous relation or the predictive performance with respect to a target variable, such as inflation. We illustrate our methodology with a corpus of news articles from the Wall Street Journal by optimizing text-based indices focusing on tracking the VIX index and inflation expectations. Our results highlight the superior performance of our approach compared to existing indices.
