Detection of financial opportunities in micro-blogging data with a stacked classification system
Francisco de Arriba-Pérez, Silvia García-Méndez, José A. Regueiro-Janeiro, Francisco J. González-Castaño
TL;DR
This work tackles the detection of financially meaningful opportunities in Twitter data by framing opportunities as forward-looking, anticipatory financial emotions. It introduces a three-layer stacked classifier that first filters out neutral content, then separates positive from negative emotions, and finally isolates opportunity tweets from the remaining positive ones, using a feature set that includes n-grams, polarity/emotion lexicons, and temporal cues. Evaluation on a manually annotated dataset of about 4,959 tweets shows high precision for opportunities: the final Random Forest (RF) model reaches around 82–83% precision, with tolerance metrics tau1 and tau2 of around 90% and 95%, respectively. The approach supports investor decision-making and dashboard applications, and the authors propose extensions to domain-specific filters and multilingual coverage to broaden its practical impact.
Abstract
Micro-blogging sources such as the Twitter social network provide valuable real-time data for market prediction models. Investors' opinions in this network follow the fluctuations of the stock markets and often include educated speculations on market opportunities that may have an impact on the actions of other investors. In view of this, we propose a novel system to detect positive predictions in tweets, a type of financial emotion that we term "opportunities" and that is akin to "anticipation" in Plutchik's theory. Specifically, we seek a high detection precision in order to present a financial operator with a substantial number of such tweets while differentiating them from the other financial emotions in our system. We achieve this with a three-layer stacked Machine Learning classification system whose features result from applying Natural Language Processing techniques to extract valuable linguistic information. Experimental results on a dataset that has been manually annotated with financial emotion and ticker occurrence tags demonstrate that our system yields satisfactory and competitive performance in financial opportunity detection, with precision values up to 83%. This promising outcome endorses the usability of our system to support investors' decision making.
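To make the control flow of the three-layer stacked system concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes each layer is a binary decision that either stops a tweet with a label or passes it to the next layer. The keyword sets, the label names (`neutral`, `negative`, `other_positive`, `opportunity`), and the helper functions are all hypothetical stand-ins for the trained classifiers and features described in the paper.

```python
import re
from typing import Callable, Sequence, Set, Tuple

# Hypothetical keyword lists standing in for trained classifiers and features.
EMOTION_WORDS = {"bullish", "bearish", "rally", "crash", "surge", "plunge", "breakout"}
NEGATIVE_WORDS = {"bearish", "crash", "plunge"}
FUTURE_CUES = {"will", "soon", "tomorrow", "expect", "target"}


def tokens(tweet: str) -> Set[str]:
    """Lowercase the tweet and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", tweet.lower()))


def has_emotion(tweet: str) -> bool:
    """Layer 1 stand-in: pass the tweet on only if it is not neutral."""
    return bool(tokens(tweet) & EMOTION_WORDS)


def is_positive(tweet: str) -> bool:
    """Layer 2 stand-in: pass the tweet on only if it is not negative."""
    return not (tokens(tweet) & NEGATIVE_WORDS)


def is_forward_looking(tweet: str) -> bool:
    """Layer 3 stand-in: pass the tweet on only if it has a temporal cue."""
    return bool(tokens(tweet) & FUTURE_CUES)


# Each layer is (label assigned when the tweet is stopped, pass-on test).
Layer = Tuple[str, Callable[[str], bool]]

LAYERS: Sequence[Layer] = (
    ("neutral", has_emotion),
    ("negative", is_positive),
    ("other_positive", is_forward_looking),
)


def classify(tweet: str, layers: Sequence[Layer] = LAYERS) -> str:
    """Return the label of the first layer that stops the tweet,
    or 'opportunity' if the tweet passes through every layer."""
    for stop_label, passes in layers:
        if not passes(tweet):
            return stop_label
    return "opportunity"


if __name__ == "__main__":
    for text in (
        "ACME reports quarterly results on Tuesday",
        "ACME crash continues, very bearish",
        "Nice rally in ACME today",
        "ACME will rally soon, expect a breakout",
    ):
        print(classify(text), "<-", text)
```

In a real system each pass-on test would be replaced by a trained model's prediction. The cascade structure is what lets the last layer focus only on tweets that earlier layers have already judged to be non-neutral and positive.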
