Bitcoin's Edge: Embedded Sentiment in Blockchain Transactional Data

Charalampos Kleitsikas, Nikolaos Korfiatis, Stefanos Leonardos, Carmine Ventre

TL;DR

The paper investigates whether sentiment encoded in arbitrarily embedded blockchain messages can forecast cryptocurrency price movements. It develops an NLP-driven pipeline combining topic modeling (LDA, BERTopic) and sentiment analysis (VADER, TextBlob, CryptoBERT) on on-chain text from Bitcoin and Ethereum, followed by ML classifiers with time-series cross-validation. Key findings show that blockchain sentiment provides predictive power for Bitcoin (up to 10.53% accuracy improvement for 1-day ahead predictions, and up to 60.53% in one configuration) while Ethereum yields weaker gains, indicating an informational edge for Bitcoin. The work highlights on-chain content as a freely accessible, immutable data source for financial forecasting and delivers open data and code to support further development of blockchain sentiment analysis as a robust framework for crypto-market prediction.
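The time-series cross-validation step mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy threshold classifier, and the synthetic daily sentiment values are all assumptions made for illustration. The only point it shows is that each test fold lies strictly after its training data, so no future information leaks into the model.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_window_splits(
    n_samples: int, n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every test fold follows its training data in time."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * fold
        # The last fold absorbs any leftover samples.
        test_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


def fit_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Toy classifier (assumption): pick the sentiment cut-off with the best training accuracy."""
    candidates = [min(scores) - 1.0] + sorted(set(scores))

    def hits(threshold: float) -> int:
        return sum((s > threshold) == bool(y) for s, y in zip(scores, labels))

    return max(candidates, key=hits)


def walk_forward_accuracy(
    scores: Sequence[float], labels: Sequence[int], n_splits: int = 3
) -> List[float]:
    """Fit on the past, evaluate on the next block, and report accuracy per fold."""
    accuracies = []
    for train, test in expanding_window_splits(len(scores), n_splits):
        threshold = fit_threshold([scores[i] for i in train], [labels[i] for i in train])
        correct = sum((scores[i] > threshold) == bool(labels[i]) for i in test)
        accuracies.append(correct / len(test))
    return accuracies


# Synthetic data (hypothetical): one aggregated sentiment score per day,
# and a label that is 1 if the price rose on the following day.
daily_sentiment = [0.4, -0.2, 0.1, -0.5, 0.3, -0.1, 0.6, -0.3]
next_day_up = [1, 0, 1, 0, 1, 0, 1, 0]

print(walk_forward_accuracy(daily_sentiment, next_day_up, n_splits=3))
```

In practice the sentiment scores would come from a tool such as VADER, TextBlob, or CryptoBERT, and the threshold rule would be replaced by a proper classifier; the split logic is the part that carries over.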

Abstract

Cryptocurrency blockchains, beyond their primary role as distributed payment systems, are increasingly used to store and share arbitrary content, such as text messages and files. Although often non-financial, this hidden content can impact price movements by conveying private information, shaping sentiment, and influencing public opinion. However, current analyses of such data are limited in scope and scalability, primarily relying on manual classification or hand-crafted heuristics. In this work, we address these limitations by employing Natural Language Processing techniques to analyze, detect patterns, and extract public sentiment encoded within blockchain transactional data. Using a variety of Machine Learning techniques, we showcase for the first time the predictive power of blockchain-embedded sentiment in forecasting cryptocurrency price movements on the Bitcoin and Ethereum blockchains. Our findings shed light on a previously underexplored source of freely available, transparent, and immutable data and introduce blockchain sentiment analysis as a novel and robust framework for enhancing financial predictions in cryptocurrency markets. Incidentally, we discover an asymmetry between cryptocurrencies; Bitcoin has an informational advantage over Ethereum in that the sentiment embedded into transactional data is sufficient to predict its price movement.

Paper Structure

This paper contains 14 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Our proposed NLP pipeline.
  • Figure 2: Pearson Correlations.
  • Figure 3: Comparison of Bitcoin corpuses topics' clustering.