A Bayesian framework for measuring association and its application to emotional dynamics in Web discourse
Henrique S. Xavier, Diogo Cortiz, Mateus Silvestrin, Ana Luísa Freitas, Letícia Yumi Nakao Morello, Fernanda Naomi Pantaleão, Gabriel Gaudencio do Rêgo
TL;DR
The paper tackles the challenge of measuring association between categorical variables in web-scale data while providing uncertainty quantification. It introduces a Bayesian multinomial-Dirichlet framework that yields a posterior for the added-value measure $\Delta P(A,B)$ via MCMC, unifying association detection with strength estimation. The BRASS implementation demonstrates robust inference on simulated data and reveals meaningful emotion co-occurrences, oppositions, and hierarchical relations in 4,613 Portuguese tweets across 30 emotion categories. The approach offers a principled tool for sentiment and discourse analysis on social media, trading off speed for rigorous probabilistic interpretation and significance assessment.
Abstract
This paper introduces a Bayesian framework designed to measure the degree of association between categorical random variables. The method is grounded in the formal definition of variable independence and is implemented using Markov Chain Monte Carlo (MCMC) techniques. Unlike techniques commonly employed in Association Rule Learning, this approach enables a clear and precise estimation of confidence intervals and of the statistical significance of the measured degree of association. We applied the method to non-exclusive emotions identified by annotators in 4,613 tweets written in Portuguese. This analysis revealed pairs of emotions that are associated as well as mutually opposed pairs. Moreover, the method identifies hierarchical relations between categories, which are present in our data, and these relations are used to cluster emotions into basic-level groups.
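The following is a minimal sketch of the multinomial-Dirichlet idea summarized above. It is not the authors' BRASS implementation. It assumes ΔP(A,B) = P(B|A) − P(B), the usual added-value definition in association rule learning; the paper's exact definition and model may differ.

The four cells of a 2×2 contingency table for two binary labels A and B (for example, two emotions tagged on the same tweets) receive a symmetric Dirichlet prior. Because the Dirichlet is conjugate to the multinomial, the sketch samples the posterior directly rather than through MCMC as the paper does. The function names, the uniform prior, and the example counts are hypothetical.

```python
import random
import statistics


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def delta_p_posterior(n_ab, n_a_notb, n_nota_b, n_nota_notb,
                      prior=1.0, n_samples=20000, seed=0):
    """Posterior samples of Delta P(A,B) = P(B|A) - P(B) (assumed definition).

    The counts are the four cells of a 2x2 table for binary labels A and B.
    `prior` is a hypothetical symmetric Dirichlet concentration (1.0 = uniform).
    """
    rng = random.Random(seed)
    # Multinomial likelihood + Dirichlet prior -> Dirichlet posterior (conjugacy).
    alphas = [c + prior for c in (n_ab, n_a_notb, n_nota_b, n_nota_notb)]
    samples = []
    for _ in range(n_samples):
        p_ab, p_a_notb, p_nota_b, _ = sample_dirichlet(alphas, rng)
        p_a = p_ab + p_a_notb             # marginal P(A)
        p_b = p_ab + p_nota_b             # marginal P(B)
        samples.append(p_ab / p_a - p_b)  # P(B|A) - P(B)
    return samples


def summarize(samples, level=0.95):
    """Return the posterior mean, an equal-tailed credible interval, and P(Delta P > 0)."""
    s = sorted(samples)
    n = len(s)
    lo = s[int((1 - level) / 2 * n)]
    hi = s[min(n - 1, int((1 + level) / 2 * n))]
    prob_positive = sum(x > 0 for x in s) / n
    return statistics.mean(s), (lo, hi), prob_positive


if __name__ == "__main__":
    # Hypothetical counts over 4,613 tweets; these are NOT data from the paper.
    mean, (lo, hi), p_pos = summarize(delta_p_posterior(120, 80, 300, 4113))
    print(f"Delta P ~ {mean:.3f}, 95% interval [{lo:.3f}, {hi:.3f}], P(>0) = {p_pos:.3f}")
```

In this sketch, read the output as follows:

- **Associated pair:** the interval lies above zero and P(ΔP > 0) is close to 1.
- **Opposed pair:** the interval lies below zero and P(ΔP > 0) is close to 0.
- **Independence:** an interval straddling zero is consistent with independence.

For many label pairs, NumPy's `Generator.dirichlet` would be a faster drop-in for the pure-Python Gamma sampler.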
