
A Bayesian framework for measuring association and its application to emotional dynamics in Web discourse

Henrique S. Xavier, Diogo Cortiz, Mateus Silvestrin, Ana Luísa Freitas, Letícia Yumi Nakao Morello, Fernanda Naomi Pantaleão, Gabriel Gaudencio do Rêgo

TL;DR

The paper tackles the challenge of measuring association between categorical variables in web-scale data while providing uncertainty quantification. It introduces a Bayesian multinomial-Dirichlet framework that yields, via MCMC, a posterior distribution for the added-value measure $\Delta P(A,B) = P(A=1 \mid B=1) - P(A=1)$, unifying association detection with strength estimation. The BRASS implementation demonstrates robust inference on simulated data and reveals meaningful emotion co-occurrences, oppositions, and hierarchical relations in 4,613 Portuguese tweets across 30 emotion categories. The approach offers a principled tool for sentiment and discourse analysis on social media, trading speed for rigorous probabilistic interpretation and significance assessment.
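
To make the posterior for the added value concrete, here is a minimal sketch for a single pair of binary labels. It reads $\Delta P(A,B)$ as $P(A=1 \mid B=1) - P(A=1)$, following the Figure 2 caption. Several assumptions are ours, not the paper's. The helper `sample_delta_p` is hypothetical. The prior is a symmetric Dirichlet over the four cells of the 2x2 table (uniform by default). Because that prior is conjugate to the multinomial, the sketch samples the Dirichlet posterior directly rather than running MCMC as the paper does.

```python
import numpy as np


def sample_delta_p(n11, n10, n01, n00, n_samples=20_000, prior=1.0, seed=0):
    """Draw posterior samples of Delta P(A,B) = P(A=1|B=1) - P(A=1).

    n_ab is the number of items with A=a and B=b. Assumes a symmetric
    Dirichlet(prior) over the four joint cells; by conjugacy the posterior is
    Dirichlet(counts + prior), so it is sampled directly rather than by MCMC.
    """
    rng = np.random.default_rng(seed)
    alpha = np.array([n11, n10, n01, n00], dtype=float) + prior
    p = rng.dirichlet(alpha, size=n_samples)  # rows: [p11, p10, p01, p00]
    p11, p10, p01 = p[:, 0], p[:, 1], p[:, 2]
    p_a = p11 + p10                  # marginal P(A=1)
    p_a_given_b = p11 / (p11 + p01)  # conditional P(A=1|B=1)
    return p_a_given_b - p_a


# Hypothetical counts for two labels A and B (not data from the paper).
samples = sample_delta_p(n11=40, n10=60, n01=50, n00=850)
print(f"posterior mean={samples.mean():.3f}, sd={samples.std():.3f}")
```

With these made-up counts, $P(A=1) \approx 0.10$ while $P(A=1 \mid B=1) \approx 0.44$. The posterior therefore concentrates on a positive boost of roughly 0.34, and its spread reflects the 90 items with $B=1$.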

Abstract

This paper introduces a Bayesian framework designed to measure the degree of association between categorical random variables. The method is grounded in the formal definition of variable independence and is implemented using Markov Chain Monte Carlo (MCMC) techniques. Unlike commonly employed techniques in Association Rule Learning, this approach enables a clear and precise estimation of confidence intervals and the statistical significance of the measured degree of association. We applied the method to non-exclusive emotions identified by annotators in 4,613 tweets written in Portuguese. This analysis revealed pairs of emotions that exhibit associations and mutually opposed pairs. Moreover, the method identifies hierarchical relations between categories, a feature observed in our data, and is utilized to cluster emotions into basic-level groups.
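
The following sketch shows how posterior samples of $\Delta P(A,B)$ can be summarized into an interval and a significance decision. Several details are hypothetical choices rather than the paper's documented procedure. These include the helper `summarize_delta_p`, the uniform Dirichlet(1, 1, 1, 1) prior, and the equal-tailed 95% interval. The same applies to the rule that a pair counts as significant when less than 5% of the posterior mass lies on the opposite side of zero from the median. As before, the conjugate posterior is sampled directly instead of with MCMC.

```python
import numpy as np


def summarize_delta_p(n11, n10, n01, n00, level=0.95, threshold=0.05,
                      n_samples=20_000, seed=0):
    """Interval and sign-based significance check for Delta P(A,B).

    Assumptions (hypothetical): uniform Dirichlet(1,1,1,1) prior on the 2x2
    joint table, equal-tailed interval, and "significant" meaning the
    posterior mass on the opposite side of zero from the median < threshold.
    """
    rng = np.random.default_rng(seed)
    alpha = np.array([n11, n10, n01, n00], dtype=float) + 1.0
    p = rng.dirichlet(alpha, size=n_samples)  # rows: [p11, p10, p01, p00]
    # Delta P = P(A=1|B=1) - P(A=1) for every posterior draw.
    delta = p[:, 0] / (p[:, 0] + p[:, 2]) - (p[:, 0] + p[:, 1])
    lo, hi = np.quantile(delta, [(1 - level) / 2, (1 + level) / 2])
    median = np.median(delta)
    # Posterior probability that Delta P has the opposite sign to its median.
    p_opposite = np.mean(delta < 0) if median > 0 else np.mean(delta > 0)
    return {"median": float(median), "interval": (float(lo), float(hi)),
            "p_opposite_sign": float(p_opposite),
            "significant": bool(p_opposite < threshold)}


# Hypothetical 2x2 counts (not data from the paper).
print(summarize_delta_p(40, 60, 50, 850))  # A is boosted when B is present
print(summarize_delta_p(1, 99, 199, 701))  # A almost never occurs with B
```

The second hypothetical table mimics a mutually opposed pair. Since $A$ almost never co-occurs with $B$, the whole interval sits below zero.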

Paper Structure (14 sections, 11 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Histograms of the sampled posterior distributions of $\Delta P(A, B)$ from eight simulations with different values of $\Delta P(A, B)$, $P(A=1)$, $P(B=1)$, and sample size $N$.
  • Figure 2: Probability boosts in detecting sentiment $A$ given that another sentiment $B$ was already detected (added value). Each bar starts at $P(A=1)$ and ends at $P(A=1|B=1)$, so its length equals $\Delta P(A, B)$; the bar is blue when $\Delta P(A, B) > 0$ and red otherwise. An error bar at the end of each bar shows the standard deviation of $P(A=1|B=1)$. Only emotion pairs with statistically significant dependence are included, and for each pair only the ordering ($A, B$ or $B, A$) with the highest $|\Delta P|$ is shown.
  • Figure 3: Venn diagram depicting the number of tweets annotated with grief and/or sadness.
  • Figure 4: Graph of the relationships between emotions, shown as red nodes. Statistically significant positive probability boosts (added values) are drawn as gray edges, and the width of each edge is proportional to the larger of the two boosts in that pair. Node positions were set with t-SNE.
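
Figure 1 reports posteriors recovered from simulated data with known $\Delta P(A,B)$, $P(A=1)$, $P(B=1)$, and $N$. The sketch below shows one way to run such a check. Fixing the two marginals and the added value determines the whole joint table, because $P(A=1, B=1) = P(B=1)\,[P(A=1) + \Delta P(A,B)]$. The helper names `joint_from_targets` and `simulate_and_fit` are hypothetical, and so are the uniform Dirichlet prior and the direct conjugate sampling used in place of MCMC. The parameter values are illustrative, not the ones used in the paper.

```python
import numpy as np


def joint_from_targets(p_a, p_b, delta):
    """Joint cell probabilities [p11, p10, p01, p00] for given marginals and Delta P.

    Uses P(A=1|B=1) = P(A=1) + Delta P, so p11 = P(B=1) * (P(A=1) + Delta P).
    """
    p11 = p_b * (p_a + delta)
    p10 = p_a - p11              # P(A=1, B=0)
    p01 = p_b - p11              # P(A=0, B=1)
    p00 = 1.0 - p11 - p10 - p01  # P(A=0, B=0)
    probs = np.array([p11, p10, p01, p00])
    if np.any(probs < 0):
        raise ValueError("targets are not jointly attainable")
    return probs


def simulate_and_fit(p_a, p_b, delta, n, n_samples=20_000, seed=0):
    """Simulate n labeled items, then sample the posterior of Delta P(A,B)."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n, joint_from_targets(p_a, p_b, delta))
    # Assumed uniform Dirichlet prior; conjugate posterior sampled directly.
    p = rng.dirichlet(counts + 1.0, size=n_samples)
    return p[:, 0] / (p[:, 0] + p[:, 2]) - (p[:, 0] + p[:, 1])


for delta, n in [(0.0, 200), (0.2, 200), (0.2, 5000)]:
    s = simulate_and_fit(p_a=0.3, p_b=0.2, delta=delta, n=n)
    print(f"true={delta:+.2f} N={n:5d} mean={s.mean():+.3f} sd={s.std():.3f}")
```

In this sketch a larger $N$ narrows the posterior around the true added value. Combinations whose joint probabilities would turn negative are rejected, since no population can have those marginals and that boost.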