Point process analysis of geographical diffusion of news in Argentina

Lucio L. Garcia, Giulio Tirabassi, Cristina Masoller, Pablo Balenzuela

Abstract

The diffusion of information plays a crucial role in society, affecting its economy and the well-being of the population. Characterizing the diffusion process is challenging because it is highly non-stationary and varies with the media type. To understand the spreading of newspaper news in Argentina, we collected data from more than 27,000 articles published in six main provinces over four months. We classified the articles into 20 thematic axes and obtained a set of time series that capture daily newspaper attention on different topics in different provinces. To analyze the data we use a point process approach. For each topic, $n$, and for all pairs of provinces, $i$ and $j$, we use two measures to quantify the synchronicity of the events: $Q_s(i,j)$, which quantifies the number of events that occur almost simultaneously in $i$ and $j$, and $Q_a(i,j)$, which quantifies the direction of news spreading. Our analysis unveils how fast the information diffusion process is, showing pairs of provinces with very similar and almost simultaneous temporal variations of media attention. We also compute other measures from the raw time series, such as Granger causality and transfer entropy, which do not perform well in this context because they often return opposite directions of information transfer. We attribute this to several factors, including the characteristics of the data, which are highly non-stationary, and the features of the information diffusion process, which is very fast and probably acts at a sub-resolution time scale.

Paper Structure (10 sections, 10 equations, 16 figures, 2 tables)

Figures (16)

  • Figure 1: Global agendas of the 20 topics. The notation $T_{i}$ denotes the $i$-th topic following the labels assigned in \ref{tab:topics}. Some topics have time series with pronounced events while others do not. The topics are numbered and displayed from most (top) to least (bottom) eventful.
  • Figure 2: Local agendas of two topics in the six provinces (shifted vertically for clarity). A Topic “Former Minister of Economy”; here we note that all time series exhibit a well-defined event that subsequently decays. B Topic “International affairs”; here we see a much noisier behavior, with the time series displaying numerous peaks.
  • Figure 3: A Histogram of all the local agenda values (for the six provinces) of topic “Fuel shortage” ($n=15$). The vertical lines indicate the activation and deactivation thresholds, $th_a$ (solid line) and $th_d$ (dashed line). B & C, Local agendas in Córdoba ($i=2$) and Santa Fe ($i=5$), respectively; the circles indicate the events detected.
  • Figure 4: Event coincidence detection and quantification. A Agenda time series of topic “Fuel Shortage” ($n=15$) in Córdoba ($i=2$, blue) and in Santa Fe ($j=5$, red); here colored circles indicate events in Córdoba (solid blue) that precede (by a maximum of $\tau = 3$ days) or that occur simultaneously with events in Santa Fe (solid red); empty circles indicate all the other events. Counting the number of coincidences gives $C_{2,5}(15) = 1/2+1+1/2=2$. B Time series “Fuel Shortage” in Santa Fe (blue) and in Córdoba (red). The colored circles show coincidences such that events in Santa Fe (solid blue) precede or occur simultaneously with events in Córdoba (solid red). Counting the number of coincidences gives $C_{5,2}(15) = 1+1+1+1/2+1/2=4$. Using Eq. \ref{eq:syn_def} with $m=20$ (total number of events) gives ${Q^{15}_s}_{5,2}=0.6$ and ${Q^{15}_a}_{5,2}=0.2$.
  • Figure 5: Differentiation of “eventual” and “general” topics. Average number of events per province of each topic. The color gradient indicates which topics are more “Eventual” (dark blue) and which ones are more “General” (bright yellow).
  • ...and 11 more figures
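
The coincidence counting described in the Figure 4 caption can be illustrated with a minimal Python sketch. It rests on assumptions, because the paper's defining equation is not reproduced in this section. Simultaneous events are assumed to contribute 1/2 and strictly preceding ones (within $\tau$ days) to contribute 1, as the caption's sums suggest. The normalization $2/m$, with $m$ the total number of events in both series, is an assumption chosen only because it reproduces the caption's values. The function names `count_coincidences`, `q_measures`, and `synchronicity` are hypothetical.

```python
def count_coincidences(events_i, events_j, tau=3):
    """Count events in i that precede (by at most tau days) or coincide
    with events in j. Assumed weights: 1/2 if simultaneous, 1 if i leads."""
    total = 0.0
    for t_i in events_i:
        for t_j in events_j:
            lag = t_j - t_i
            if lag == 0:
                total += 0.5
            elif 0 < lag <= tau:
                total += 1.0
    return total


def q_measures(c_ij, c_ji, m):
    """Return (Q_s, Q_a) from the two coincidence counts.
    The 2/m normalization is an assumption, not the paper's stated formula."""
    if m == 0:
        return 0.0, 0.0
    q_s = 2.0 * (c_ij + c_ji) / m  # strength of synchronicity
    q_a = 2.0 * (c_ij - c_ji) / m  # positive when i tends to lead j
    return q_s, q_a


def synchronicity(events_i, events_j, tau=3):
    """Compute (Q_s, Q_a) for two lists of event days."""
    c_ij = count_coincidences(events_i, events_j, tau)
    c_ji = count_coincidences(events_j, events_i, tau)
    return q_measures(c_ij, c_ji, len(events_i) + len(events_j))


# Counts quoted in the Figure 4 caption: C_{5,2} = 4, C_{2,5} = 2, m = 20.
print(q_measures(4, 2, 20))
```

With the caption's counts, this sketch returns the quoted values 0.6 and 0.2. Because every pair of events is compared, a single event can be counted in more than one coincidence when events are closer together than $\tau$; whether the paper handles that case differently cannot be determined from this section.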