Sub-exponential Growth of New Words and Names Online: A Piecewise Power-Law Model

Hayafumi Watanabe

TL;DR

This paper introduces a piecewise power-law model to describe complex growth curves of online word diffusion, showing sub-exponential growth is a prevalent pattern across large-scale blog data. By nondimensionalizing and applying a Box–Cox-like transformation, the authors reveal that a single shape parameter α_i controls diffusion form, while a piecewise extension captures multi-stage dynamics and jumps. They connect α_i to topic inwardness through a simple infection-like diffusion model, yielding α_i = 1 − γ_i/Q, thereby offering a sociophysical interpretation of diffusion shapes and linking micro-behavior to macro patterns. The results provide a practical framework for describing, comparing, and interpreting diverse diffusion curves, with implications for understanding how topic appeal and network structure shape information spread online.
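
To make the role of the shape parameter concrete, the following is a minimal sketch rather than the paper's own formulation. It assumes the single-segment model takes the generalized-growth form dy/dt = R·y^α, the standard sub-exponential model in epidemiology; the paper's exact equation and nondimensionalization are not reproduced in this summary. Under that assumption, a Box–Cox-like transform maps every curve onto the straight line z = R·t, and the quoted relation α = 1 − γ/Q selects the shape. The function names and the values R = 0.2, y0 = 1 are hypothetical.

```python
import math


def alpha_from_inwardness(gamma: float, Q: float = 1.0) -> float:
    """Shape parameter from the relation alpha = 1 - gamma / Q quoted above."""
    return 1.0 - gamma / Q


def growth_curve(t: float, alpha: float, R: float, y0: float = 1.0) -> float:
    """Closed-form solution of dy/dt = R * y**alpha with y(0) = y0.

    The differential equation is an ASSUMED model form, not taken from the paper.
    """
    if math.isclose(alpha, 1.0):
        return y0 * math.exp(R * t)  # alpha = 1: ordinary exponential growth
    base = y0 ** (1.0 - alpha) + (1.0 - alpha) * R * t
    if base <= 0.0:
        raise ValueError("t is outside the range where this solution is defined")
    return base ** (1.0 / (1.0 - alpha))


def boxcox_like(y: float, alpha: float, y0: float = 1.0) -> float:
    """Box-Cox-like transform that maps the curve above onto z = R * t."""
    if math.isclose(alpha, 1.0):
        return math.log(y / y0)  # limiting case of the power transform
    return (y ** (1.0 - alpha) - y0 ** (1.0 - alpha)) / (1.0 - alpha)


if __name__ == "__main__":
    # gamma = 1, 0.5, 0 with Q = 1 gives alpha = 0, 0.5, 1:
    # near-linear, typical sub-exponential, and exponential-like growth.
    for gamma in (1.0, 0.5, 0.0):
        alpha = alpha_from_inwardness(gamma, Q=1.0)
        y = growth_curve(10.0, alpha, R=0.2)
        z = boxcox_like(y, alpha)
        print(f"gamma={gamma:.1f} alpha={alpha:.1f} y(10)={y:.3f} z={z:.3f}")
```

In this sketch all three shapes give the same transformed value z = R·t at a given time, which is the sense in which a single parameter α controls the form of the curve.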

Abstract

The diffusion of ideas and language in society has conventionally been described by S-shaped models, such as the logistic curve. However, the role of sub-exponential growth -- a slower-than-exponential pattern known in epidemiology -- has been largely overlooked in broader social phenomena. Here, we present a piecewise power-law model to characterize complex growth curves with a few parameters. We systematically analyzed a large-scale dataset of approximately one billion Japanese blog articles linked to Wikipedia vocabulary, and observed consistent patterns in web search trend data (English, Spanish, and Japanese). Our analysis of 2,963 items, selected for reliable estimation (e.g., sufficient duration/peak, monotonic growth), reveals that 1,625 (55%) diffusion patterns without abrupt level shifts were adequately described by one or two segments. For single-segment curves, we found that (i) the mode of the shape parameter $α$ was near 0.5, indicating prevalent sub-exponential growth; (ii) the peak diffusion scale is primarily determined by the growth rate $R$, with minor contributions from $α$ or the duration $T$; and (iii) $α$ showed a tendency to vary with the nature of the topic, being smaller for niche/local topics and larger for widely shared ones. Furthermore, a micro-behavioral model of outward (stranger) vs. inward (community) contact suggests that $α$ can be interpreted as an index of the preference for outward-oriented communication. These findings suggest that sub-exponential growth is a common pattern of social diffusion, and our model provides a practical framework for consistently describing, comparing, and interpreting complex and diverse growth curves.
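
The idea of describing a curve with one or two segments can be illustrated with a short sketch. It assumes each segment follows dy/dt = R·y^α (an assumed form, as the paper's equations are not shown in this summary) and that segments join continuously, so each new segment starts from the value reached at the changepoint. Curves with abrupt level shifts are not covered. The helper names and all parameter values below are hypothetical.

```python
import math


def segment_value(dt: float, alpha: float, R: float, y_start: float) -> float:
    """Value after time dt on one segment of dy/dt = R * y**alpha (assumed form)."""
    if math.isclose(alpha, 1.0):
        return y_start * math.exp(R * dt)
    base = y_start ** (1.0 - alpha) + (1.0 - alpha) * R * dt
    if base <= 0.0:
        raise ValueError("dt is outside the range where this solution is defined")
    return base ** (1.0 / (1.0 - alpha))


def piecewise_curve(t: float, segments: list, y0: float) -> float:
    """Continuous piecewise power-law curve.

    segments: list of (t_start, alpha, R) tuples sorted by t_start,
    with the first t_start equal to 0. Continuity is enforced by starting
    each segment from the value the previous segment reached.
    """
    y = y0
    for k, (t_start, alpha, R) in enumerate(segments):
        t_end = segments[k + 1][0] if k + 1 < len(segments) else math.inf
        if t <= t_end:
            return segment_value(t - t_start, alpha, R, y)
        # Carry the end-of-segment value forward as the next initial condition.
        y = segment_value(t_end - t_start, alpha, R, y)
    return y


if __name__ == "__main__":
    # Hypothetical two-segment curve: sub-exponential growth (alpha = 0.5)
    # until t = 10, then linear growth (alpha = 0) with a new rate.
    segs = [(0.0, 0.5, 0.2), (10.0, 0.0, 1.0)]
    for t in (0.0, 5.0, 10.0, 15.0):
        print(t, round(piecewise_curve(t, segs, y0=1.0), 3))
```

Each segment carries its own (α, R) pair, so a two-segment curve needs only two pairs plus a changepoint, which is what keeps the parameter count small.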

Paper Structure

This paper contains 145 sections, 8 equations, 21 figures, 11 tables.

Figures (21)

  • Figure 1: Examples of keyword time series (normalized by the total number of articles; one step = 30 days). Black triangles denote empirical data; the red dashed line is the piecewise power-law model; the green dotted line is the single power-law model (Eq. \ref{eq_base0}). Keywords are English translations; the original Japanese keywords are given in Appendix \ref{app_sec_fig_word}. (a) "Low-cost SIM": $\alpha_i^{(1)}=0.77,\ R_i^{(1)}=0.22,\ T_i^{(1)}=53$. Adequately captured by the single power-law model (Section \ref{sec_base}). (b) "Smartphone": $\alpha_i^{(1)}=1.03$, $R_i^{(1)}=0.30$, $\alpha_i^{(2)}=-0.72$, $R_i^{(2)}=4411$, $T_i=131$. The single model (green dotted) is insufficient, but the continuous piecewise power-law model (Section \ref{sec_kubun}) fits well. The changepoint is $t=29$ (late August 2009). (c) "⭕" (red circle emoji, U+2B55): $\alpha_i^{(1)}=0.066$, $R_i^{(1)}=0.59$, $\alpha_i^{(2)}=1.77$, $R_i^{(2)}=6.28\times 10^{-4}$. A typical case with a discontinuous jump at $t=74$ (September 2017) (see Section \ref{sec_jump} for the model with jumps). The vertical gray line marks the jump time. A potential contributing factor is improvements in emoji input tied to smartphone OS updates. (d)--(f) are the corresponding semi-log plots.
  • Figure 2: Time series captured by a single power-law model: scaled empirical data and corresponding simulations. (a)--(c) Scaled count series. Points show empirical data $s_i(t)$ (Section \ref{sec_base}); the red solid line is the scaled single power-law model (Eq. \ref{eq_scale}); many words collapse onto a common curve. Left: near-linear ($\alpha_i\approx 0$); middle: typical ($\alpha_i\approx 0.5$); right: exponential-like ($\alpha_i\approx 1$). In each panel, five items are shown in the order black triangle, red cross, green cross, blue square, and light-blue circle, labeled as word ($\alpha_i,\,R_i$; brief note). Keywords are English translations; the original Japanese keywords are given in Appendix \ref{app_sec_fig_word}. (a) $\alpha_i\approx 0$: "Erika Ikuta" ($0.00,0.59$; Japanese idol name), "NicoNico Seiga" ($0.09,0.12$; illustration sharing service), "Chuo Ward, Sagamihara City" ($-0.02,0.23$; new place name), "Labor pain taxi" ($-0.08,0.072$; maternity taxi service), "beLEGEND" ($0.01,0.097$; protein supplement brand). (b) $\alpha_i\approx 0.5$: "Tablet device" ($0.47,0.34$), "Crowdfunding" ($0.53,0.22$), "BABYMETAL" ($0.55,0.23$; metal idol group), "Rescue cat cafe" ($0.50,0.14$), "⚓" ($0.45,0.12$; anchor emoji, U+2693). (c) $\alpha_i\approx 1$: "Shale gas" ($1.03,0.15$), "Acai bowl" ($0.98,0.10$), "Fumika Baba" ($0.91,0.11$; actress), "⛺" ($1.01,0.076$; outdoors-related emoji, U+26FA), "Net-juu" ($0.93,0.15$; slang: fulfilled online life). (d)--(f) Corresponding log plots. (g)--(i) Simulations of the infection model (Section \ref{sec_infection}). Black thin solid line: 128 sample paths ($Q=1$); red dotted line: theoretical approximation (Eq. \ref{eq_infect_ans}); green thick solid line: the simulation path closest to the theoretical prediction. (g) $\gamma_i=1, J_i=1$; (h) $\gamma_i=0.5, J_i=0.020$; (i) $\gamma_i=0, J_i=9.2\times 10^{-4}$. (j)--(l) Corresponding diffusion-path networks (directed edges from a recruiter to their recruits; first 1000 nodes shown; internal links via "exchanges" are excluded). Colors indicate infection time: older nodes are blue and newer nodes are yellow, varying linearly with time $t$.
  • Figure 3: Linearized word counts $z_i(t)$ (Eq. \ref{eq_boxcox}). Points show empirical data; the pink dashed line is $z=\tau$. Items are noted as word ($\alpha_i, R_i$; brief gloss). Keywords are English translations; the original Japanese keywords are given in Appendix \ref{app_sec_fig_word}. Black triangle: "Minami Ward, Sagamihara City" ($0.17,0.39$; new place name). Red cross: "SoundCloud" ($0.180,0.15$; music sharing site). Green cross: "Instagrammer" ($0.44,2.38$; person popular on Instagram). Blue square: "Komyushō" ($0.64,0.42$; net slang: poor at communication). Light-blue circle: "MicroUSB" ($0.80,0.080$; electronic interface). Gray hollow circle: "Microplastics" ($1.1,0.0043$; small plastic debris).
  • Figure 4: Examples of growth curves with two segments ($N=2$). Parenthetical tuples list $(\alpha_i^{(1)}, R_i^{(1)};\ \alpha_i^{(2)}, R_i^{(2)};\ \text{brief gloss})$. The black triangles denote the data, the red dash-dotted line is the $N=2$ piecewise power-law model, and the green dash-dotted line is the single power-law model ($N=1$). Keywords are English translations; the original Japanese keywords are given in Appendix \ref{app_sec_fig_word}. (a) "Kenshi Yonezu" ($-0.077,0.18; -0.12,6.40$; singer). Changepoint $t=65$ (November 2016). An example that is nearly linear, with the slope changing at the boundary; the slope change is plausibly related to increased exposure following a record label transfer. (b) "Arafifu" ($0.77,0.36; 1.15,0.018$; slang: around age 50). Changepoint $t=14$ (April 2009). An example transitioning from sub-exponential to exponential growth; the shift likely reflects broader recognition after winning the 2008 "Buzzword of the Year" award. (c) "Facebook Messenger" ($0.83,0.067; -0.20,1.0$; messaging app). Changepoint $t=44$ (March 2015). The change is likely associated with major feature updates, such as adding video and enabling use without a Facebook account.
  • Figure 5: Statistics of parameters for single-segment words ($N=1$; Section \ref{sec_stat_main}). (a) Probability density of $\alpha_i$. The vertical dashed line marks the mode at $0.43$. (b) Cumulative distribution of $R_i$. The dashed guide follows $\propto R_i^{-1.1}$; the cumulative distribution of $R_i$ is close to a power law with exponent $1$ (Zipf's law). (c) Cumulative distribution of $T_i$, which is close to exponential; the dashed guide follows $\propto \exp(-x/30)$. (d) Correlation between $\alpha_i$ and $R_i$. No clear correlation is observed $(\tau=-0.017,\ p=0.47)$ (Kendall's $\tau$ and $p$-value for the null of zero correlation; same notation below). (e) Correlation between $\alpha_i$ and $T_i$. No clear correlation is observed $(\tau=-0.00,\ p=0.91)$. (f) Correlation between $R_i$ and $T_i$. A weak negative correlation is detected, approximately consistent with $R_i \propto 1/T_i$ $(\tau=-0.11,\ p<10^{-16})$, indicating that faster growth tends to be sustained for shorter durations. (g) Correlation between $\alpha_i$ and $y_i(T)$. No correlation is detected $(\tau=0.034,\ p=0.13)$. (h) Correlation between $R_i$ and $y_i(T)$. A strong positive, near-proportional relationship is detected $(\tau=0.55,\ p=2.2\times 10^{-16})$, showing that the growth rate $R_i$ is closely related to the peak value. (i) Correlation between $T_i$ and $y_i(T)$. No correlation is detected $(\tau=0.017,\ p=0.47)$. Further discussion of the lack of correlation is provided in Section \ref{sec_stat_main}.
  • ...and 16 more figures
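
The Figure 5 caption reports parameter correlations as Kendall's $\tau$. As a reminder of what that statistic measures, here is a minimal pure-Python sketch of the tau-a variant, which has no tie correction. The captions do not say which variant or software the paper used, so this is an illustration only; it computes no $p$-value, and the input values are made up.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant; no tie
    correction is applied (that would be tau-b).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        sign = (xi - xj) * (yi - yj)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Made-up growth rates and peak values, not data from the paper.
    rates = [0.05, 0.10, 0.20, 0.40, 0.80]
    peaks = [1.0, 2.5, 2.0, 9.0, 15.0]
    print(kendall_tau_a(rates, peaks))
```

Because $\tau$ depends only on the rank ordering of pairs, it is insensitive to the heavy-tailed scale of quantities such as $R_i$, which makes it a natural choice for the comparisons in that caption.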