Behavioral Homophily in Social Media via Inverse Reinforcement Learning: A Reddit Case Study
Lanqin Yuan, Philipp J. Schneider, Marian-Andrei Rizoiu
TL;DR
The paper addresses the problem of measuring social homophily on platforms that lack explicit social networks. It introduces an inverse reinforcement learning (IRL) framework to infer user policies and a symmetric weighted KL divergence (SWKL) to quantify behavioral similarity between them. Applied to Reddit across 15 home subreddits and 662 users over 2015–2022, the approach shows that behavioral homophily largely tracks topical homophily. It also uncovers distinct behavioral personas, such as Disagreers, and patterns of behavioral convergence across topics. The work further provides a disagreement classifier and a topic-based baseline for comparison, demonstrating the added value of behavior-centric analysis in hierarchical, anonymous online communities. It highlights the practical value of modeling user behavior to understand engagement, polarization, and discourse dynamics, while acknowledging data-access and scalability limitations that constrain its general applicability.
Abstract
Online communities play a critical role in shaping societal discourse and influencing collective behavior in the real world. The tendency for people to connect with others who share similar characteristics and views, known as homophily, plays a key role in the formation of echo chambers, which further amplify polarization and division. Existing work on homophily in online communities typically infers it with content- or adjacency-based approaches, such as constructing explicit interaction networks or performing topic analysis. These methods fall short on platforms where interaction networks cannot easily be constructed, and they fail to capture the complex nature of user interactions across the platform. This work introduces a novel approach for quantifying user homophily. We first use an Inverse Reinforcement Learning (IRL) framework to infer users' policies, then compare these policies to measure behavioral homophily. We apply our method to Reddit in a case study of 5.9 million interactions over six years, demonstrating how this approach uncovers distinct behavioral patterns and user roles that vary across communities. We further validate our behavioral homophily measure against traditional content-based homophily, offering a powerful method for analyzing social media dynamics and their broader societal implications. Among other findings, users can behave very similarly (high behavioral homophily) when discussing entirely different topics, such as soccer vs. e-sports (low topical homophily), and there is an entire class of users on Reddit whose purpose seems to be to disagree with others.
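The TL;DR names a symmetric weighted KL divergence (SWKL) for comparing inferred policies, but this section does not give its formula. The sketch below is one plausible reading under stated assumptions, not the authors' definition. It assumes a policy is a per-state probability distribution over a shared action set, averages the two directed KL divergences within each state, and weights states by hypothetical normalized weights (for example, how often users visit each state). The names `kl`, `swkl`, and the smoothing constant `eps` are illustrative. Lower scores mean more similar policies, i.e., higher behavioral homophily.

```python
import math
from typing import Sequence

# Hypothetical representation (assumption, not from the paper): a policy is a
# list of rows, one per state; each row is a probability distribution over the
# same ordered set of actions.
Policy = Sequence[Sequence[float]]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Directed KL divergence KL(p || q); eps smoothing avoids log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def swkl(pi_u: Policy, pi_v: Policy, weights: Sequence[float]) -> float:
    """Symmetric, state-weighted KL divergence between two users' policies.

    Assumed form: sum over states of w_s * 0.5 * (KL(P_s || Q_s) + KL(Q_s || P_s)),
    where w_s are the state weights normalized to sum to 1.
    """
    if not (len(pi_u) == len(pi_v) == len(weights)):
        raise ValueError("policies and weights must cover the same states")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    score = 0.0
    for p, q, w in zip(pi_u, pi_v, weights):
        # Average of both directions makes the per-state term symmetric.
        score += (w / total) * 0.5 * (kl(p, q) + kl(q, p))
    return score


if __name__ == "__main__":
    # Toy policies over 2 states and 3 actions (made-up numbers for illustration).
    alice = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    bob = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
    carol = [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
    state_weights = [3, 1]  # e.g., alice and bob visit state 0 three times as often
    print("alice-bob:  ", swkl(alice, bob, state_weights))
    print("alice-carol:", swkl(alice, carol, state_weights))
```

In this toy example the alice–bob score is much smaller than the alice–carol score, because alice and bob favor the same actions in each state. Averaging both KL directions is what makes the measure symmetric, so comparing user u to user v gives the same score as comparing v to u; the per-state weights let frequently visited states dominate the comparison.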
