Characterizing Political Campaigning with Lexical Mutants on Indian Social Media

Shruti Phadke; Tanushree Mitra

TL;DR

The paper investigates how political actors in India use lexical mutations to amplify messages across languages and social media platforms. It introduces a method that combines multilingual embeddings with clique detection in a similarity network to find amplification campaigns with lexical mutants on Facebook and Twitter around the Farmers' protests and the Citizenship Amendment Act (CAA). The method uncovers over 3.8K campaigns that contain a substantial share of unique lexical variants. The paper then characterizes how these campaigns span BJP, INC, and AAP communities and both platforms, and it reveals reactionary narratives that unfold in temporal order, underscoring cross-party dynamics and concerns about evading platform manipulation policies. The findings illuminate the scale and structure of modern online political influence in India and offer a pathway toward real-time detection and governance measures that curb manipulation while preserving genuine political expression.

Abstract

Online platforms are increasingly becoming popular arenas of political amplification in India. With known instances of pre-organized coordinated operations, researchers are questioning the legitimacy of political expression and its consequences for democratic processes in India. In this paper, we study an evolved form of political amplification by first identifying and then characterizing political campaigns with lexical mutations. By lexical mutation, we mean content that is reframed, paraphrased, or altered while preserving the same underlying message. Using multilingual embeddings and network analysis, we detect over 3.8K political campaigns with text mutations spanning multiple languages and social media platforms in India. By further assessing the political leanings of accounts repeatedly involved in such amplification campaigns, we contribute a broader understanding of how political amplification is used across various political parties in India. Moreover, our temporal analysis of the largest amplification campaigns suggests that political campaigning can evolve as temporally ordered arguments and counter-arguments between groups with competing political interests. Overall, our work contributes insights into how lexical mutations can be leveraged to bypass platform manipulation policies and how such competing campaigns can create an exaggerated sense of political divide on Indian social media.
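The detection idea described above (embed every post, link posts whose embeddings are highly similar, and treat a clique of linked posts as one campaign, as in Figure 3) can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' implementation. The vectors stand in for multilingual sentence embeddings, because the encoder is not named in this excerpt. The `threshold` of 0.8 and `min_size` of 3 are hypothetical placeholders for the empirically chosen cutoff shown in Figure 4(c). Maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting, which may differ from the clique procedure the paper actually uses. All function names are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_similarity_graph(embeddings, threshold):
    """Adjacency sets: messages i and j share an edge if cosine >= threshold."""
    graph = {i: set() for i in range(len(embeddings))}
    for i, j in combinations(range(len(embeddings)), 2):
        if cosine(embeddings[i], embeddings[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch (pivot variant)."""
    def expand(r, p, x):
        if not p and not x:
            yield r  # r cannot be extended further: it is a maximal clique
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


def find_campaigns(embeddings, threshold=0.8, min_size=3):
    """Candidate campaigns = maximal cliques with at least min_size messages.

    threshold and min_size are hypothetical defaults, not values from the paper.
    """
    graph = build_similarity_graph(embeddings, threshold)
    cliques = [sorted(c) for c in maximal_cliques(graph) if len(c) >= min_size]
    return sorted(cliques)


if __name__ == "__main__":
    # Made-up 3-d vectors standing in for multilingual sentence embeddings:
    # messages 0-2 are paraphrases of one message; 3 and 4 are unrelated.
    toy_embeddings = [
        [1.0, 0.0, 0.0],
        [0.95, 0.05, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    print(find_campaigns(toy_embeddings))  # [[0, 1, 2]]
```

Bron-Kerbosch is exponential in the worst case. On a corpus with many thousands of posts, a practical pipeline would typically run it per connected component or use an optimized graph library. The pure-Python version here only keeps the example dependency-free.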

Paper Structure

This paper contains 37 sections, 7 figures, and 2 tables.

Citation
Introduction
Background and Related Work
Online Political Influence in India
Computational Research on online political amplification
Data
Political events and keywords
Farmers' protests:
Citizenship Amendment Act (CAA):
Selecting keywords for data collection:
Facebook dataset
Twitter dataset
RQ1: Identifying Political Campaigns with Lexical Mutations
Characterizing post similarity across languages
Extracting multilingual embeddings:
...and 22 more sections

Figures (7)

Figure 1: Examples of messages from an amplification campaign with lexical mutants. The campaign contains 932 more messages similar to these in English, Hindi, Punjabi, and Marathi, posted by different Twitter accounts and Facebook groups.
Figure 2: Table describing dataset keywords, timeline, and the number of posts. Note that we collect only original tweets and posts, excluding retweets and reshares.
Figure 3: Examples of lexical mutants in an amplification campaign. Two messages (nodes) are connected by an edge if they have high cosine similarity. A clique of nodes connected in this way represents an amplification campaign with lexical mutants.
Figure 4: (a) Distribution of the percentage of unique lexical variants in each campaign, calculated after removing all duplicated texts. (b) Distribution of clique sizes on a log scale; the average clique contains 56 messages. (c) KDE plot of pairwise cosine similarities between texts within cliques. The left peak of the bimodal distribution corresponds to our empirically determined cosine-similarity cutoff, and the right peak corresponds to duplicated texts without lexical variation, which have perfect cosine similarity.
Figure 5: Examples of account bios, pictures, and descriptions in the dataset
...and 2 more figures
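Figure 4(a) and 4(b) report two per-campaign statistics: the percentage of unique lexical variants, computed after removing duplicated texts, and the clique size (56 messages on average). The following minimal sketch computes both. It assumes that a campaign is simply a list of message strings and that "duplicated" means an exact string match, because the caption does not say whether texts are normalized first. The function names and example messages are hypothetical and are not taken from the paper's data.

```python
def unique_variant_percentage(campaign_texts):
    """Percent of messages in one campaign that remain after dropping exact duplicates."""
    if not campaign_texts:
        raise ValueError("a campaign must contain at least one message")
    return 100.0 * len(set(campaign_texts)) / len(campaign_texts)


def summarize_campaigns(campaigns):
    """Per-campaign unique-variant percentages plus the mean campaign (clique) size."""
    percentages = [unique_variant_percentage(texts) for texts in campaigns]
    mean_size = sum(len(texts) for texts in campaigns) / len(campaigns)
    return percentages, mean_size


if __name__ == "__main__":
    # Invented example campaigns (lists of message texts); not data from the paper.
    campaigns = [
        [
            "Stand with our farmers",
            "Stand with our farmers",     # exact duplicate, counted once
            "We stand with the farmers",  # lexical mutant (paraphrase)
            "किसानों के साथ खड़े हों",      # Hindi variant
        ],
        ["Repeal the law now", "Repeal the law now"],
    ]
    percentages, mean_size = summarize_campaigns(campaigns)
    print(percentages, mean_size)  # [75.0, 50.0] 3.0
```

Exact-match deduplication treats even a one-character edit as a new variant. The identical copies it removes are the texts that would sit at cosine similarity 1, which corresponds to the right peak in Figure 4(c).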
