
Detecting Propaganda Techniques in Code-Switched Social Media Text

Muhammad Umar Salman, Asif Hanif, Shady Shehata, Preslav Nakov

TL;DR

This work creates a corpus of 1,030 texts code-switching between English and Roman Urdu, annotated with 20 propaganda techniques, and finds that it is important to model the multilinguality directly (rather than using translation) as well as to use the right fine-tuning strategy.

Abstract

Propaganda is a form of communication intended to influence the opinions and the mindset of the public to promote a particular agenda. With the rise of social media, propaganda has spread rapidly, leading to the need for automatic propaganda detection systems. Most work on propaganda detection has focused on high-resource languages, such as English, and little effort has been made to detect propaganda for low-resource languages. Yet, it is common to find a mix of multiple languages in social media communication, a phenomenon known as code-switching. Code-switching combines different languages within the same text, which poses a challenge for automatic systems. With this in mind, here we propose the novel task of detecting propaganda techniques in code-switched text. To support this task, we create a corpus of 1,030 texts code-switching between English and Roman Urdu, annotated with 20 propaganda techniques, which we make publicly available. We perform a number of experiments contrasting different experimental setups, and we find that it is important to model the multilinguality directly (rather than using translation) as well as to use the right fine-tuning strategy. The code and the dataset are publicly available at https://github.com/mbzuai-nlp/propaganda-codeswitched-text
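A single text can carry several techniques at once. Figure 1 labels one text with both Smears and Black and White Fallacy/Dictatorship, and Figure 3 plots how many techniques each example contains. Text-level detection is therefore naturally a multi-label problem. The sketch below shows one way to collapse fragment-level annotations into a 20-dimensional multi-hot target. It assumes a hypothetical `(start, end, technique)` span format and hypothetical label indices. It is not the authors' pipeline or the annotation tool's JSON schema.

```python
from typing import Dict, Iterable, List

# The corpus is annotated with 20 propaganda techniques.
NUM_TECHNIQUES = 20


def to_multi_hot(techniques: Iterable[str], label_index: Dict[str, int]) -> List[int]:
    """Return a 0/1 vector with a 1 at the index of every technique present."""
    vec = [0] * NUM_TECHNIQUES
    for name in techniques:
        vec[label_index[name]] = 1
    return vec


# Hypothetical label indices: only the two techniques named in Figure 1 are
# listed here; the remaining 18 techniques would occupy indices 2..19.
label_index = {"Smears": 0, "Black and White Fallacy/Dictatorship": 1}

# Hypothetical fragment-level spans as (start, end, technique) tuples.
# The offsets are illustrative; in Figure 1 both techniques cover the whole text.
spans = [
    (0, 150, "Smears"),
    (0, 150, "Black and White Fallacy/Dictatorship"),
]

# Text-level labels are the set of techniques used anywhere in the text.
text_labels = {technique for _start, _end, technique in spans}
vec = to_multi_hot(text_labels, label_index)
print(vec)  # first two entries are 1, the remaining 18 are 0
```

Collapsing spans into a set discards their offsets. This target therefore suits text-level classification only. Span-level detection would instead need labels attached to individual tokens or character ranges.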

Paper Structure

This paper contains 18 sections, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Example of code-switched text annotated at the fragment level. Here, the entire text is labeled with two techniques: Smears and Black and White Fallacy/Dictatorship. Translation of the code-switched text: "There are only two types of fans. Those toxic people who will not leave this rubbish team's side until death, and people like me who realize this team is dung."
  • Figure 2: Methodological process adopted for annotating our code-switched dataset.
  • Figure 3: Histogram of the number of techniques per example.
  • Figure B.1: Interface design of our web-based annotation tool.
  • Figure B.2: The two-step annotation process for a text. Step 1: Select the span of text to be annotated. Step 2: Click the boxes next to the corresponding propaganda labels in the annotation tool's interface. The image on the right shows the updated JSON output after both steps are completed.