Table of Contents
Fetching ...

Evaluating LLMs Capabilities Towards Understanding Social Dynamics

Anique Tahir, Lu Cheng, Manuel Sandoval, Yasin N. Silva, Deborah L. Hall, Huan Liu

TL;DR

It is found that while fine-tuned LLMs exhibit promising results in some social media understanding tasks (understanding directionality), they presented mixed results in others (proper paraphrasing and bullying/anti-bullying detection).

Abstract

Social media discourse involves people from different backgrounds, beliefs, and motives. Thus, often such discourse can devolve into toxic interactions. Generative Models, such as Llama and ChatGPT, have recently exploded in popularity due to their capabilities in zero-shot question-answering. Because these models are increasingly being used to ask questions of social significance, a crucial research question is whether they can understand social media dynamics. This work provides a critical analysis regarding generative LLM's ability to understand language and dynamics in social contexts, particularly considering cyberbullying and anti-cyberbullying (posts aimed at reducing cyberbullying) interactions. Specifically, we compare and contrast the capabilities of different large language models (LLMs) to understand three key aspects of social dynamics: language, directionality, and the occurrence of bullying/anti-bullying messages. We found that while fine-tuned LLMs exhibit promising results in some social media understanding tasks (understanding directionality), they presented mixed results in others (proper paraphrasing and bullying/anti-bullying detection). We also found that fine-tuning and prompt engineering mechanisms can have positive effects in some tasks. We believe that a understanding of LLM's capabilities is crucial to design future models that can be effectively used in social applications.

Evaluating LLMs Capabilities Towards Understanding Social Dynamics

TL;DR

It is found that while fine-tuned LLMs exhibit promising results in some social media understanding tasks (understanding directionality), they presented mixed results in others (proper paraphrasing and bullying/anti-bullying detection).

Abstract

Social media discourse involves people from different backgrounds, beliefs, and motives. Thus, often such discourse can devolve into toxic interactions. Generative Models, such as Llama and ChatGPT, have recently exploded in popularity due to their capabilities in zero-shot question-answering. Because these models are increasingly being used to ask questions of social significance, a crucial research question is whether they can understand social media dynamics. This work provides a critical analysis regarding generative LLM's ability to understand language and dynamics in social contexts, particularly considering cyberbullying and anti-cyberbullying (posts aimed at reducing cyberbullying) interactions. Specifically, we compare and contrast the capabilities of different large language models (LLMs) to understand three key aspects of social dynamics: language, directionality, and the occurrence of bullying/anti-bullying messages. We found that while fine-tuned LLMs exhibit promising results in some social media understanding tasks (understanding directionality), they presented mixed results in others (proper paraphrasing and bullying/anti-bullying detection). We also found that fine-tuning and prompt engineering mechanisms can have positive effects in some tasks. We believe that a understanding of LLM's capabilities is crucial to design future models that can be effectively used in social applications.

Paper Structure

This paper contains 27 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: We use constrained generation for our analysis. Constrained prompt generation allows extraction of categorical labels and explanations while preventing the responses from deviating from expectation. The prompt placeholders are highlighted in yellow.
  • Figure 2: Semantic similarity against edit distance for paraphrases. ChatGPT shows a promising balance, while Llama-based models tend to repeat the text verbatim. Using exemplars shows performance improvements for the Llama-2 7B model.
  • Figure 3: We divide our fine-tuning analysis in two phases. The first phase adds structural understanding, while the second phase adds social understanding to the under-privileged models.