LinGO: A Linguistic Graph Optimization Framework with LLMs for Interpreting Intents of Online Uncivil Discourse

Yuan Zhang, Thales Bertaglia

TL;DR

LinGO addresses the challenge of interpreting incivility by decomposing complex semantic meaning into a multi-step linguistic graph-guided reasoning process for LLMs. It couples this structure with automated prompt/explanation optimization (e.g., TextGrad, AdalFlow, DSPy, RAG) to identify and improve the most error-prone steps, reducing misinterpretation of indirect expressions. On a Portuguese dataset of Brazilian political discourse, LinGO with RAG and Gemini achieves the top performance (accuracy $0.690$, wF1 $0.699$), outperforming zero-shot, CoT, direct optimization, and several open-source baselines. The work demonstrates that explicit, optimizable linguistic components can enhance interpretability and performance in complex semantic tasks, with potential extension to other nuanced language understanding problems.

Abstract

Detecting uncivil language is crucial for maintaining safe, inclusive, and democratic online spaces. Yet existing classifiers often misinterpret posts that contain uncivil cues but express civil intents, leading to inflated estimates of harmful incivility online. We introduce LinGO, a linguistic graph optimization framework for large language models (LLMs) that leverages linguistic structures and optimization techniques to perform multi-class classification of the intents behind incivility conveyed through direct and indirect expressions. LinGO decomposes language into multi-step linguistic components, identifies the steps that cause the most errors, and iteratively optimizes the prompt and/or example components for those targeted steps. We evaluate it using a dataset collected during the 2022 Brazilian presidential election, encompassing four forms of political incivility: Impoliteness (IMP), Hate Speech and Stereotyping (HSST), Physical Harm and Violent Political Rhetoric (PHAVPR), and Threats to Democratic Institutions and Values (THREAT). Each instance is annotated with six types of civil/uncivil intent. We benchmark LinGO using three cost-efficient LLMs (GPT-5-mini, Gemini 2.5 Flash-Lite, and Claude 3 Haiku) and four optimization techniques (TextGrad, AdalFlow, DSPy, and Retrieval-Augmented Generation (RAG)). The results show that, across all models, LinGO consistently improves accuracy and weighted F1 compared with zero-shot, chain-of-thought, direct optimization, and fine-tuning baselines. RAG is the strongest optimization technique and, when paired with the Gemini model, achieves the best overall performance. These findings demonstrate that incorporating multi-step linguistic components into LLM instructions and optimizing targeted components can help the models explain complex semantic meanings, an approach that can be extended to other complex semantic explanation tasks in the future.
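For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and support-weighted F1 are commonly computed for a multi-class task. It is not taken from the paper: the function names and the toy `gold`/`pred` labels are hypothetical, and the weighting follows the usual convention of averaging per-class F1 by each class's share of the gold labels.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def weighted_f1(gold, pred):
    """Per-class F1 averaged with weights proportional to gold-label support."""
    support = Counter(gold)
    total = len(gold)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical toy labels (NOT data from the paper).
gold = [0, 0, 1, 1, 1, 2]
pred = [0, 1, 1, 1, 2, 2]

acc = accuracy(gold, pred)
wf1 = weighted_f1(gold, pred)
print(f"accuracy={acc:.3f} weighted_f1={wf1:.3f}")
```

Weighting by support keeps frequent intent classes from being drowned out by rare ones, which is presumably why it is reported alongside plain accuracy on an imbalanced label set.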

Paper Structure (31 sections, 12 equations, 6 figures, 6 tables, 2 algorithms)

Figures (6)

  • Figure 1: Illustration of six intent categories of hate speech. [1] is a direct expression of hate speech. [2]--[6] are indirect expressions of hate speech.
  • Figure 2: Demonstration of the pipeline of Linguistic Graph Optimization (LinGO).
  • Figure 3: Comparison of LinGO and baseline prompting methods across intent labels (0-6) and models (GPT-5-mini, Claude 3 Haiku, Gemini 2.5 Flash-Lite).
  • Figure 4: Comparison of LinGO and baseline prompting methods across forms of incivility (IMP, HSST, PHAVPR, THREAT) and models (GPT-5-mini, Claude 3 Haiku, Gemini 2.5 Flash-Lite).
  • Figure 5: Distribution of intent labels in the development and test sets. The chi-square test shows that the differences between their distributions are not statistically significant.
  • ...and 1 more figure
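The chi-square comparison mentioned in the Figure 5 caption can be illustrated with a small sketch. This is an assumption about the general technique (a Pearson chi-square test of homogeneity on a 2 x K table of label counts), not the authors' code; the function name and the label counts are hypothetical, and every label is assumed to occur at least once across the two splits.

```python
def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for two
    label-count vectors over the same K labels (a 2 x K contingency table)."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            # Expected count if both splits shared one label distribution.
            expected = row_total * column_total / grand
            stat += (observed - expected) ** 2 / expected
    dof = len(counts_a) - 1  # (rows - 1) * (cols - 1) with 2 rows
    return stat, dof


# Hypothetical label counts (NOT the paper's data).
dev_counts = [40, 20, 30, 10]
test_counts = [20, 10, 15, 5]

stat, dof = chi_square_statistic(dev_counts, test_counts)
print(f"chi2={stat:.3f} dof={dof}")
```

A statistic below the critical value for the given degrees of freedom means the two splits are not distinguishable at that significance level. A library routine such as `scipy.stats.chi2_contingency` would also return the p-value directly.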