PRAGyan -- Connecting the Dots in Tweets

Rahul Ravi, Gouri Ginde, Jon Rokne

TL;DR

This work tackles the challenge of uncovering causal factors in social media discourse by integrating Knowledge Graphs (KGs) with Large Language Models (LLMs) through a Retrieval-Augmented Generation (RAG) framework. The authors implement PRAGyan, a Neo4j-backed KG augmented with Node2Vec embeddings and BERT-Uncased encodings to retrieve context that conditions GPT-3.5 Turbo prompts, enabling deeper causal reasoning than a baseline LLM without KG context. Quantitative results show higher BLEU, Jaccard, and cosine similarity scores for the KG-RAG approach, along with qualitative demonstrations of more informative and coherent causal explanations. The approach advances interpretability and actionable insights for time-sensitive social media analysis, with potential applicability across public health, policy, and crisis management domains.
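
The retrieval step described above can be illustrated with a minimal sketch. It assumes each KG node already carries a single combined vector (how the Node2Vec and BERT vectors are merged is not specified here) and that the query has been encoded into the same space. The function names, toy vectors, and prompt wording are hypothetical, and the call to GPT-3.5 Turbo is deliberately left out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_context(query_vec, node_vecs, k=2):
    """Return the k node labels whose vectors are closest to the query."""
    ranked = sorted(node_vecs,
                    key=lambda n: cosine(query_vec, node_vecs[n]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, context_nodes):
    """Assemble a prompt that conditions the LLM on retrieved KG context."""
    context = "\n".join(f"- {n}" for n in context_nodes)
    return (f"Context from the knowledge graph:\n{context}\n\n"
            f"Question: {question}\nExplain the likely causes.")

# Toy vectors standing in for combined node representations (assumption).
toy_nodes = {
    "tweet about vaccine rollout delays": [0.9, 0.1, 0.0],
    "tweet about a football match": [0.0, 0.2, 0.9],
    "tweet about clinic staffing shortages": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # hypothetical encoding of the question

top = retrieve_context(query, toy_nodes, k=2)
prompt = build_prompt("Why are people upset about vaccine access?", top)
print(prompt)
```

In this sketch the baseline would send only the question to the LLM, while the KG-RAG variant sends the full prompt with the retrieved context.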

Abstract

As social media platforms grow, understanding the underlying reasons behind events and statements becomes crucial for businesses, policymakers, and researchers. This research explores the integration of Knowledge Graphs (KGs) with Large Language Models (LLMs) to perform causal analysis of a tweet dataset. LLM-aided analysis techniques often lack depth in uncovering the causes driving observed effects. By leveraging KGs and LLMs, which encode rich semantic relationships and temporal information, this study aims to uncover the complex interplay of factors influencing causal dynamics and to compare the results with those obtained using GPT-3.5 Turbo alone. We employ a Retrieval-Augmented Generation (RAG) model, utilizing a KG stored in Neo4j (a.k.a. PRAGyan), to retrieve relevant context for causal reasoning. Our approach demonstrates that the KG-enhanced LLM RAG can provide improved results compared to the baseline LLM (GPT-3.5 Turbo) as the source corpus increases in size. Our qualitative analysis highlights the advantages of combining KGs with LLMs for improved interpretability and actionable insights, facilitating informed decision-making across various domains. Quantitative analysis using metrics such as BLEU and cosine similarity shows that our approach outperforms the baseline by 10%.
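
To make the similarity metrics concrete, here is a minimal sketch of Jaccard and cosine similarity between a reference explanation and a generated one. It assumes whitespace tokenization and bag-of-words counts; how the texts are actually vectorized for the reported scores is not stated here, and BLEU is omitted because it is normally computed with a dedicated library. The example sentences are invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()

def jaccard(a, b):
    """Size of the shared vocabulary divided by the size of the joint vocabulary."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def cosine_bow(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(c * c for c in ca.values()))
    nb = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

reference = "supply delays caused the protest"
candidate = "the protest was caused by supply delays"

j = jaccard(reference, candidate)
c = cosine_bow(reference, candidate)
print(f"Jaccard: {j:.3f}, cosine: {c:.3f}")
```

Scoring both the baseline output and the KG-RAG output against the same reference in this way yields the kind of paired comparison the abstract describes.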

Paper Structure

This paper contains 35 sections, 2 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Integrated Representation of KG and LLM through RAG
  • Figure 2: Combining Embeddings and Encodings during RAG to Identify Context
  • Figure 3: Overview of study design
  • Figure 4: Box Plot showing metrics between the Baseline and Proposed Models