In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions

Alireza Shamshiri, Kyeong Rok Ryu, June Young Park

TL;DR

This study evaluates three leading LLMs (GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro) on lengthy, complex documents with varying opinions about infrastructure projects, under both zero-shot and few-shot scenarios.

Abstract

Large language models (LLMs) have achieved impressive results across various tasks, yet they still struggle with long-context documents. This study evaluates three leading LLMs (GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro) on lengthy, complex documents with varying opinions about infrastructure projects, under both zero-shot and few-shot scenarios. Our results indicate that GPT-4o excels in zero-shot scenarios on simpler, shorter documents, while Claude 3.5 Sonnet surpasses GPT-4o on more complex opinions whose sentiment fluctuates. In few-shot scenarios, Claude 3.5 Sonnet achieves the best overall performance, while GPT-4o shows greater stability as the number of demonstrations increases.

Paper Structure

This paper contains 12 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Performance comparison of zero-shot results. The average is calculated based on each model's performance across all four datasets.
  • Figure 2: Performance comparison of few-shot results, with average outcomes calculated to provide deeper insights.
  • Figure 3: An illustration of the zero-shot prompt along with an answer generated by an LLM.
  • Figure 4: An illustration of the 3-shot prompt along with the associated sentiment labels provided to the model.