Evaluating the Effectiveness of Large Language Models in Automated News Article Summarization
Lionel Richy Panlap Houamegni, Fatih Gedikli
TL;DR
This work tackles the problem of scalable, automated news summarization for supply chain risk analysis by benchmarking a range of large language models under zero-shot, few-shot, and fine-tuned regimes. It adopts a mixed-methods approach, using a dataset of 1,535 articles across 28 industries and a seven-stage workflow to evaluate models with ROUGE, BLEU, and BERTScore, complemented by G-Eval and human judgments. The results show that modern LLMs, particularly Few-Shot GPT-4o mini and Zero-Shot GPT-4o, deliver strong summary quality and risk identification, although performance and cost trade-offs vary across models and configurations. The study demonstrates the practical viability of LLM-driven risk monitoring for real-time supply chain resilience. It also highlights limitations related to dataset size and model biases, as well as the need for holistic evaluation approaches and domain-specific optimization.
Abstract
Automating news analysis and summarization offers a promising answer to the challenge of processing the vast volume of information that characterizes today's information society. Large Language Models (LLMs) have demonstrated the ability to condense large amounts of text into concise, easily comprehensible summaries. This mitigates information overload and gives users a quick overview of relevant content. A particularly significant application of this capability is supply chain risk analysis. Companies must monitor news about their suppliers and respond to incidents for several critical reasons, including compliance with laws and regulations, risk management, and maintaining supply chain resilience. This paper develops an automated news summarization system for supply chain risk analysis using LLMs. The proposed solution aggregates news from various sources, summarizes it using LLMs, and presents the condensed information to users in a clear and concise format. This approach enables companies to streamline their information processing and make informed decisions. Our study addresses two main research questions: (1) Are LLMs effective in automating news summarization, particularly in the context of supply chain risk analysis? (2) How do different LLMs compare in summarization quality with respect to readability, duplicate detection, and risk identification? We conducted an offline study using a range of LLMs publicly available at the time. We complemented it with a user study of the top-performing systems from the offline experiments to further evaluate their effectiveness. Our results demonstrate that LLMs, particularly Few-Shot GPT-4o mini, offer significant improvements in summary quality and risk identification.
