AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs

Mo El-Haj, Paul Rayson

TL;DR

AraFinNews provides the largest open Arabic financial news corpus (212,512 article–headline pairs across 2015–2025) to support domain-specific abstractive summarisation. The study compares multilingual and Arabic-domain models (mT5, AraT5, FinAraT5) within a unified pipeline, showing that domain-adapted FinAraT5 yields clearer, more numerically reliable headlines than general-purpose baselines. The results underscore the value of targeted pretraining on financial Arabic data for improved fluency, accuracy, and stylistic alignment with professional journalism. Together, AraFinNews and FinAraT5 establish a practical, high-impact benchmark for advancing Arabic financial NLP and related tasks like NER and event extraction.

Abstract

We introduce AraFinNews, the largest publicly available Arabic financial news dataset to date, comprising 212,500 article-headline pairs spanning a decade of reporting from 2015 to 2025. Designed as an Arabic counterpart to major English summarisation corpora such as CNN/DailyMail, AraFinNews provides a realistic benchmark for evaluating domain-specific language understanding and generation in financial contexts. Using this resource, we investigate the impact of domain specificity on abstractive summarisation of Arabic financial texts with large language models (LLMs). In particular, we evaluate three transformer-based models (mT5, AraT5, and the domain-adapted FinAraT5) to examine how financial-domain pretraining influences accuracy, numerical reliability, and stylistic alignment with professional reporting. Experimental results show that domain-adapted models generate more coherent summaries, especially in their handling of quantitative and entity-centric information. These findings highlight the importance of domain-specific adaptation for improving narrative fluency in Arabic financial summarisation. The dataset is freely available for non-commercial research at https://github.com/ArabicNLP-uk/AraFinNews.

Paper Structure

This paper contains 19 sections, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Arabic example record from AraFinNews.
  • Figure 2: English translation of the Arabic example record from AraFinNews.
  • Figure 3: Arabic financial headline generation using FinAraT5.
  • Figure 4: Span-corruption pretraining task used in FinAraT5.
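
The span-corruption task named in Figure 4 can be illustrated with a minimal sketch, assuming the standard T5-style formulation in which each masked span is replaced by a sentinel token and the target lists the sentinels followed by the dropped tokens. The function name `span_corrupt`, the hand-picked spans, and the English example sentence are all hypothetical and chosen for readability; the masking rate, span sampling, and tokeniser actually used for FinAraT5 are not stated here.

```python
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Build a T5-style (input, target) pair from tokens and masked spans.

    `spans` holds half-open (start, end) index pairs that are sorted and
    non-overlapping. In real pretraining they would be sampled at random;
    here they are passed in explicitly (an assumption for illustration).
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # one sentinel replaces the span
        targets.append(sentinel)             # target echoes the sentinel...
        targets.extend(tokens[start:end])    # ...followed by the masked tokens
        cursor = end
    inputs.extend(tokens[cursor:])           # keep text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return inputs, targets


# Illustrative sentence (not taken from the dataset).
example = "the bank reported a net profit of 120 million riyals".split()
corrupted, target = span_corrupt(example, [(1, 2), (7, 9)])
print(" ".join(corrupted))
print(" ".join(target))
```

Masking the numeric span in the example shows why this objective is a plausible fit for financial text: the model has to reconstruct quantities and entities from their surrounding context.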