Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach

Rian Dolphin, Joe Dursun, Jonathan Chow, Jarrett Blankenship, Katie Adams, Quinton Pike

TL;DR

This work addresses the challenge of extracting structured insights from unstructured financial news with an LLM-driven pipeline that jointly performs ticker extraction, per-company sentiment analysis, and article summarization without relying on pre-tagged feeds. A hybrid validation framework combines a regularly updated company-ticker mapping database with a tailored string-similarity and post-processing system to check the extracted mappings, and chain-of-thought prompting is used for sentiment analysis. Key contributions include high ticker coverage (about 90% of articles miss no tickers relative to current data providers, and about 22% gain additional relevant tickers) and per-ticker sentiment, delivered via a live API updated hourly, plus a static dataset for reproducibility. The approach broadens usable sources by over 400%, adds depth through granular sentiment, and provides a scalable data product that supports research and market analytics when integrated with other data sources.

Abstract

Financial news plays a crucial role in decision-making processes across the financial sector, yet efficiently processing this information into a structured format remains challenging. This paper presents a novel approach to financial news processing that leverages Large Language Models (LLMs) to overcome limitations that previously prevented the extraction of structured data from unstructured financial news. We introduce a system that extracts relevant company tickers from raw news article content, performs sentiment analysis at the company level, and generates summaries, all without relying on pre-structured data feeds. Our methodology combines the generative capabilities of LLMs and recent prompting techniques with a robust validation framework that uses a tailored string similarity approach. Evaluation on a dataset of 5530 financial news articles demonstrates the effectiveness of our approach: 90% of articles are not missing any tickers compared with current data providers, and 22% of articles have additional relevant tickers. In addition to this paper, the methodology has been implemented at scale, and the resulting processed data is available through a live API endpoint that is updated in real time with the latest news. To the best of our knowledge, we are the first data provider to offer granular, per-company sentiment analysis from news articles, enhancing the depth of information available to market participants. We also release the evaluation dataset of 5530 processed articles as a static file, which we hope will facilitate further research leveraging financial news.
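
The validation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify their tailored similarity measure, so Python's standard `difflib.SequenceMatcher` stands in for it, and the mapping table `TICKER_DB`, the suffix list, the 0.8 threshold, and the function names are all hypothetical. The sketch accepts an LLM-extracted (company, ticker) pair when the company name matches the database entry for that ticker, and otherwise falls back to the best name match in the database.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the regularly updated company-ticker database.
TICKER_DB = {
    "AAPL": "Apple Inc.",
    "MSFT": "Microsoft Corporation",
    "TSLA": "Tesla, Inc.",
}

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "plc", "company"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is a stand-in measure."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def validate(extracted: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep or correct LLM-extracted tickers; drop unverifiable ones."""
    results = []
    for item in extracted:
        ticker = item["ticker"].upper()
        official = TICKER_DB.get(ticker)
        # Case 1: ticker exists and the company name agrees with the database.
        if official is not None and similarity(item["company"], official) >= threshold:
            results.append({**item, "ticker": ticker})
            continue
        # Case 2: fall back to the closest company name in the database.
        best_ticker, best_score = max(
            ((t, similarity(item["company"], n)) for t, n in TICKER_DB.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            results.append({**item, "ticker": best_ticker})
    return results


if __name__ == "__main__":
    llm_output = [
        {"company": "Apple", "ticker": "AAPL", "sentiment": "positive"},
        {"company": "Microsoft Corp", "ticker": "MSFY", "sentiment": "neutral"},
        {"company": "Acme Widgets", "ticker": "ACME", "sentiment": "negative"},
    ]
    for row in validate(llm_output):
        print(row["ticker"], row["sentiment"])
```

In this toy input the mistyped ticker `MSFY` is corrected through the name match, while the company that has no counterpart in the table is dropped. Per-company fields such as sentiment pass through unchanged, which is what allows sentiment to be reported per ticker.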

Paper Structure

This paper contains 10 sections, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: Example article from the Google News live feed.
  • Figure 2: Initial prompt and output example showing the extraction of structured data from an article.
  • Figure 3: Example of the final output accessible to users via API.
  • Figure 4: Distribution of number of tickers per article.
  • Figure 5: Distribution of the number of missing tickers in articles compared with news provider labelling.
  • ...and 1 more figure