Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach
Rian Dolphin, Joe Dursun, Jonathan Chow, Jarrett Blankenship, Katie Adams, Quinton Pike
TL;DR
This work extracts structured insights from unstructured financial news with an LLM-driven pipeline that jointly performs ticker extraction, per-company sentiment analysis, and article summarization, without relying on pre-tagged feeds. A hybrid validation framework combines a regularly updated company-ticker mapping database with a tailored string-similarity and post-processing system to ensure accurate mappings, and chain-of-thought prompting supports the sentiment analysis. In evaluation, about 90% of articles miss no tickers relative to current data providers, and about 22% gain additional relevant tickers. The output includes per-ticker sentiment and is delivered through a live API updated hourly, plus a static dataset for reproducibility. The approach broadens usable sources by over 400%, adds depth through granular sentiment, and provides a scalable data product that supports research and market analytics when integrated with other data sources.
Abstract
Financial news plays a crucial role in decision-making across the financial sector, yet efficiently processing this information into a structured format remains challenging. This paper presents a novel approach to financial news processing that leverages Large Language Models (LLMs) to overcome limitations that previously prevented the extraction of structured data from unstructured financial news. We introduce a system that extracts relevant company tickers from raw news article content, performs sentiment analysis at the company level, and generates summaries, all without relying on pre-structured data feeds. Our methodology combines the generative capabilities of LLMs and recent prompting techniques with a robust validation framework that uses a tailored string similarity approach. Evaluation on a dataset of 5530 financial news articles demonstrates the effectiveness of our approach: 90% of articles miss no tickers compared with current data providers, and 22% of articles have additional relevant tickers. Beyond this paper, the methodology has been implemented at scale, and the resulting processed data is available through a live API endpoint that is updated in real time with the latest news. To the best of our knowledge, we are the first data provider to offer granular, per-company sentiment analysis from news articles, enhancing the depth of information available to market participants. We also release the evaluation dataset of 5530 processed articles as a static file, which we hope will facilitate further research leveraging financial news.
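As a rough illustration of the validation idea described above, the sketch below checks a company name produced by an LLM against a company-ticker mapping using string similarity. It is not the authors' implementation: the mapping contents, the suffix list, the 0.85 threshold, the use of Python's `difflib.SequenceMatcher`, and the function names `normalize` and `resolve_ticker` are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical company-to-ticker mapping. A real system would load this
# from a regularly updated database.
COMPANY_TICKERS = {
    "Apple Inc.": "AAPL",
    "Microsoft Corporation": "MSFT",
    "Tesla, Inc.": "TSLA",
}

# Legal suffixes dropped before comparison (illustrative, not exhaustive).
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "plc"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def resolve_ticker(llm_name: str, threshold: float = 0.85) -> Optional[str]:
    """Return the ticker of the best-matching known company, or None
    when no company name is similar enough to the LLM's output."""
    target = normalize(llm_name)
    best_ticker, best_score = None, 0.0
    for name, ticker in COMPANY_TICKERS.items():
        score = SequenceMatcher(None, target, normalize(name)).ratio()
        if score > best_score:
            best_ticker, best_score = ticker, score
    # Reject weak matches so hallucinated companies are filtered out.
    return best_ticker if best_score >= threshold else None
```

Under these assumptions, `resolve_ticker("Microsft Corp")` still resolves to the Microsoft entry despite the typo, while an unknown name such as `"Acme Widgets"` is rejected. The threshold trades missed tickers against spurious ones and would need tuning on real data.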
