An Agentic LLM Framework for Adverse Media Screening in AML Compliance

Pavel Chernakov, Sasan Jafarnejad, Raphaël Frank

TL;DR

This work presents an agentic system that leverages Large Language Models with Retrieval-Augmented Generation to automate adverse media screening and evaluates the approach using multiple LLM backends on a dataset comprising Politically Exposed Persons (PEPs), persons from regulatory watchlists, and sanctioned persons from OpenSanctions.

Abstract

Adverse media screening is a critical component of anti-money laundering (AML) and know-your-customer (KYC) compliance processes in financial institutions. Traditional approaches rely on keyword-based searches that generate high false-positive rates or require extensive manual review. We present an agentic system that leverages Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to automate adverse media screening. Our system implements a multi-step approach in which an LLM agent searches the web, retrieves and processes relevant documents, and computes an Adverse Media Index (AMI) score for each subject. We evaluate our approach using multiple LLM backends on a dataset comprising Politically Exposed Persons (PEPs), persons from regulatory watchlists, and sanctioned persons from OpenSanctions, together with clean names from academic sources, demonstrating the system's ability to distinguish between high-risk and low-risk individuals.
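
To make the multi-step approach in the abstract concrete, the following is a minimal offline sketch of the control flow (search, retrieve, process, answer, score). It is not the authors' implementation. Every name in it (`FAKE_WEB`, `search`, `crawl`, `chunk`, `answer`, `verdict`, `screen`) is hypothetical. The in-memory dictionary stands in for web search and crawling, sentence splitting stands in for chunking and embedding, and a simple term filter stands in for the LLM question-answering step so the example runs without network access or a model. The scoring rule is likewise an assumption made for illustration and is not the paper's AMI definition.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    subject: str
    ami_score: float
    justification: str


# Hypothetical in-memory "web" standing in for a search API and a crawler.
FAKE_WEB = {
    "https://example.org/a": "Jane Roe was charged with fraud. The trial begins in May.",
    "https://example.org/b": "Jane Roe opened a bakery. Locals praised the bread.",
}

# Placeholder vocabulary; a real system would ask an LLM, not match terms.
ADVERSE_TERMS = ("fraud", "charged", "sanction", "laundering")


def search(subject):
    """Search step: return URLs of pages that mention the subject."""
    return [url for url, text in FAKE_WEB.items() if subject in text]


def crawl(url):
    """Retrieval step: return the page content for a URL."""
    return FAKE_WEB[url]


def chunk(text):
    """Processing step: split text into sentence-level chunks (no embeddings)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def answer(chunks):
    """Stand-in for the LLM agent: keep chunks containing an adverse term."""
    return [c for c in chunks if any(t in c.lower() for t in ADVERSE_TERMS)]


def verdict(subject, evidence, total_chunks):
    """Toy score (assumption): fraction of chunks flagged as adverse."""
    score = len(evidence) / total_chunks if total_chunks else 0.0
    justification = "; ".join(evidence) or "no adverse evidence found"
    return Verdict(subject, score, justification)


def screen(subject):
    """Run all steps for one subject and return a Verdict."""
    chunks = [c for url in search(subject) for c in chunk(crawl(url))]
    return verdict(subject, answer(chunks), len(chunks))


if __name__ == "__main__":
    print(screen("Jane Roe"))
    print(screen("Nobody Known"))
```

The point of the sketch is the shape of the pipeline: each step is a separate function with a narrow input and output, so a placeholder such as `answer` could be swapped for a model-backed component without changing `screen`.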

Paper Structure (32 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: System architecture showing the agentic adverse media screening pipeline. The pipeline processes an identity through five stages: (1) Search Engine queries web APIs for relevant URLs, (2) Web Crawler retrieves page content, (3) Document Processor chunks text and creates embeddings stored in a vector store, (4) LLM Agent executes configurable playbooks using RAG-based question answering, and (5) Verdict Generator synthesizes evidence into a final AMI score with justification.
  • Figure 2: Empirical Cumulative Distribution Functions (ECDFs) of AMI scores across four populations (Clean, PEP, RW, SDN) for each LLM backend. The clear separation between curves demonstrates the system's ability to discriminate between low-risk and high-risk individuals.
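
For readers unfamiliar with the ECDFs in Figure 2, the sketch below shows how such a curve is computed from a list of scores: F(x) is the fraction of scores less than or equal to x. The scores are made up purely for illustration and are not results from the paper. The 0 to 1 score range, the group contents, and the 0.5 cut-off are all assumptions, and the helper name `ecdf` is hypothetical.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return F, where F(x) is the fraction of scores that are <= x."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one score")

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Made-up scores for two of the populations; NOT data from the paper.
toy_scores = {
    "Clean": [0.0, 0.05, 0.1, 0.1, 0.2],
    "SDN": [0.6, 0.7, 0.8, 0.9, 0.95],
}

curves = {group: ecdf(values) for group, values in toy_scores.items()}

if __name__ == "__main__":
    cutoff = 0.5  # arbitrary illustrative cut-off
    for group, F in curves.items():
        print(f"{group}: fraction of scores <= {cutoff} is {F(cutoff):.2f}")
```

When two populations are well separated, as in this toy data, one curve reaches 1 before the other leaves 0, which is the kind of gap between curves that the caption refers to.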