AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping

Md Abdul Kadir, Sai Suresh Macharla Vasu, Sidharth S. Nair, Daniel Sonntag

TL;DR

Journal Entry Tests (JETs) struggle with false positives in tax-related ledgers. The authors introduce AuditCopilot, a prompt-tuned, LLM-based detector that ingests heterogeneous journal entry (JE) fields and outputs anomaly flags with natural-language explanations. Its prompts are augmented with contextual dataset statistics and an Isolation Forest cue. Evaluations on synthetic and anonymized real-world datasets show that LLMs can match or exceed rule-based JETs and classical ML baselines while delivering interpretable rationales, illustrating a viable path toward AI-augmented auditing. The work emphasizes governance and human-in-the-loop requirements for safe deployment in high-stakes financial auditing.
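
The detector described above, an LLM prompt enriched with contextual statistics and an Isolation Forest cue, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration. The fields (`amount`, `hour`), the toy data, the helper names (`isolation_forest_scores`, `build_prompt`), and the prompt wording are hypothetical and not taken from the paper. The Isolation Forest is a small pure-Python version of the standard algorithm (random axis-aligned splits, score 2^(-E[h(x)]/c(psi))) rather than the authors' configuration, and the step that sends the prompt to an LLM is omitted.

```python
# Hypothetical sketch: field names, data, and prompt text are illustrative, not from the paper.
import math
import random
import statistics
from typing import Dict, List, Sequence

EULER_GAMMA = 0.5772156649


def c_factor(n: int) -> float:
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows: List[List[float]], depth: int, max_depth: int, rng: random.Random) -> dict:
    """Grow one isolation tree with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only split on features that still vary within this node.
    dims = [d for d in range(len(rows[0])) if min(r[d] for r in rows) < max(r[d] for r in rows)]
    if not dims:
        return {"size": len(rows)}
    dim = rng.choice(dims)
    lo, hi = min(r[dim] for r in rows), max(r[dim] for r in rows)
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([r for r in rows if r[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x: Sequence[float], node: dict, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus the c(n) correction for unexpanded leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(rows: List[List[float]], n_trees: int = 100,
                            sample_size: int = 256, seed: int = 0) -> List[float]:
    """Score s(x) = 2 ** (-E[h(x)] / c(psi)); values closer to 1 are more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    # Each tree is grown on a random subsample of psi rows, drawn without replacement.
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng) for _ in range(n_trees)]
    cn = c_factor(psi)
    scores = []
    for x in rows:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / cn) if cn > 0 else 0.5)
    return scores


def build_prompt(entry: Dict[str, float], stats: Dict[str, Dict[str, float]], if_score: float) -> str:
    """Assemble a per-entry prompt with dataset statistics and an Isolation Forest hint."""
    lines = ["You are an audit assistant reviewing one journal entry.", "Dataset statistics:"]
    for field, s in stats.items():
        lines.append(f"- {field}: mean={s['mean']:.2f}, stdev={s['stdev']:.2f}")
    lines.append(f"Isolation Forest score (hint only, closer to 1 = more unusual): {if_score:.3f}")
    lines.append("Entry: " + ", ".join(f"{k}={v}" for k, v in entry.items()))
    lines.append("Reply ANOMALY or NORMAL, then explain your decision in one sentence.")
    return "\n".join(lines)


# Toy data (hypothetical): 50 routine entries plus one large posting at 03:00.
data_rng = random.Random(1)
fields = ["amount", "hour"]
rows = [[data_rng.gauss(1000, 50), data_rng.uniform(9, 17)] for _ in range(50)]
rows.append([50000.0, 3.0])

scores = isolation_forest_scores(rows)
stats = {
    name: {"mean": statistics.mean([r[i] for r in rows]), "stdev": statistics.stdev([r[i] for r in rows])}
    for i, name in enumerate(fields)
}
prompt = build_prompt(dict(zip(fields, rows[-1])), stats, scores[-1])
print(prompt)
```

Passing the score as a hint rather than a verdict leaves the final flag and its explanation to the model and, ultimately, to the human auditor reviewing it.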

Abstract

Auditors rely on Journal Entry Tests (JETs) to detect anomalies in tax-related ledger records, but rule-based methods generate an overwhelming number of false positives and struggle with subtle irregularities. We investigate whether large language models (LLMs) can serve as anomaly detectors in double-entry bookkeeping. We benchmark state-of-the-art LLMs such as LLaMA and Gemma on both synthetic and real-world anonymized ledgers and compare them against JETs and machine learning baselines. Our results show that LLMs consistently outperform traditional rule-based JETs and classical ML baselines, while also providing natural-language explanations that enhance interpretability. These results highlight the potential of AI-augmented auditing, where human auditors collaborate with foundation models to strengthen financial integrity.
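
For context on the rule-based baseline, the kind of checks a JET applies to a double-entry journal entry can be sketched in Python. The specific rules (balance check, weekend and off-hours postings, round amounts), the 08:00-18:00 window, the `round_unit` of 1,000, and the names `JournalEntry`, `Line`, and `jet_flags` are hypothetical illustrations of common JET-style heuristics, not the tests used in the paper.

```python
# Hypothetical JET-style rules for illustration; not the paper's actual test suite.
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import List


@dataclass
class Line:
    account: str
    debit: Decimal
    credit: Decimal


@dataclass
class JournalEntry:
    entry_id: str
    posted_at: datetime
    lines: List[Line]


def jet_flags(je: JournalEntry, round_unit: Decimal = Decimal("1000")) -> List[str]:
    """Return the names of every rule the entry trips."""
    flags = []
    total_debit = sum((ln.debit for ln in je.lines), Decimal("0"))
    total_credit = sum((ln.credit for ln in je.lines), Decimal("0"))
    # Double-entry invariant: total debits must equal total credits.
    if total_debit != total_credit:
        flags.append("UNBALANCED")
    # weekday() returns 5 for Saturday and 6 for Sunday.
    if je.posted_at.weekday() >= 5:
        flags.append("WEEKEND_POSTING")
    # Posted outside an assumed 08:00-18:00 business window.
    if not 8 <= je.posted_at.hour < 18:
        flags.append("OFF_HOURS")
    # Exact multiples of round_unit are a classic red-flag heuristic.
    if total_debit > 0 and total_debit % round_unit == 0:
        flags.append("ROUND_AMOUNT")
    return flags


weekend_entry = JournalEntry("JE-001", datetime(2024, 3, 2, 23, 15), [
    Line("Expenses", Decimal("5000"), Decimal("0")),
    Line("Cash", Decimal("0"), Decimal("5000")),
])
unbalanced_entry = JournalEntry("JE-002", datetime(2024, 3, 5, 10, 30), [
    Line("Expenses", Decimal("1234.56"), Decimal("0")),
    Line("Cash", Decimal("0"), Decimal("1200.00")),
])
print(jet_flags(weekend_entry), jet_flags(unbalanced_entry))
```

Each rule fires without regard to context, which illustrates why rule-based JETs can produce many false positives: a legitimate round-sum payment posted late on a Saturday trips three flags at once.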

Paper Structure

This paper contains 20 sections, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: AuditCopilot prompt template with dataset statistics and Isolation Forest hints.
  • Figure 2: Synthetic dataset prompt template with engineered flags and rule-based decision criteria.