Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study

Jeel Piyushkumar Khatiwala, Daniel Kwaku Ntiamoah Addai, Weifeng Xu

TL;DR

This paper proposes a structured framework that automates forensic artifact extraction, refines data through LLM-driven analysis, and validates results using a Digital Forensic Knowledge Graph (DFKG). Deterministic Unique Identifiers (UIDs) and forensic cross-referencing are used to ensure artifact traceability and evidentiary consistency.

Abstract

The growing reliance on AI-identified digital evidence raises significant concerns about its reliability, particularly as large language models (LLMs) are increasingly integrated into forensic investigations. This paper proposes a structured framework that automates forensic artifact extraction, refines data through LLM-driven analysis, and validates results using a Digital Forensic Knowledge Graph (DFKG). Evaluated on a 13 GB forensic image dataset containing 61 applications, 2,864 databases, and 5,870 tables, the framework ensures artifact traceability and evidentiary consistency through deterministic Unique Identifiers (UIDs) and forensic cross-referencing. We propose this methodology to address challenges in ensuring the credibility and forensic integrity of AI-identified evidence, reducing classification errors, and advancing scalable, auditable methodologies. A comprehensive case study on this dataset demonstrates the framework's effectiveness, achieving over 95 percent accuracy in artifact extraction, strong support of chain-of-custody adherence, and robust contextual consistency in forensic relationships. Key results validate the framework's ability to enhance reliability, reduce errors, and establish a legally sound paradigm for AI-assisted digital forensics.
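
The abstract does not describe how the deterministic UIDs are derived. As a minimal sketch, assuming a UID is a hash over an artifact's location (the application, database, table, and row key fields here are hypothetical, as is the `artifact_uid` helper), the traceability idea might look like this: the same artifact always maps to the same UID, so an LLM-reported finding can be traced back to its source record.

```python
import hashlib


def artifact_uid(app: str, database: str, table: str, row_key: str) -> str:
    """Return a deterministic UID for an artifact location.

    Illustrative scheme only; the paper's actual UID construction
    is not specified in the abstract.
    """
    # Join the fields with a unit-separator character so that
    # ("ab", "c") and ("a", "bc") cannot produce the same input string.
    canonical = "\x1f".join([app, database, table, row_key])
    # Truncate the SHA-256 hex digest to 16 characters for readability.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical artifact locations, used only for demonstration.
uid_a = artifact_uid("com.example.chat", "messages.db", "message", "42")
uid_b = artifact_uid("com.example.chat", "messages.db", "message", "42")
uid_c = artifact_uid("com.example.chat", "messages.db", "message", "43")

print(uid_a == uid_b)  # True: same location, same UID
print(uid_a == uid_c)  # False: different row, different UID
```

Because the UID depends only on the artifact's location and not on extraction order or time, re-running the extraction over the same forensic image would reproduce the same identifiers, which is what makes cross-referencing auditable under this assumed scheme.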

Paper Structure (23 sections, 17 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Proposed evaluation workflow: identifying artifacts containing digital evidence, transforming the digital evidence repository to an LLM-readable form, constructing the Digital Forensic Knowledge Graph (DFKG), and integrating LLM-refined evidence into the final graph representation.
  • Figure 2: Visualization of the DFKG with Artifact Relationships.
  • Figure 3: Invalid hypothesis for Timestamps and Application Names.
  • Figure 4: Invalid hypothesis for Email and Application Names.