Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study

Jeel Piyushkumar Khatiwala; Daniel Kwaku Ntiamoah Addai; Weifeng Xu

TL;DR

This paper proposes a structured framework that automates forensic artifact extraction, refines data through LLM-driven analysis, and validates results using a Digital Forensic Knowledge Graph (DFKG). Deterministic Unique Identifiers (UIDs) and forensic cross-referencing ensure artifact traceability and evidentiary consistency.

Abstract

The growing reliance on AI-identified digital evidence raises significant concerns about its reliability, particularly as large language models (LLMs) are increasingly integrated into forensic investigations. This paper proposes a structured framework that automates forensic artifact extraction, refines data through LLM-driven analysis, and validates results using a Digital Forensic Knowledge Graph (DFKG). Evaluated on a 13 GB forensic image dataset containing 61 applications, 2,864 databases, and 5,870 tables, the framework ensures artifact traceability and evidentiary consistency through deterministic Unique Identifiers (UIDs) and forensic cross-referencing. The methodology is intended to address challenges in ensuring the credibility and forensic integrity of AI-identified evidence, reducing classification errors, and advancing scalable, auditable methods. A comprehensive case study on this dataset demonstrates the framework's effectiveness: it achieves over 95% accuracy in artifact extraction, strongly supports chain-of-custody adherence, and maintains robust contextual consistency in forensic relationships. Key results validate the framework's ability to enhance reliability, reduce errors, and establish a legally sound paradigm for AI-assisted digital forensics.

Paper Structure (23 sections, 17 equations, 4 figures, 2 tables)

Introduction
Related Work
Proposed Methodology
Data Extraction and Preprocessing
Artifact Identification
Data Transformation
Mathematical Formulations
Unique Identifier Generation (UID)
Example
CSV Transformation for LLM Processing
Example
LLM-Assisted Artifact Refinement
Example
Knowledge Graph Construction
Evaluation Metrics
...and 8 more sections

Figures (4)

Figure 1: Proposed evaluation workflow: identifying artifacts containing digital evidence, transforming the digital evidence repository to an LLM-readable form, constructing the Digital Forensic Knowledge Graph (DFKG), and integrating LLM-refined evidence into the final graph representation.
Figure 2: Visualization of the DFKG with Artifact Relationships
Figure 3: Invalid hypothesis for Timestamps and Application Names.
Figure 4: Invalid hypothesis for Email and Application Names.
