Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy

Regan Bolton; Mohammadreza Sheikhfathollahi; Simon Parkinson; Dan Basher; Howard Parkinson

TL;DR

The paper tackles the challenge of verifying Operational Technology Cybersecurity (OTCS) compliance in railways by introducing a multi-stage retrieval framework built on Large Language Models (LLMs). It compares a Baseline Compliance Architecture (BCA) with a Parallel Compliance Architecture (PCA) that adds a context retriever and regulatory context from IEC 62443 and IEC 63452, and shows that the PCA improves correctness and reasoning quality on OTCS compliance questions. The evaluation combines human expert judgments and LLM-based judging across 44 questions. It finds that retrieval quality is a key determinant of accurate, coherent responses: the PCA offers notable gains over the BCA but still faces retrieval bottlenecks. The study highlights the potential of retrieval-augmented approaches to improve the efficiency and accuracy of railway cybersecurity compliance, particularly where cybersecurity expertise is limited, and outlines concrete avenues for improvement and extension.

Abstract

Operational Technology Cybersecurity (OTCS) continues to be a dominant challenge for critical infrastructure such as railways. As these systems become increasingly vulnerable to malicious attacks due to digitalization, effective documentation and compliance processes are essential to protect these safety-critical systems. This paper proposes a novel system that leverages Large Language Models (LLMs) and multi-stage retrieval to enhance the compliance verification process against standards like IEC 62443 and the rail-specific IEC 63452. We first evaluate a Baseline Compliance Architecture (BCA) for answering OTCS compliance queries, then develop an extended approach called Parallel Compliance Architecture (PCA) that incorporates additional context from regulatory standards. Through empirical evaluation comparing OpenAI-gpt-4o and Claude-3.5-haiku models in these architectures, we demonstrate that the PCA significantly improves both correctness and reasoning quality in compliance verification. Our research establishes metrics for response correctness, logical reasoning, and hallucination detection, highlighting the strengths and limitations of using LLMs for compliance verification in railway cybersecurity. The results suggest that retrieval-augmented approaches can significantly improve the efficiency and accuracy of compliance assessments, which is particularly valuable in an industry facing a shortage of cybersecurity expertise.
