Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework

Jiling Zhou, Aisvarya Adeseye, Seppo Virtanen, Antti Hakkala, Jouni Isoaho

Abstract

Chain-of-Thought (CoT) prompting has been used to enhance the reasoning capability of LLMs. However, its reliability in security-sensitive analytical tasks remains insufficiently examined, particularly under structured human evaluation. Alternative approaches, such as model scaling and fine-tuning, can improve performance, but they are often costly, computationally intensive, or difficult to audit. In contrast, prompt engineering provides a lightweight, transparent, and controllable mechanism for guiding LLM reasoning. This study proposes a structured prompt engineering framework designed to strengthen CoT reasoning integrity while improving the reliability of security threat and attack detection in local LLM deployments. The framework comprises 16 factors grouped into four core dimensions: (1) Context and Scope Control, (2) Evidence Grounding and Traceability, (3) Reasoning Structure and Cognitive Control, and (4) Security-Specific Analytical Constraints. Rather than heuristically optimizing prompt wording, the framework introduces explicit reasoning controls to mitigate hallucination, prevent reasoning drift, and strengthen interpretability in security-sensitive contexts. Using DDoS attack detection in SDN traffic as a case study, multiple model families were evaluated under structured and unstructured prompting conditions. Pareto frontier analysis and ablation experiments demonstrate consistent reasoning improvements (up to 40% in smaller models) and stable accuracy gains across scales. Human evaluation with strong inter-rater agreement (Cohen's κ > 0.80) confirms robustness. The results establish structured prompting as an effective and practical approach for reliable and explainable AI-driven cybersecurity analysis.
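
As a rough, illustrative sketch only (the factor wording below is hypothetical and does not reproduce the paper's F1–F16), a structured security-reasoning prompt combining the four dimensions could be assembled programmatically along these lines:

# Minimal sketch: assembling a structured CoT prompt from the four dimensions
# named in the abstract. The factor wording is hypothetical, not the paper's F1-F16.

DIMENSIONS = {
    "Context and Scope Control": [
        "Analyze only the SDN flow features listed below; do not assume outside context.",
    ],
    "Evidence Grounding and Traceability": [
        "Cite the specific feature values that support each claim.",
        "If a needed feature is missing, state the uncertainty instead of guessing.",
    ],
    "Reasoning Structure and Cognitive Control": [
        "Reason step by step: threat detection -> risk analysis -> action recommendation.",
    ],
    "Security-Specific Analytical Constraints": [
        "Label the traffic as BENIGN or DDoS and map the finding to a known attack category.",
    ],
}

def build_structured_prompt(flow_record):
    """Compose one structured security-reasoning prompt for a single traffic record."""
    rules = []
    for dimension, factors in DIMENSIONS.items():
        rules.append(f"[{dimension}]")
        rules.extend(f"- {factor}" for factor in factors)
    features = "\n".join(f"{name}: {value}" for name, value in flow_record.items())
    return "\n".join(rules) + "\n\nFlow features:\n" + features

if __name__ == "__main__":
    example_flow = {"pkt_rate": 12450, "src_ip_entropy": 0.97, "syn_ratio": 0.91}
    print(build_structured_prompt(example_flow))

The point of such a template is that every reasoning instruction is explicit and auditable, in contrast to heuristically tuned prompt wording.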

Paper Structure

This paper contains 14 sections, 5 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Three types of CoT prompts: (1) Free CoT Prompt: step-by-step reasoning with minimal structural constraints, emphasizing flexibility; (2) Evidence-Locked CoT Prompt: incorporates consistency constraints into the reasoning process, requiring conclusions to be grounded in the given information, which increases credibility and reduces hallucinations; (3) Structured Security Reasoning Prompt: the reasoning process follows the actual workflow of security analysis (e.g., threat detection → risk analysis → action recommendation) to ensure the model output is consistent with the decision-making logic of human security experts.
  • Figure 2: The Prompt Engineering Framework for CoT Reasoning. The framework organizes 16 reasoning-control factors (F1–F16) across four dimensions to reduce hallucination, prevent reasoning drift, and strengthen interpretability in security-sensitive tasks. Output control: system-level factors (S) guide global reasoning, structure, uncertainty calibration, and verification, while user-level factors (U) ensure feature grounding, taxonomy alignment, and analytical discipline.
  • Figure 3: Experimental Methodology and Prompting Workflow. This figure shows the experimental workflow: detection accuracy was evaluated directly against the dataset labels, while the reasoning output was manually verified by two researchers under the different prompt strategies (a sketch of the corresponding agreement computation appears after this list).
  • Figure 4: Model Size vs. Relative Performance Gain (FW vs. NoFW) in Detection Accuracy and Reasoning Quality
  • Figure 5: Pareto Frontier Comparison of Security Detection Accuracy and Human-Evaluated Reasoning Dimensions (Evidence, Faithfulness, Structure, Taxonomy) Across Large Language Models with and without the Structured Prompt Framework (see the Pareto-frontier sketch after this list)
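
As context for the human-evaluation setup in Figure 3, the inter-rater agreement quoted in the abstract (Cohen's κ > 0.80) is the standard two-rater statistic. A minimal sketch of how agreement between the two researchers' verdicts could be computed follows; the verdict labels and data are illustrative only and not taken from the paper.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Two-rater Cohen's kappa over matched categorical verdicts."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

# Hypothetical per-sample reasoning verdicts from two annotators.
annotator_1 = ["valid", "valid", "flawed", "valid", "flawed", "valid"]
annotator_2 = ["valid", "valid", "flawed", "flawed", "flawed", "valid"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")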
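
Similarly, the comparison in Figure 5 rests on the usual Pareto-dominance definition: a model is on the frontier if no other model is at least as good on both detection accuracy and the human-rated reasoning score and strictly better on one. A minimal sketch follows; the model names and scores are made up for illustration and do not reflect the paper's results.

def pareto_frontier(points):
    """Return the (name, accuracy, reasoning) tuples that are not Pareto-dominated."""
    frontier = []
    for name, acc, reasoning in points:
        dominated = any(
            (other_acc >= acc and other_r >= reasoning)
            and (other_acc > acc or other_r > reasoning)
            for _, other_acc, other_r in points
        )
        if not dominated:
            frontier.append((name, acc, reasoning))
    return frontier

# Hypothetical (accuracy, reasoning-quality) scores per model.
models = [
    ("model-a-7b", 0.82, 3.1),
    ("model-b-13b", 0.88, 3.9),
    ("model-c-70b", 0.91, 3.7),
]
print(pareto_frontier(models))  # model-a-7b is dominated; the other two form the frontier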