J&H: Evaluating the Robustness of Large Language Models Under Knowledge-Injection Attacks in Legal Domain

Yiran Hu, Huanghai Liu, Qingjing Chen, Ning Zheng, Chong Wang, Yun Liu, Charles L. A. Clarke, Weixing Shen

TL;DR

This work addresses the reliability of large language models in knowledge-intensive tasks, focusing on the legal domain. It introduces J&H, a framework grounded in syllogistic deductive reasoning (major premise, minor premise, conclusion) that uses knowledge-injection attacks to assess whether LLMs rely on genuine domain knowledge and logic. Experiments on LEVEN and CAIL2018 with general and domain-specific LLMs reveal widespread fragility, with conclusion-level attacks being the most disruptive and legal-knowledge perturbations yielding stronger effects; RAG, chain-of-thought prompts, and few-shot cases bring modest improvements but do not fully solve the issue. The authors argue that robust performance in law requires domain knowledge and reasoning to be integrated during pre-training or fine-tuning, not merely addressed via prompting, and they offer J&H as a reusable framework for broader knowledge-intensive domains.

Abstract

As the scale and capabilities of Large Language Models (LLMs) increase, their applications in knowledge-intensive fields such as the legal domain have garnered widespread attention. However, it remains doubtful whether these LLMs reason from domain knowledge when making judgments. If LLMs base their judgments solely on specific words or patterns, rather than on the underlying logic of the language, the "LLM-as-judges" paradigm poses substantial risks in real-world applications. To address this question, we propose a method of legal knowledge injection attacks for robustness testing, thereby inferring whether LLMs have learned legal knowledge and reasoning logic. In this paper, we propose J&H: an evaluation framework for detecting the robustness of LLMs under knowledge injection attacks in the legal domain. The aim of the framework is to explore whether LLMs perform deductive reasoning when accomplishing legal tasks. To further this aim, we have attacked each part of the reasoning logic underlying these tasks (major premise, minor premise, and conclusion generation). We have collected mistakes that legal experts might make in judicial decisions in the real world, such as typos, legal synonyms, and inaccurate retrieval of external legal statutes. In real legal practice, legal experts tend to overlook these mistakes and make judgments based on logic. By contrast, when faced with these errors, LLMs are likely to be misled by typographical errors and may not apply logic in their judgments. We conducted knowledge injection attacks on existing general and domain-specific LLMs. Current LLMs are not robust against the attacks employed in our experiments. In addition, we propose and compare several methods to enhance the knowledge robustness of LLMs.
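The kind of robustness test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table, the toy cases, the `keyword_model` stand-in for an LLM, and all function names are hypothetical, chosen only to show how a meaning-preserving perturbation (a typo or a legal synonym) can be applied to a case description and how the resulting accuracy drop can be measured.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical synonym table, for illustration only.
SYNONYMS: Dict[str, str] = {"theft": "larceny", "defendant": "accused"}

def synonym_attack(text: str, table: Dict[str, str] = SYNONYMS) -> str:
    """Replace each listed term with its synonym; the meaning is unchanged."""
    for term, alt in table.items():
        text = text.replace(term, alt)
    return text

def typo_attack(text: str, rate: float = 0.2, seed: int = 0) -> str:
    """Swap two adjacent inner letters in a fraction of the longer words."""
    rng = random.Random(seed)
    out = []
    for w in text.split(" "):
        if len(w) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(w) - 2)  # keep first and last letters in place
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
        out.append(w)
    return " ".join(out)

Model = Callable[[str], str]
Attack = Callable[[str], str]

def accuracy(model: Model, cases: List[Tuple[str, str]],
             attack: Optional[Attack] = None) -> float:
    """Share of cases judged correctly, optionally after perturbing the input."""
    hits = 0
    for fact, label in cases:
        hits += model(attack(fact) if attack else fact) == label
    return hits / len(cases)

def robustness_drop(model: Model, cases: List[Tuple[str, str]],
                    attack: Attack) -> float:
    """Clean accuracy minus attacked accuracy; larger means less robust."""
    return accuracy(model, cases) - accuracy(model, cases, attack)

# Toy stand-in for an LLM that keys on a surface word instead of reasoning.
def keyword_model(text: str) -> str:
    return "theft" if "theft" in text else "other"

# Hypothetical cases: (case description, gold charge label).
CASES: List[Tuple[str, str]] = [
    ("the defendant committed theft of a bicycle", "theft"),
    ("the defendant signed a contract", "other"),
]

if __name__ == "__main__":
    print(robustness_drop(keyword_model, CASES, synonym_attack))
```

A model that relies on the surface word "theft" loses accuracy once the synonym is substituted, which is the kind of pattern-matching behaviour the framework is designed to expose. In a real evaluation, `keyword_model` would be replaced by a call to the LLM under test and `CASES` by dataset examples.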

Paper Structure

This paper contains 18 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: The Framework of J&H.
  • Figure 2: Illustration of Major Premise Attack.
  • Figure 3: Illustration of Minor Premise Attack.
  • Figure 4: Illustration of Conclusion Attack.
  • Figure 5: Location Attack on the LEVEN dataset.