Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

Tengyun Ma, Jiaqi Yao, Daojing He, Shihao Peng, Yu Li, Shaohui Liu, Zhuotao Tian

TL;DR

This work identifies Tool-Completion Attack (TCA), a new prompt-injection threat targeting tool-augmented LLMs, and introduces the Tool-Completion Benchmark to quantify vulnerabilities. It proposes Context-Aware Hierarchical Learning (CAHL), a two-stage approach that builds a context-aware instruction hierarchy via Segment Summarization and Contextual Propagation, formalized as Segment Query Embedding and an integration scheme. Empirical results show CAHL substantially improves robustness against TCA and conventional attacks while preserving standard task performance, with strong zero-shot generalization in multi-turn tool scenarios. The paper highlights the importance of hierarchical, context-sensitive instruction processing for safer LLM deployment and provides open-source code to facilitate adoption and further research.

Abstract

Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction handling, particularly when exposed to adversarial scenarios. In this work, we identify and propose a novel class of vulnerabilities, termed Tool-Completion Attack (TCA), which exploits function-calling mechanisms to subvert model behavior. To evaluate LLM robustness against such threats, we introduce the Tool-Completion benchmark, a comprehensive security assessment framework, which reveals that even state-of-the-art models remain susceptible to TCA, with surprisingly high attack success rates. To address these vulnerabilities, we introduce Context-Aware Hierarchical Learning (CAHL), a sophisticated mechanism that dynamically balances semantic comprehension with role-specific instruction constraints. CAHL leverages the contextual correlations between different instruction segments to establish a robust, context-aware instruction hierarchy. Extensive experiments demonstrate that CAHL significantly enhances LLM robustness against both conventional attacks and the proposed TCA, exhibiting strong generalization capabilities in zero-shot evaluations while still preserving model performance on generic tasks. Our code is available at https://github.com/S2AILab/CAHL.

Paper Structure

This paper contains 26 sections, 4 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: A brief example of TCA. The attack first synthesizes a task completion consistent with the dialogue, affirming user satisfaction. It then postulates a scene-grounded object with a semantic connection to both the context and the injected instruction, which is appended at the end.
  • Figure 2: Distributions of risk scores given by Prompt Guard. (a) The majority of TCA data points fall in the low-risk range, whereas most instances of both attack types (Hijacking and Extraction) in Tensor Trust are densely concentrated at risk scores exceeding 0.9. (b) The detailed distribution of TCA scores reveals a lower sample density in the high-risk score range.
  • Figure 3: Overview of Context-Aware Hierarchical Learning (CAHL).
  • Figure 4: An illustrative instance of an indirect prompt injection attack sourced from the StruQ benchmark and the outputs generated by the StruQ baseline, ISE, and CAHL.
  • Figure 5: Attention patterns of the StruQ baseline, ISE, and CAHL on the example from Figure 4.
  • ...and 1 more figure