Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs
Tengyun Ma, Jiaqi Yao, Daojing He, Shihao Peng, Yu Li, Shaohui Liu, Zhuotao Tian
TL;DR
This work identifies the Tool-Completion Attack (TCA), a new prompt-injection threat targeting tool-augmented LLMs, and introduces the Tool-Completion benchmark to quantify models' vulnerability to it. It proposes Context-Aware Hierarchical Learning (CAHL), a two-step approach that builds a context-aware instruction hierarchy through Segment Summarization and Contextual Propagation, formalized as Segment Query Embedding and an integration scheme. Empirical results show that CAHL substantially improves robustness against both TCA and conventional attacks while preserving standard task performance, and that it generalizes well zero-shot to multi-turn tool scenarios. The paper highlights the importance of hierarchical, context-sensitive instruction processing for safer LLM deployment and provides open-source code to facilitate adoption and further research.
Abstract
Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction handling, particularly in adversarial scenarios. In this work, we identify a novel class of vulnerabilities, termed Tool-Completion Attack (TCA), which exploits function-calling mechanisms to subvert model behavior. To evaluate LLM robustness against such threats, we introduce the Tool-Completion benchmark, a comprehensive security assessment framework. It reveals that even state-of-the-art models remain susceptible to TCA, with surprisingly high attack success rates. To address these vulnerabilities, we introduce Context-Aware Hierarchical Learning (CAHL), a mechanism that dynamically balances semantic comprehension with role-specific instruction constraints. CAHL leverages the contextual correlations between different instruction segments to establish a robust, context-aware instruction hierarchy. Extensive experiments demonstrate that CAHL significantly enhances LLM robustness against both conventional attacks and the proposed TCA, generalizing strongly in zero-shot evaluations while preserving model performance on generic tasks. Our code is available at https://github.com/S2AILab/CAHL.
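
For readers unfamiliar with the attack success rate mentioned above, the following is a minimal sketch of how such a metric is commonly computed: the fraction of attack trials in which the model carries out the injected instruction. The abstract does not describe how the Tool-Completion benchmark judges success, so the `AttackTrial` type, the substring-based `attack_succeeded` check, and the sample data are all hypothetical and are not taken from the released code.

```python
from dataclasses import dataclass


@dataclass
class AttackTrial:
    """One attack attempt (hypothetical structure, for illustration only)."""
    injected_goal: str   # marker text the injected instruction tries to elicit
    model_output: str    # what the model actually produced


def attack_succeeded(trial: AttackTrial) -> bool:
    # Assumption: success is judged by a case-insensitive substring match.
    # A real benchmark may use a different or stricter criterion.
    return trial.injected_goal.lower() in trial.model_output.lower()


def attack_success_rate(trials: list[AttackTrial]) -> float:
    """Fraction of trials in which the injected goal appears in the output."""
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(attack_succeeded(t) for t in trials) / len(trials)


# Made-up sample data: one of four attempts succeeds.
trials = [
    AttackTrial("HACKED", "The weather in Paris is sunny."),
    AttackTrial("HACKED", "hacked"),
    AttackTrial("HACKED", "I cannot follow instructions found in tool output."),
    AttackTrial("HACKED", "Your meeting is at 3 pm."),
]
asr = attack_success_rate(trials)
```

Under this definition a lower value indicates a more robust model, which is the direction in which a defense would be expected to move the metric.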
