Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt
Damien de Mijolla, Wen Yang, Philippa Duckett, Christopher Frye, Mark Worrall
TL;DR
Language hooks are a modular, task- and model-agnostic framework that interleaves base-model text generation with the conditional execution of programs (hooks) to augment reasoning and tool use. Hooks are small, composable programs with triggers and eligibility checks. They enable capabilities such as arithmetic validation, knowledge retrieval and output guardrails without fine-tuning the base model. Across mathematical reasoning, multi-hop QA and composite tasks, the method performs competitively with both general prompting baselines (CoT, ReAct) and task-aware methods (PAL, DSP), while preserving generalisability and enabling external validation. The approach offers a flexible, transparent way to extend LLM capabilities with less coupling to prompts and models, with potential applications in safety, verifiability and modular tool integration.
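To make "a program with a trigger and an eligibility check" concrete, here is a minimal sketch of an arithmetic-validation hook. It is an illustration under our own assumptions, not the authors' implementation. The class `ArithmeticHook`, its methods `trigger`, `is_eligible` and `run`, and the helpers `last_sentence` and `format_number` are hypothetical names. The hook fires whenever a sentence is completed, is eligible only if that sentence contains a simple `a op b = c` claim, and recomputes the claim in Python, rewriting the context if the model got it wrong.

```python
import re

# Hypothetical pattern for a simple binary claim such as "12 * 7 = 84".
CLAIM = re.compile(
    r"(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)"
)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def last_sentence(text: str) -> str:
    """Return the final sentence of the text (naive split after . ! ?)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return parts[-1]


def format_number(x: float) -> str:
    """Print integral values without a trailing '.0'; round the rest."""
    return str(int(x)) if float(x).is_integer() else str(round(x, 6))


class ArithmeticHook:
    """Hypothetical hook that recomputes 'a op b = c' claims in the latest sentence."""

    def trigger(self, text: str) -> bool:
        # Fire only once the base model has completed a sentence.
        return text.rstrip().endswith((".", "!", "?"))

    def is_eligible(self, text: str) -> bool:
        # Applicable only if the latest sentence contains an arithmetic claim.
        return CLAIM.search(last_sentence(text)) is not None

    def _correct(self, match: re.Match) -> str:
        a, op, b, c = match.groups()
        try:
            result = OPS[op](float(a), float(b))
        except ZeroDivisionError:
            return match.group(0)  # leave undefined expressions untouched
        if abs(result - float(c)) < 1e-9:
            return match.group(0)  # the claim is already correct
        return f"{a} {op} {b} = {format_number(result)}"

    def run(self, text: str) -> str:
        # Rewrite claims in the last sentence and splice it back into the context.
        sentence = last_sentence(text)
        head, _, tail = text.rpartition(sentence)
        return head + CLAIM.sub(self._correct, sentence) + tail
```

Keeping the trigger separate from the eligibility check lets a cheap test (has a sentence just ended?) run after every generation step, while the more specific check decides whether this particular capability applies.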
Abstract
Prompting and fine-tuning have emerged as two competing paradigms for augmenting language models with new capabilities, such as the use of tools. Prompting approaches are quick to set up but rely on explicit demonstrations of each tool's usage in the model's prompt, which couples tool use to the task at hand and limits generalisation. Fine-tuning removes the need for task-specific demonstrations of tool usage at runtime; however, it ties new capabilities to a single model, turning already heavier setup costs into a recurring expense. In this paper, we introduce language hooks, a novel framework for augmenting language models with new capabilities that is decoupled both from the model's task-specific prompt and from the model itself. The language hook algorithm interleaves text generation by the base model with the execution of modular programs that trigger conditionally based on the existing text and the available capabilities. Upon triggering, programs may call external tools or auxiliary language models (e.g. using tool-specific prompts) and may modify the existing context. We benchmark our method against state-of-the-art baselines, find that it outperforms task-aware approaches, and demonstrate its ability to generalise to novel tasks.
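The interleaving loop can be sketched as follows. This is a minimal illustration under assumptions that are ours, not the paper's. The base model is abstracted as a `step` function that returns the next chunk of text (an empty string ends generation). Each hook is a hypothetical `Hook` record with `trigger`, `eligible` and `run` callables, and hooks are checked in order after every chunk. `FACTS`, `retrieve` and `scripted_model` are toy stand-ins for a retrieval tool and a language model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hook:
    """A hypothetical hook: a trigger, an eligibility check and a program."""
    name: str
    trigger: Callable[[str], bool]   # cheap check on the current text
    eligible: Callable[[str], bool]  # is this capability applicable here?
    run: Callable[[str], str]        # returns a (possibly modified) context


def generate_with_hooks(step: Callable[[str], str], prompt: str,
                        hooks: List[Hook], max_steps: int = 20) -> str:
    """Interleave base-model generation with conditional hook execution."""
    context = prompt
    for _ in range(max_steps):
        chunk = step(context)  # base model continues the text (e.g. one sentence)
        if not chunk:          # an empty chunk signals the end of generation
            break
        context += chunk
        for hook in hooks:
            if hook.trigger(context) and hook.eligible(context):
                context = hook.run(context)  # may call tools or other models
    return context


# Toy components standing in for a real retrieval tool and a real LM.
FACTS = {"capital of France": "Paris is the capital of France."}


def retrieve(context: str) -> str:
    """Append every toy fact whose key appears in the context."""
    found = [fact for key, fact in FACTS.items() if key in context]
    return context + " [Retrieved: " + " ".join(found) + "]"


retrieval_hook = Hook(
    name="retrieval",
    trigger=lambda ctx: ctx.rstrip().endswith("?"),
    eligible=lambda ctx: any(key in ctx for key in FACTS),
    run=retrieve,
)


def scripted_model(chunks: List[str]) -> Callable[[str], str]:
    """Stub base model that ignores the context and replays fixed chunks."""
    it = iter(chunks)
    return lambda context: next(it, "")


if __name__ == "__main__":
    model = scripted_model(["Sub-question: what is the capital of France?",
                            " So the answer is Paris."])
    print(generate_with_hooks(model, "Q: In which city is the Louvre?\n",
                              [retrieval_hook]))
```

Because hooks see only the text and return a new context, the same `retrieval_hook` can be attached to any `step` function without changing the task prompt. This loosely mirrors the decoupling described above. A real model, unlike the scripted stub, would condition its next chunk on the retrieved fact.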
