Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Joongwon Kim; Bhargavi Paranjape; Tushar Khot; Hannaneh Hajishirzi

TL;DR

Husky is an open-source, unified language agent for multi-step reasoning across numerical, tabular, and knowledge-based tasks. It iteratively plans high-level actions and executes them with a suite of specialized tools. The framework trains a joint action generator and a set of expert models on tool-integrated trajectories synthesized by a teacher model, enabling cross-domain deployment without reliance on proprietary models. Husky achieves strong performance across 14 evaluation datasets, and on the mixed-tool tasks in HuskyQA it often rivals frontier models such as GPT-4 while using only 7B–13B base models; it also demonstrates cross-task generalization. The work offers a scalable recipe for building open-source language agents and introduces HuskyQA, an evaluation set for mixed-tool reasoning.

Abstract

Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as mathematics or multi-hop question answering. We introduce Husky, a holistic, open-source language agent that learns to reason over a unified action space to address a diverse set of complex tasks involving numerical, tabular, and knowledge-based reasoning. Husky iterates between two stages: 1) generating the next action to take towards solving a given task and 2) executing the action using expert models and updating the current solution state. We identify a thorough ontology of actions for addressing complex tasks and curate high-quality data to train expert models for executing these actions. Our experiments show that Husky outperforms prior language agents across 14 evaluation datasets. Moreover, we introduce HuskyQA, a new evaluation set which stress tests language agents for mixed-tool reasoning, with a focus on retrieving missing knowledge and performing numerical reasoning. Despite using 7B models, Husky matches or even exceeds frontier LMs such as GPT-4 on these tasks, showcasing the efficacy of our holistic approach in addressing complex reasoning problems. Our code and models are available at https://github.com/agent-husky/Husky-v1.
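The two-stage loop described in the abstract (generate the next action, then execute it with an expert model and update the solution state) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SolutionState`, `solve`, and `FINISH`, the tool labels `"search"` and `"math"`, the step cap, and the stub generator and experts are all hypothetical, and the real system uses trained LMs where this sketch uses hard-coded functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

FINISH = "finish"  # hypothetical label marking the terminal state


@dataclass
class SolutionState:
    """Task plus the (step, tool, output) triples produced so far."""
    task: str
    history: List[Tuple[str, str, str]] = field(default_factory=list)


# Assumed interfaces: the generator returns (step description, tool name);
# an expert maps (step description, current state) to an output string.
ActionGenerator = Callable[[SolutionState], Tuple[str, str]]
Expert = Callable[[str, SolutionState], str]


def solve(task: str,
          action_generator: ActionGenerator,
          experts: Dict[str, Expert],
          max_steps: int = 10) -> Optional[str]:
    """Alternate between action generation and action execution."""
    state = SolutionState(task)
    for _ in range(max_steps):
        # Stage 1: predict the next high-level step and the tool to use.
        step, tool = action_generator(state)
        if tool == FINISH:
            # Assumption: at the terminal state the step text is the answer.
            return step
        # Stage 2: execute the step with the tool's expert, update the state.
        output = experts[tool](step, state)
        state.history.append((step, tool, output))
    return None  # no terminal state reached within the step budget


# Toy stand-ins for the trained action generator and expert models.
def toy_generator(state: SolutionState) -> Tuple[str, str]:
    if not state.history:
        return ("Look up the year George Washington was born.", "search")
    if len(state.history) == 1:
        return ("Subtract the birth year from 2000.", "math")
    return (state.history[-1][2], FINISH)


toy_experts: Dict[str, Expert] = {
    "search": lambda step, state: "1732",
    "math": lambda step, state: str(2000 - int(state.history[-1][2])),
}

answer = solve("How many years before 2000 was George Washington born?",
               toy_generator, toy_experts)
print(answer)  # 268
```

The sketch keeps the action generator separate from the experts so that a single planner can route steps to different tools, which mirrors the division of labor the abstract describes.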

Paper Structure (59 sections, 42 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Language agents.
Tool use.
$\textsc{Husky}$: A Modular Framework for Solving Multi-Step Reasoning Tasks
Problem Formulation and Overview
Inference overview.
Training overview.
$\textsc{Husky}$ Training
Synthesizing Tool-Integrated Solutions
Training $\textsc{Husky}$ modules
Action generator.
Expert models.
$\textsc{Husky}$ Inference
$\textsc{Husky}$ Evaluation
...and 44 more sections

Figures (42)

Figure 1: Schematic of $\textsc{Husky}$. $\textsc{Husky}$ iterates between action generation where it generates a tool call and the corresponding high-level step description, and action execution where it uses the tool-associated expert model to execute the action, repeating this until it arrives at the terminal state.
Figure 2: Overview of $\textsc{Husky}$. $\textsc{Husky}$ solves multi-step tasks for numerical, tabular and knowledge-based reasoning by jointly predicting the next high-level step and tool with an action generator, and executing the action with the assigned expert model. This process repeats until it arrives at the final answer. As shown above, $\textsc{Husky}$ employs multiple LMs in parallel to solve a complex task, with the action generator coordinating the expert models, similar to how several Huskies pull a sleigh together.
Figure 3: Training data synthesis for $\textsc{Husky}$. A teacher LM is few-shot prompted to generate an initial trajectory for a question given in a training task. Then, each solution is parsed to extract steps and their outputs, which are used to construct training sets for $\mathcal{A}$, $\mathcal{M}_m$, $\mathcal{M}_c$ and $\mathcal{M}_q$.
Figure 4: Visualization of a Google Search result (the same result is returned by SERP API) for the search query "when was george washington born". We use the information presented in the red box.
Figure 5: Instruction for numerical reasoning tasks
...and 37 more figures
