Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning

Sahil Bansal, Sai Shruthi Sistla, Aarti Arikatala, Sebastian Schreiber

TL;DR

This paper addresses tool pre-selection in enterprise task planning by introducing a knowledge-graph-based retrieval framework that captures semantic relationships and functional dependencies between tools. It proposes an Ensemble of Ego Graphs (EEG) retrieval mechanism that expands from semantically identified entry points through one-hop neighborhoods and re-ranks candidates with a specialized model, reaching 91.85% micro-average CompleteRecall on a synthetically generated internal dataset, compared with 89.26% for the strongest non-KG baseline (re-ranked semantic-lexical hybrid retrieval). The approach comprises offline KG construction (ontology-guided triple extraction and canonicalization) and online retrieval (entry-point matching, ego-graph expansion, and re-ranking), coupled with a synthetic data generation pipeline (Path Identification and R-P reasoning) to evaluate multi-step query handling. Empirical results show gains over semantic, lexical, and hybrid baselines, particularly for complex conditional and implicit multi-step queries, which suggests practical value for enterprise environments with large, evolving tool ecosystems. The work also discusses limitations and future directions, including dataset openness, triple validation, and latency considerations.
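The online retrieval stage described above (entry-point matching, one-hop ego-graph expansion, re-ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the tool names, relation labels, and the `token_overlap` scorer are hypothetical stand-ins, and the same toy scorer is reused for re-ranking where the paper uses a specialized model.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Build an undirected adjacency map from (head, relation, tail) triples."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def token_overlap(query, text):
    """Toy similarity (assumption): Jaccard overlap of lowercase tokens.
    A real system would use embeddings and a dedicated re-ranker."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(query, descriptions, adj, n_entry=1, k=5, score=token_overlap):
    """Return up to k tools from the union of 1-hop ego graphs of the entry points."""
    def by_score(tool):
        # Sort by descending score, then by name for deterministic ties.
        return (-score(query, descriptions[tool]), tool)

    # 1. Entry-point matching: the n_entry tools most similar to the query.
    entry_points = sorted(descriptions, key=by_score)[:n_entry]

    # 2. Ensemble of ego graphs: each entry point plus its direct neighbors.
    candidates = set()
    for ego in entry_points:
        candidates.add(ego)
        candidates.update(n for n in adj.get(ego, ()) if n in descriptions)

    # 3. Re-rank the pooled candidates and keep the top k.
    return sorted(candidates, key=by_score)[:k]


# Hypothetical tool catalog and triples, for illustration only.
descriptions = {
    "get_customer": "look up customer record by name",
    "create_invoice": "create invoice for a customer order",
    "send_email": "send email message to a recipient",
    "get_weather": "fetch weather forecast for a city",
}
triples = [
    ("get_customer", "provides_input_for", "create_invoice"),
    ("create_invoice", "followed_by", "send_email"),
]
adjacency = build_adjacency(triples)

print(retrieve("create invoice for customer Acme", descriptions, adjacency))
# → ['create_invoice', 'get_customer', 'send_email']
```

In this toy run, `send_email` shares no tokens with the query and would score zero under pure similarity matching, yet it is retrieved because it is a graph neighbor of the entry point. `get_weather`, which has the same overlap score as `get_customer`, is excluded because it lies outside the ego graph.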

Abstract

Effective tool pre-selection via retrieval is essential for AI agents to select from a vast array of tools when identifying and planning actions in the context of complex user queries. Despite its central role in planning, this aspect remains underexplored in the literature. Traditional approaches rely primarily on similarities between user queries and tool descriptions, which significantly limits retrieval accuracy, specifically when handling multi-step user requests. To address these limitations, we propose a Knowledge Graph (KG)-based tool retrieval framework that captures the semantic relationships between tools and their functional dependencies. Our retrieval algorithm leverages ensembles of 1-hop ego tool graphs to model direct and indirect connections between tools, enabling more comprehensive and contextual tool selection for multi-step tasks. We evaluate our approach on a synthetically generated internal dataset across six defined user classes, extending previous work on coherent dialogue synthesis and tool retrieval benchmarks. Results demonstrate that our tool graph-based method achieves 91.85% tool coverage on the micro-average CompleteRecall metric, compared to 89.26% for re-ranked semantic-lexical hybrid retrieval, the strongest non-KG baseline in our experiments. These findings support our hypothesis that the structural information modeled in the graph provides complementary signals to pure similarity matching, particularly for queries requiring sequential tool composition.
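To make the reported metric concrete, the sketch below computes a micro-averaged CompleteRecall@k and a per-class breakdown. The exact definition is not stated in this section, so the code assumes that a query scores 1 only when every required tool appears in the top-k retrieved list and 0 otherwise, and that the micro-average is the mean over all queries regardless of class. The class labels and tool names are hypothetical.

```python
from collections import defaultdict


def complete_recall_at_k(retrieved, required, k):
    """Assumed definition: 1.0 if all required tools are in the top-k, else 0.0."""
    return 1.0 if set(required) <= set(retrieved[:k]) else 0.0


def micro_average(results, k):
    """Mean CompleteRecall@k over all queries, ignoring query class."""
    scores = [complete_recall_at_k(ret, req, k) for _cls, ret, req in results]
    return sum(scores) / len(scores)


def per_class(results, k):
    """Mean CompleteRecall@k within each query class."""
    buckets = defaultdict(list)
    for cls, ret, req in results:
        buckets[cls].append(complete_recall_at_k(ret, req, k))
    return {cls: sum(vals) / len(vals) for cls, vals in buckets.items()}


# Hypothetical evaluation records: (query_class, ranked_retrieved, required_tools).
results = [
    ("single_step", ["a", "b"], ["a"]),
    ("multi_step", ["a", "b", "c"], ["a", "c"]),
    ("multi_step", ["a", "b", "c"], ["a", "d"]),  # "d" is never retrieved
]

print(micro_average(results, k=3))  # two of three queries are fully covered
print(per_class(results, k=3))
```

Under this assumed definition, a query that misses even one required tool contributes nothing, which is why the metric is demanding for multi-step queries.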

Paper Structure

This paper contains 30 sections, 1 equation, 9 figures, 4 tables, 2 algorithms.

Figures (9)

  • Figure 1: Tool Construction and Retrieval Pipeline
  • Figure 2: Query Distribution across identified query classes
  • Figure 3: CompleteRecall@k micro-averages for each retrieval method
  • Figure 4: CompleteRecall@10 for each query type
  • Figure 5: Subgraph illustrating the complete example in the appendix
  • ...and 4 more figures