HGMF: A Hierarchical Gaussian Mixture Framework for Scalable Tool Invocation within the Model Context Protocol

Wenpeng Xing, Zhipeng Chen, Changting Lin, Meng Han

TL;DR

This work tackles the challenge of scalable tool invocation for LLMs facing large, hierarchical tool libraries. It introduces the Hierarchical Gaussian Mixture Framework (HGMF), which embeds queries, servers, and tools into a shared semantic space and performs two-stage GMM-based pruning (servers first, then tools) to yield a compact, high-relevance candidate set. Regularized, probabilistic clustering improves robustness on sparse data, and an LLM-based reranking stage finalizes the selection via generated descriptions and a cosine-based scoring scheme. Experiments on the MCP-tools dataset show state-of-the-art accuracy and reduced inference latency across eight LLMs, with notable gains in both high-shot and low-shot regimes, highlighting the method’s scalability and practical impact for large-scale tool libraries.

Abstract

Invoking external tools enables Large Language Models (LLMs) to perform complex, real-world tasks, yet selecting the correct tool from large, hierarchically structured libraries remains a significant challenge. The limited context windows of LLMs and noise from irrelevant options often lead to low selection accuracy and high computational costs. To address this, we propose the Hierarchical Gaussian Mixture Framework (HGMF), a probabilistic pruning method for scalable tool invocation. HGMF first maps the user query and all tool descriptions into a unified semantic space. The framework then operates in two stages. In the first, it clusters servers using a Gaussian Mixture Model (GMM) and filters them based on the query's likelihood. In the second, it applies the same GMM-based clustering and filtering to the tools associated with the selected servers. This hierarchical process produces a compact, high-relevance candidate set, simplifying the final selection task for the LLM. Experiments on a public dataset show that HGMF significantly improves tool selection accuracy while reducing inference latency, confirming the framework's scalability and effectiveness for large-scale tool libraries.
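
The two-stage pruning described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's GaussianMixture as the GMM, it reads "filtering based on the query's likelihood" as keeping the items assigned to the component with the highest posterior for the query, and it runs on synthetic embeddings. The names gmm_prune, hgmf_prune, and tool_server are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_prune(query, embeddings, n_components=2, reg_covar=1e-3, seed=0):
    """Return indices of items that share the query's most likely GMM cluster.

    Assumption: "filtering by query likelihood" is approximated by picking the
    component with the highest posterior for the query and keeping its members.
    """
    n_items = len(embeddings)
    # Too few items to cluster meaningfully: keep everything.
    if n_items <= n_components:
        return np.arange(n_items)
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="diag",
        reg_covar=reg_covar,  # small diagonal term keeps covariances well-conditioned
        random_state=seed,
    ).fit(embeddings)
    query_cluster = int(np.argmax(gmm.predict_proba(query[None, :])[0]))
    kept = np.flatnonzero(gmm.predict(embeddings) == query_cluster)
    # Fall back to all items if the chosen component has no assigned members.
    return kept if len(kept) > 0 else np.arange(n_items)


def hgmf_prune(query, server_emb, tool_emb, tool_server):
    """Stage 1 prunes servers; stage 2 prunes the tools of the kept servers."""
    kept_servers = gmm_prune(query, server_emb)
    candidate_tools = np.flatnonzero(np.isin(tool_server, kept_servers))
    kept_local = gmm_prune(query, tool_emb[candidate_tools])
    return kept_servers, candidate_tools[kept_local]


# Toy data (hypothetical): two blobs of servers, four tools per server.
rng = np.random.default_rng(0)
blob_a, blob_b = np.zeros(8), np.full(8, 5.0)
server_emb = np.vstack([
    blob_a + 0.1 * rng.normal(size=(3, 8)),
    blob_b + 0.1 * rng.normal(size=(3, 8)),
])
tool_server = np.repeat(np.arange(6), 4)  # owning server index for each tool
tool_emb = server_emb[tool_server] + 0.1 * rng.normal(size=(24, 8))
query = blob_a + 0.05 * rng.normal(size=8)

servers, tools = hgmf_prune(query, server_emb, tool_emb, tool_server)
print("kept servers:", servers)
print("kept tools:", tools)
```

On this toy data the query sits inside the first blob, so the first three servers should survive stage 1, and stage 2 only ever considers tools owned by those servers. In a real setting the embeddings would come from a text encoder, and the surviving tools would be handed to the LLM for final selection.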

Paper Structure

This paper contains 20 sections, 7 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overall Pipeline. The process involves embedding the user query, servers, and tools into a unified semantic space. HGMF then performs a two-level pruning strategy using clustering at the server level to obtain a relevant server list, followed by clustering at the tool level within the selected servers to yield a refined list of candidate toolsets.
  • Figure 2: Performance comparison of HGMF against five baselines across eight LLMs. Each subplot shows accuracy as a function of sample size (log scale). HGMF achieves state-of-the-art results in most scenarios, showing a clear advantage at larger sample sizes.
  • Figure 3: Accuracy comparison of HGMF with and without regularization. The regularized model achieves significant gains (14-28%) in low-shot scenarios by mitigating cluster instability, leading to more stable and accurate performance across all sample sizes.
  • Figure 4: Accuracy gain of the HGMF method over the MCP-zero baseline. The results show that HGMF's performance advantage steadily increases with the sample size, demonstrating its superior scalability.
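
The low-shot instability that Figure 3 attributes to unregularized clustering can be illustrated with a small numerical sketch. The summary does not state the exact form of the regularizer, so the version below assumes the common choice of adding a small constant to the covariance diagonal. The function name regularized_covariance and the value of reg are hypothetical.

```python
import numpy as np


def regularized_covariance(points, reg=1e-3):
    """Sample covariance plus reg * I (an assumed form of regularization)."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    return cov + reg * np.eye(points.shape[1])


# Low-shot setting: 3 embeddings in 8 dimensions.
rng = np.random.default_rng(0)
few_points = rng.normal(size=(3, 8))

raw_cov = regularized_covariance(few_points, reg=0.0)
reg_cov = regularized_covariance(few_points, reg=1e-3)

# With fewer samples than dimensions the raw covariance is rank-deficient,
# so a Gaussian density built on it is not well defined.
print("rank of raw covariance:", np.linalg.matrix_rank(raw_cov))
# The regularized covariance is positive definite, so Cholesky succeeds.
np.linalg.cholesky(reg_cov)
print("smallest eigenvalue after regularization:", np.linalg.eigvalsh(reg_cov).min())
```

With only a few samples per cluster, the raw covariance cannot have full rank, which is one plausible source of the cluster instability the caption mentions. The diagonal term lifts every eigenvalue by reg, keeping the component densities finite.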