HGMF: A Hierarchical Gaussian Mixture Framework for Scalable Tool Invocation within the Model Context Protocol
Wenpeng Xing, Zhipeng Chen, Changting Lin, Meng Han
TL;DR
This work addresses scalable tool invocation for LLMs that must choose from large, hierarchical tool libraries. It introduces the Hierarchical Gaussian Mixture Framework (HGMF), which embeds queries, servers, and tools in a shared semantic space and applies two-stage GMM-based pruning (servers first, then tools) to yield a compact, high-relevance candidate set. Regularized, probabilistic clustering improves robustness on sparse data, and an LLM-based reranking stage finalizes the selection using generated descriptions and a cosine-based scoring scheme. Experiments on the MCP-tools dataset show state-of-the-art accuracy and reduced inference latency across eight LLMs, with notable gains in both high-shot and low-shot regimes, highlighting the method’s scalability and practical value for large-scale tool libraries.
Abstract
Invoking external tools enables Large Language Models (LLMs) to perform complex, real-world tasks, yet selecting the correct tool from large, hierarchically structured libraries remains a significant challenge. The limited context windows of LLMs and the noise from irrelevant options often lead to low selection accuracy and high computational cost. To address this, we propose the Hierarchical Gaussian Mixture Framework (HGMF), a probabilistic pruning method for scalable tool invocation. HGMF first maps the user query and all tool descriptions into a unified semantic space. The framework then operates in two stages. In the first stage, it clusters servers using a Gaussian Mixture Model (GMM) and filters them based on the query's likelihood. In the second stage, it applies the same GMM-based clustering and filtering to the tools associated with the selected servers. This hierarchical process produces a compact, high-relevance candidate set, simplifying the final selection task for the LLM. Experiments on a public dataset show that HGMF significantly improves tool selection accuracy while reducing inference latency, confirming the framework's scalability and effectiveness for large-scale tool libraries.
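
The two-stage pruning described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: a diagonal-covariance GMM fitted with a few EM steps in NumPy, a variance floor `reg` standing in for the regularization, a filtering rule that keeps the items falling in the query's most likely component, tools pooled across the selected servers before clustering, and hand-written 2-D vectors in place of real text embeddings. All function names are hypothetical.

```python
import numpy as np


def log_joint(X, weights, means, variances):
    """(n, k) array of log pi_k + log N(x_n | mu_k, diag(var_k))."""
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (
        np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
        + (diff ** 2 / variances[None, :, :]).sum(axis=2)
    )
    return np.log(weights)[None, :] + log_pdf


def responsibilities(X, weights, means, variances):
    """Posterior probability of each component for each row of X."""
    lj = log_joint(X, weights, means, variances)
    lj -= lj.max(axis=1, keepdims=True)  # stabilize the softmax
    r = np.exp(lj)
    return r / r.sum(axis=1, keepdims=True)


def fit_diag_gmm(X, k, reg=1e-3, iters=50):
    """Fit a diagonal GMM with EM; `reg` is an assumed variance floor."""
    n, _ = X.shape
    k = min(k, n)
    # Deterministic farthest-point initialization of the means.
    centers = [X[0]]
    while len(centers) < k:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d2))])
    means = np.array(centers, dtype=float)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        r = responsibilities(X, weights, means, variances)  # E-step
        nk = r.sum(axis=0) + 1e-12                          # M-step
        weights = nk / nk.sum()
        means = (r.T @ X) / nk[:, None]
        second = (r.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second - means ** 2, 0.0) + reg
    return weights, means, variances


def gmm_prune(query, embeddings, k=2, reg=1e-3):
    """Keep indices of items in the component most likely for the query."""
    X = np.asarray(embeddings, dtype=float)
    q = np.asarray(query, dtype=float)[None, :]
    params = fit_diag_gmm(X, k, reg)
    q_cluster = int(np.argmax(log_joint(q, *params)))
    item_clusters = np.argmax(responsibilities(X, *params), axis=1)
    return [i for i, c in enumerate(item_clusters) if c == q_cluster]


def hierarchical_prune(query, server_embs, tool_embs, tool_server, k=2):
    """Stage 1 prunes servers; stage 2 prunes tools of surviving servers."""
    kept_servers = gmm_prune(query, server_embs, k)
    candidates = [t for t, s in enumerate(tool_server) if s in kept_servers]
    kept_local = gmm_prune(query, [tool_embs[t] for t in candidates], k)
    return kept_servers, [candidates[i] for i in kept_local]


# Toy data: 2-D vectors stand in for text embeddings (illustrative only).
server_embs = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1],
               [5.0, 5.0], [5.2, 5.1], [4.9, 4.8]]
tool_embs = [[0.1, 0.0], [5.0, 5.2], [5.1, 5.0],
             [3.0, 7.0], [3.1, 7.2], [0.0, 0.2]]
tool_server = [0, 3, 3, 4, 5, 1]  # tool index -> owning server index
query = [5.0, 5.1]

kept_servers, kept_tools = hierarchical_prune(
    query, server_embs, tool_embs, tool_server
)
print("servers kept:", kept_servers)
print("tools kept:", kept_tools)
```

In this toy setup the surviving tool indices form the compact candidate set that would be handed to the LLM for final selection. The number of components `k` and the keep-one-component rule are placeholders; the framework's actual filtering criterion and the reranking stage are not reproduced here.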
