HuggingR$^{4}$: A Progressive Reasoning Framework for Discovering Optimal Model Companions
Shaoyin Ma, Jie Song, Huiqiong Wang, Li Sun, Mingli Song
TL;DR
The paper addresses the challenge of selecting optimal community-driven models from large, evolving hubs, where metadata is incomplete and placing full model descriptions in prompts causes prompt bloat. It introduces HuggingR$^4$, a progressive reasoning framework comprising Reasoning, Retrieval, Refinement, and Reflection, augmented by vector-based retrieval, a failure-trace mechanism, and a sliding-window strategy to limit token use. A first forward-labeled dataset with 14,399 requests across 37 tasks supports extensive evaluation, in which HuggingR$^4$ achieves substantial gains in workability and reasonability over baselines and keeps token usage stable as the candidate pool grows. The approach enables scalable, online adaptation to changing model ecosystems such as HuggingFace and applies across multimodal tasks, offering practical improvements for building AI agents with diverse external interfaces.
Abstract
Large Language Models (LLMs) have made remarkable progress in their ability to interact with external interfaces. Selecting reasonable external interfaces has thus become a crucial step in constructing LLM agents. In contrast to invoking API tools, directly calling AI models across different modalities from the community (e.g., HuggingFace) poses challenges due to the vast scale (> 10k), metadata gaps, and unstructured descriptions. Current methods for model selection often incorporate entire model descriptions into prompts, resulting in prompt bloat, wasted tokens, and limited scalability. To address these issues, we propose HuggingR$^4$, a novel framework that combines Reasoning, Retrieval, Refinement, and Reflection to select models efficiently. Specifically, we first perform multiple rounds of reasoning and retrieval to obtain a coarse list of candidate models. Then, we conduct fine-grained refinement by analyzing the candidate model descriptions, followed by reflection to assess the results and determine whether the retrieval scope needs to be expanded. This method reduces token consumption considerably by decoupling user query processing from the handling of complex model descriptions. Through a pre-established vector database, complex model descriptions are stored externally and retrieved on demand, allowing the LLM to concentrate on interpreting user intent while accessing only relevant candidate models, without prompt bloat. In the absence of standardized benchmarks, we construct a multimodal human-annotated dataset comprising 14,399 user requests across 37 tasks and conduct a thorough evaluation. HuggingR$^4$ attains a workability rate of 92.03% and a reasonability rate of 82.46%, surpassing existing methods by 26.51% and 33.25%, respectively, on GPT-4o-mini.
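The coarse-to-fine loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the model catalogue, the bag-of-words `embed` function, the `must_mention` constraint (standing in for what an LLM reasoning step would derive from the query), and all function names are hypothetical assumptions made for illustration. The failure-trace and sliding-window mechanisms are not modeled.

```python
import math
from collections import Counter

# Hypothetical toy catalogue; every name and description is invented for illustration.
MODEL_DESCRIPTIONS = {
    "toy/image-captioner": "generate a text caption describing an image",
    "toy/speech-recognizer": "transcribe speech audio into text",
    "toy/en-fr-translator": "translate english text into french text",
    "toy/en-de-translator": "german output machine translation model from english",
}


def embed(text):
    """Stand-in embedding: bag-of-words counts, not a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Stand-in for the pre-established vector database: descriptions are embedded
# once and kept outside any prompt.
VECTOR_DB = {name: embed(desc) for name, desc in MODEL_DESCRIPTIONS.items()}


def retrieve(query, k):
    """Coarse step: names of the k models most similar to the query."""
    q = embed(query)
    ranked = sorted(VECTOR_DB, key=lambda name: cosine(q, VECTOR_DB[name]), reverse=True)
    return ranked[:k]


def refine(candidates, must_mention):
    """Fine step: read only the shortlisted descriptions and keep the first one
    that mentions every required term. A real system would use an LLM here."""
    for name in candidates:
        words = set(MODEL_DESCRIPTIONS[name].split())
        if must_mention <= words:
            return name
    return None


def select_model(query, must_mention, k=1):
    """Reflection loop: widen the retrieval scope while refinement finds nothing."""
    while k <= len(VECTOR_DB):
        choice = refine(retrieve(query, k), must_mention)
        if choice is not None:
            return choice, k
        k += 1  # reflection: result unsatisfactory, so expand the scope
    return None, k - 1


choice, scope = select_model("translate english text into german", {"german"})
print(choice, scope)
```

In this toy run, the top-ranked candidate by similarity is the French translator, which refinement rejects because its description never mentions German. The loop therefore widens the scope until the German model enters the shortlist. Only the shortlisted descriptions are ever read in the fine step, which is the point of keeping the full catalogue in an external store.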
