A Needle in a Haystack: Intent-driven Reusable Artifacts Recommendation with LLMs
Dongming Jin, Zhi Jin, Xiaohong Chen, Zheng Fang, Linyu Li, Yuanpeng He, Jia Li, Yirang Zhang, Yingtao Fang
TL;DR
This work tackles the challenge of intent-driven reusable artifact recommendation in large open-source ecosystems. It introduces IntentRecBench, a cross-ecosystem benchmark spanning JavaScript packages, HuggingFace models, and Linux groups, to evaluate non-LLM and LLM-based approaches to artifact recommendation. While LLMs achieve higher precision than traditional IR methods, they incur high inference costs because the candidate space is large. To address this, the authors propose TreeRec, a tree-guided framework that constructs a hierarchical semantic tree to narrow the search space and then re-ranks the remaining candidates with LLMs, yielding substantial gains in accuracy (up to ~65% in some metrics) and orders-of-magnitude improvements in latency across ecosystems and models. The approach demonstrates strong robustness and generalizability, offering a practical pathway toward scalable, intent-aware software reuse.
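To make the narrow-then-rerank idea concrete, here is a minimal sketch in Python. It is not TreeRec's implementation: the tree is hand-built rather than produced by LLM-based semantic abstraction, `relevance` is a word-overlap stand-in for an LLM relevance judgment, and the names `Node`, `tree_guided_recommend`, `beam`, and the `*-lib-*` artifacts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical semantic-tree node: an abstract description, children, and (at leaves) artifacts."""
    description: str
    children: list["Node"] = field(default_factory=list)
    artifacts: dict[str, str] = field(default_factory=dict)  # artifact name -> description


def relevance(intent: str, text: str) -> float:
    """Stand-in for an LLM relevance judgment: Jaccard overlap of lowercase words."""
    a, b = set(intent.lower().split()), set(text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def tree_guided_recommend(root: Node, intent: str, beam: int = 1) -> tuple[list[str], int]:
    """Descend the tree level by level, then rerank the surviving artifacts.

    Assumes every leaf sits at the same depth. Returns the ranked artifact
    names and the number of relevance() calls made.
    """
    frontier, calls = [root], 0
    # Phase 1: narrow the search space by keeping only the `beam` best children per level.
    while frontier and frontier[0].children:
        scored = []
        for node in frontier:
            for child in node.children:
                scored.append((relevance(intent, child.description), child))
                calls += 1
        scored.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [child for _, child in scored[:beam]]
    # Phase 2: rerank only the artifacts under the selected leaves.
    candidates = {name: desc for leaf in frontier for name, desc in leaf.artifacts.items()}
    calls += len(candidates)
    ranked = sorted(candidates, key=lambda name: relevance(intent, candidates[name]), reverse=True)
    return ranked, calls


# Hypothetical toy tree: two domains, each split into two leaf categories.
root = Node("all artifacts", children=[
    Node("web frontend javascript", children=[
        Node("date time formatting parsing", artifacts={
            "date-lib-a": "parse and format date strings",
            "date-lib-b": "time zone conversion for date values",
        }),
        Node("http request client", artifacts={"http-lib-a": "promise based http request client"}),
    ]),
    Node("machine learning models", children=[
        Node("image classification vision", artifacts={"vision-model-a": "classify image pixels into labels"}),
        Node("text summarization language", artifacts={"text-model-a": "summarize long text documents"}),
    ]),
])

ranked, calls = tree_guided_recommend(root, "format a date string in javascript")
print(ranked, calls)  # -> ['date-lib-a', 'date-lib-b'] 6
```

With a beam width of 1, reaching a leaf in a balanced tree with branching factor b and depth d takes about b·d relevance calls, whereas a flat scan needs one call per artifact (at least b^d when every leaf holds an artifact). This toy tree is too small for the saving to show, but that gap is what widens as the candidate space grows.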
Abstract
In open-source software development, the reuse of existing artifacts has been widely adopted to avoid redundant implementation work. Reusing existing artifacts is considered more efficient and reliable than developing software components from scratch. However, when faced with a large number of reusable artifacts, developers often struggle to find artifacts that meet their needs. To reduce this burden, retrieval-based and learning-based techniques have been proposed to automate artifact recommendation. Recently, Large Language Models (LLMs) have shown the potential to understand intentions, perform semantic alignment, and recommend usable artifacts. Nevertheless, their effectiveness has not been thoroughly explored. To fill this gap, we construct an intent-driven artifact recommendation benchmark named IntentRecBench, covering three representative open-source ecosystems. Using IntentRecBench, we conduct a comprehensive comparative study of five popular LLMs and six traditional approaches in terms of precision and efficiency. Our results show that although LLMs outperform traditional methods, they still suffer from low precision and high inference cost due to the large candidate space. Inspired by ontology-based semantic organization in software engineering, we propose TreeRec, a feature tree-guided recommendation framework to mitigate these issues. TreeRec leverages LLM-based semantic abstraction to organize artifacts into a hierarchical semantic tree, enabling alignment between user intents and artifact functions while reducing reasoning time. Extensive experiments demonstrate that TreeRec consistently improves the performance of diverse LLMs across ecosystems, highlighting its generalizability and potential for practical deployment.
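The abstract does not spell out the exact metrics behind "precision and efficiency", so the following sketch assumes precision@k for the precision side and mean wall-clock time per query for the efficiency side. `precision_at_k`, `evaluate`, the toy benchmark, and the toy recommender are hypothetical illustrations, not IntentRecBench's actual evaluation protocol.

```python
import time
from typing import Callable


def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k recommendations that appear in the ground-truth set.

    The denominator is always k, so a list shorter than k is penalized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in recommended[:k] if item in relevant) / k


def evaluate(recommend: Callable[[str], list[str]],
             benchmark: list[tuple[str, set[str]]],
             k: int) -> tuple[float, float]:
    """Return (mean precision@k, mean wall-clock seconds per query)."""
    if not benchmark:
        raise ValueError("benchmark must not be empty")
    precisions, latencies = [], []
    for intent, relevant in benchmark:
        start = time.perf_counter()
        recommended = recommend(intent)
        latencies.append(time.perf_counter() - start)
        precisions.append(precision_at_k(recommended, relevant, k))
    n = len(benchmark)
    return sum(precisions) / n, sum(latencies) / n


# Hypothetical toy benchmark: (intent, set of relevant artifact names).
toy_benchmark = [
    ("format a date string", {"date-lib-a"}),
    ("send an http request", {"http-lib-a", "http-lib-b"}),
]


def toy_recommender(intent: str) -> list[str]:
    """Hypothetical recommender that keys on a single word of the intent."""
    return ["date-lib-a", "http-lib-a"] if "date" in intent else ["http-lib-b", "http-lib-a"]


mean_p, mean_latency = evaluate(toy_recommender, toy_benchmark, k=2)
print(mean_p)  # -> 0.75
```

Measuring both numbers in one loop keeps the precision/efficiency trade-off visible per recommender, which is the comparison the study reports across LLMs and traditional approaches.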
