A Needle in a Haystack: Intent-driven Reusable Artifacts Recommendation with LLMs

Dongming Jin, Zhi Jin, Xiaohong Chen, Zheng Fang, Linyu Li, Yuanpeng He, Jia Li, Yirang Zhang, Yingtao Fang

TL;DR

This work tackles the challenge of intent-driven reusable artifact recommendation in large open-source ecosystems. It introduces IntentRecBench, a cross-ecosystem benchmark spanning JavaScript packages, HuggingFace models, and Linux groups, to evaluate non-LLM and LLM-based approaches to artifact recommendation. While LLMs outperform traditional methods, their precision remains low and their inference cost high because of the large candidate space. To address this, the authors propose TreeRec, a tree-guided framework that constructs a hierarchical semantic tree to narrow the search space and then re-ranks candidates with LLMs, yielding substantial gains in accuracy (up to ~65% in some metrics) and orders-of-magnitude improvements in latency across ecosystems and models. The approach demonstrates strong robustness and generalizability, offering a practical pathway toward scalable, intent-aware software reuse.

Abstract

In open source software development, the reuse of existing artifacts has been widely adopted to avoid redundant implementation work. Reusable artifacts are considered more efficient and reliable than developing software components from scratch. However, when faced with a large number of reusable artifacts, developers often struggle to find artifacts that can meet their expected needs. To reduce this burden, retrieval-based and learning-based techniques have been proposed to automate artifact recommendations. Recently, Large Language Models (LLMs) have shown the potential to understand intentions, perform semantic alignment, and recommend usable artifacts. Nevertheless, their effectiveness has not been thoroughly explored. To fill this gap, we construct an intent-driven artifact recommendation benchmark named IntentRecBench, covering three representative open source ecosystems. Using IntentRecBench, we conduct a comprehensive comparative study of five popular LLMs and six traditional approaches in terms of precision and efficiency. Our results show that although LLMs outperform traditional methods, they still suffer from low precision and high inference cost due to the large candidate space. Inspired by the ontology-based semantic organization in software engineering, we propose TreeRec, a feature tree-guided recommendation framework to mitigate these issues. TreeRec leverages LLM-based semantic abstraction to organize artifacts into a hierarchical semantic tree, enabling intent and function alignment and reducing reasoning time. Extensive experiments demonstrate that TreeRec consistently improves the performance of diverse LLMs across ecosystems, highlighting its generalizability and potential for practical deployment.
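The tree-guided idea described above (narrow the candidate space by walking a semantic hierarchy, then rank only the few artifacts that remain) can be illustrated with a minimal sketch. This is not the authors' TreeRec implementation: the `Node` class, the `recommend` function, the greedy single-path descent, and the artifact names are all hypothetical, and a simple word-overlap score stands in for the LLM calls the paper uses for semantic matching and re-ranking.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical semantic tree; leaves hold artifacts."""
    label: str
    children: list["Node"] = field(default_factory=list)
    artifacts: dict[str, str] = field(default_factory=dict)  # name -> description


def overlap(intent: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words.
    Assumption: stands in for an LLM-based relevance judgement."""
    return len(set(intent.lower().split()) & set(text.lower().split()))


def recommend(root: Node, intent: str, score=overlap, top_k: int = 3) -> list[str]:
    """Descend to the best-matching leaf, then rank only that leaf's artifacts."""
    node = root
    while node.children:
        # Greedy descent (an assumption): follow the single best-scoring child.
        node = max(node.children, key=lambda child: score(intent, child.label))
    ranked = sorted(
        node.artifacts,
        key=lambda name: score(intent, node.artifacts[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical two-branch tree with made-up artifact names.
root = Node(
    label="all artifacts",
    children=[
        Node(
            label="web performance caching",
            artifacts={
                "fast-cache": "in-memory caching to improve web performance",
                "img-min": "compress images for faster page load",
            },
        ),
        Node(
            label="text entity extraction",
            artifacts={
                "ner-lite": "extract named entities from text",
                "tok-split": "split text into tokens",
            },
        ),
    ],
)

result = recommend(root, "extract named entities from text", top_k=1)
```

In this sketch only the artifacts under the selected leaf are ever scored, which is the source of the cost reduction: the expensive ranking step sees a handful of candidates instead of the whole ecosystem. A greedy descent can miss artifacts filed under a sibling branch, so a practical system would likely keep several branches at each level.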

Paper Structure

This paper contains 26 sections, 2 equations, 7 figures, 7 tables, and 1 algorithm.

Figures (7)

  • Figure 1: An example of the human workflow and alternative solutions for intent-driven artifact recommendation across ecosystems. Developers express task-specific intents, such as improving web performance, extracting named entities, or enhancing international usability. These intents can be processed through different recommendation paradigms (retrieval-based, learning-based, and LLM-based) to identify the corresponding software artifacts.
  • Figure 2: A simplified example of a feature tree in a smart home system. It illustrates how functionalities can be organized hierarchically in a feature model. High-level features such as Lighting Control, Temperature Regulation, and Security Monitoring are decomposed into representative sub-features.
  • Figure 3: Overview of IntentRecBench Construction Pipeline
  • Figure 4: The empirical results for all solutions from the precision perspective.
  • Figure 5: The empirical results for all solutions from the efficiency perspective.
  • ...and 2 more figures