Addressing Popularity Bias in Third-Party Library Recommendations Using LLMs

Claudio Di Sipio, Juri Di Rocco, Davide Di Ruscio, Vladyslav Bulhakov

TL;DR

This paper tackles popularity bias in third-party library (TPL) recommender systems for software engineering (RSSEs) by evaluating open-source LLMs (Llama variants) for mitigating long-tail effects. Through a six-configuration ablation study combining prompt engineering, fine-tuning, and a popularity penalty, the authors find that baseline LLMs struggle to fully counter the bias, though advanced methods improve diversity and catalog coverage. Recall remains a bottleneck, signaling that further techniques, such as retrieval-augmented generation and human-in-the-loop approaches, may be needed. The work lays groundwork for future exploration and provides a replication package to enable broader testing in RSSE contexts. Overall, it highlights both the potential and current limitations of LLMs in producing more balanced, task-relevant TPL recommendations.

Abstract

Recommender systems for software engineering (RSSE) play a crucial role in automating development tasks by providing relevant suggestions according to the developer's context. However, they suffer from the so-called popularity bias, i.e., the phenomenon of recommending popular items that might be irrelevant to the current task. In particular, the long-tail effect can hamper the system's performance in terms of accuracy, thus leading to false positives in the provided recommendations. Foundation models are the most advanced generative AI-based models that achieve relevant results in several SE tasks. This paper aims to investigate the capability of large language models (LLMs) to address the popularity bias in recommender systems of third-party libraries (TPLs). We conduct an ablation study experimenting with state-of-the-art techniques to mitigate the popularity bias, including fine-tuning and popularity penalty mechanisms. Our findings reveal that the considered LLMs cannot address the popularity bias in TPL recommenders, even though fine-tuning and a post-processing penalty mechanism contribute to increasing the overall diversity of the provided recommendations. In addition, we discuss the limitations of LLMs in this context and suggest potential improvements to address the popularity bias in TPL recommenders, thus paving the way for additional experiments in this direction.

Paper Structure (17 sections, 6 equations, 3 figures, 3 tables)


Figures (3)

  • Figure 1: Popularity bias in traditional TPL RSSEs
  • Figure 2: Overview of the proposed approach
  • Figure 3: Most popular TPLs in the dataset