Selective Shot Learning for Code Explanation
Paheli Bhattacharya, Rishabh Gupta
TL;DR
This work tackles code explanation via few-shot prompting for open-source Code-LLMs and identifies selective-shot learning (SSL) as a key lever. It introduces SSL_ner, an approach that relies on neither token matching nor embeddings and instead uses code entity information to select demonstrations, and benchmarks open-source Code-LLMs on two datasets (CoNaLa inline Python and TLC function-level Java). Empirically, SSL_ner often yields the best token-based demonstrations and reveals that medium-sized LLMs benefit more from few-shot prompting, while CodeLlama 34B excels in zero-shot settings. The study provides a principled, interpretable, and extensible framework for few-shot prompt design and establishes a first systematic benchmark of open-source Code-LLMs for code explanation.
Abstract
Code explanation plays a crucial role in software engineering, helping developers grasp code functionality efficiently. Recent work shows that the performance of LLMs on code explanation improves in a few-shot setting, especially when the few-shot examples are selected intelligently. State-of-the-art approaches for such Selective Shot Learning (SSL) include token-based and embedding-based methods. However, these SSL approaches have been evaluated on proprietary LLMs, with little exploration of open-source Code-LLMs. Additionally, these methods do not take programming language syntax into account. To bridge these gaps, we present a comparative study and propose a novel SSL method (SSL_ner) that uses entity information for few-shot example selection. We present several insights and show the effectiveness of the SSL_ner approach over state-of-the-art methods across two datasets. To the best of our knowledge, this is the first systematic benchmarking of open-source Code-LLMs that also assesses the performance of various few-shot example selection approaches for the code explanation task.
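To make the idea of entity-based demonstration selection concrete, the following is a minimal illustrative sketch and not the authors' SSL_ner implementation. It assumes Python snippets, uses the standard `ast` module as a stand-in for a code entity tagger, and scores candidates with a multiset Jaccard overlap; the function names (`code_entities`, `entity_similarity`, `select_shots`, `build_prompt`) and the prompt layout are hypothetical choices made for illustration.

```python
import ast
from collections import Counter


def code_entities(snippet):
    """Collect coarse (entity_type, value) pairs from a Python snippet.

    Assumption: AST node kinds stand in for the output of an entity tagger.
    """
    entities = Counter()
    for node in ast.walk(ast.parse(snippet)):
        if isinstance(node, ast.Name):
            entities[("name", node.id)] += 1
        elif isinstance(node, ast.Attribute):
            entities[("attribute", node.attr)] += 1
        elif isinstance(node, ast.Constant):
            entities[("constant", type(node.value).__name__)] += 1
    return entities


def entity_similarity(a, b):
    """Multiset Jaccard overlap between two entity counters."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0


def select_shots(query, pool, k=2):
    """Return the k (code, explanation) pairs whose entities best match the query."""
    q = code_entities(query)
    ranked = sorted(
        pool,
        key=lambda ex: entity_similarity(q, code_entities(ex[0])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, shots):
    """Assemble a few-shot prompt: selected demonstrations, then the query."""
    parts = [f"Code: {code}\nExplanation: {expl}" for code, expl in shots]
    parts.append(f"Code: {query}\nExplanation:")
    return "\n\n".join(parts)
```

Given a pool of explained snippets, `select_shots(query, pool, k)` picks the demonstrations and `build_prompt(query, shots)` formats them ahead of the query. Swapping `entity_similarity` for a token-overlap or embedding-based score would yield the other selection strategies the abstract mentions.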
