AutoFL: A Tool for Automatic Multi-granular Labelling of Software Repositories

Cezar Sas, Andrea Capiluppi

TL;DR

The paper addresses the challenge of making software repositories easier to understand by automatically labelling artifacts with domain-specific annotations. It introduces AutoFL, a tool that derives multi-granular labels directly from source code at the file, package, and project level. The authors describe the tool's internal architecture, illustrate a sample analysis, and discuss limitations and avenues for future work. The approach could streamline code comprehension and improve developer productivity in large, multi-domain codebases.

Abstract

Software comprehension, particularly of new code bases, is time-consuming for developers, especially in large projects with multiple functionalities spanning various domains. One strategy to reduce this effort involves annotating files with meaningful labels that describe the functionalities they contain. However, prior research has so far focused on classifying the whole project using README files as a proxy, which gives developers little information. Our objective is to streamline the labelling of files with the correct application domains using source code as input. To achieve this, in prior work, we evaluated the ability to annotate files automatically using a weak labelling approach. This paper presents AutoFL, a tool for automatically labelling software repositories from source code. AutoFL allows multi-granular annotations at the file, package, and project level. We provide an overview of the tool's internals, present an example analysis for which AutoFL can be used, and discuss limitations and future work.
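
To make the idea of multi-granular annotation concrete, the sketch below shows one way file-level label distributions could be rolled up to package and project level. It is an illustration only: the abstract does not say how AutoFL aggregates labels, so the averaging rule, the `aggregate` function, the label names, and the file paths are all assumptions, not AutoFL's API or method.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def aggregate(file_labels):
    """Average per-file label distributions up to package and project level.

    Assumption: a package is the directory containing the file, and a
    higher-level label distribution is the plain mean of its files.
    """

    def mean(dists):
        # Sum the probability of each label, then divide by the number of files.
        totals = defaultdict(float)
        for dist in dists:
            for label, prob in dist.items():
                totals[label] += prob
        return {label: total / len(dists) for label, total in totals.items()}

    # Group file-level distributions by their parent directory.
    by_package = defaultdict(list)
    for path, dist in file_labels.items():
        by_package[str(PurePosixPath(path).parent)].append(dist)

    packages = {pkg: mean(dists) for pkg, dists in by_package.items()}
    project = mean(list(file_labels.values()))
    return packages, project


# Hypothetical file-level annotations (label -> probability).
file_labels = {
    "src/net/Client.java": {"networking": 0.75, "security": 0.25},
    "src/net/Server.java": {"networking": 0.25, "security": 0.75},
    "src/ui/Window.java": {"gui": 1.0},
}

packages, project = aggregate(file_labels)
print(packages["src/net"])
print(project)
```

A plain mean treats every file equally; weighting by file size or by label confidence would be an equally plausible choice under different assumptions.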

Paper Structure (4 sections)

This paper contains 4 sections.