
Boidae: Your Personal Mining Platform

Brian Sigurdson, Samuel W. Flint, Robert Dyer

TL;DR

The paper addresses two obstacles to mining software repositories (MSR) studies: the overhead of generating and processing datasets, and Boa's limited customization and language support. It introduces Boidae, a family of customizable Boa installations that can run locally in Docker or be deployed remotely via Ansible, along with scripts to generate datasets from GitHub and SourceForge and an extensible Boa compiler and runtime. The authors present the Boidae architecture, a use case, and the workflow, and demonstrate portability, custom dataset creation, language and runtime customization, and scalability on Hadoop-backed infrastructure. The contribution lowers the barrier to bespoke MSR experiments by providing an open-source, scalable, and extensible mining platform with ready-to-use deployment options and a public Boa instance.

Abstract

Mining software repositories is a useful technique for researchers and practitioners to see what software developers actually do when developing software. Tools like Boa provide users with the ability to easily mine open-source software repositories at a very large scale, with datasets containing hundreds of thousands of projects. The trade-off is that users must use the provided infrastructure, query language, runtime, and datasets, and this might not fit all analysis needs. In this work, we present Boidae: a family of Boa installations controlled and customized by users. Boidae uses automation tools such as Ansible and Docker to facilitate the deployment of a customized Boa installation. In particular, Boidae allows the creation of custom datasets generated from any set of Git repositories, with helper scripts to aid in finding and cloning repositories from GitHub and SourceForge. In this paper, we briefly describe the architecture of Boidae and how researchers can utilize the infrastructure to generate custom datasets. Boidae's scripts and all infrastructure it builds upon are open-sourced. A video demonstration of Boidae's installation and extension is available at https://go.unl.edu/boidae.
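
To make the "finding and cloning repositories" step concrete, here is a minimal Python sketch of what such a helper could look like. It is not Boidae's actual script: the function names, the `repos` destination directory, and the search criteria are hypothetical choices for illustration. The only external facts it relies on are the public GitHub REST search endpoint (`/search/repositories`), whose results list carries `full_name` and `clone_url` fields, and the standard `git clone` command.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language, min_stars, per_page=10):
    """Build a search URL for repositories in one language above a star threshold.

    The selection criteria (language, stars) are illustrative assumptions,
    not the criteria used by Boidae's helper scripts.
    """
    query = f"language:{language} stars:>={min_stars}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def clone_commands(search_response, dest_root="repos"):
    """Turn a decoded search response into one `git clone` command per repository.

    Full clones (no --depth) are used because history mining needs every commit.
    """
    commands = []
    for item in search_response.get("items", []):
        target = f"{dest_root}/{item['full_name']}"
        commands.append(["git", "clone", "--quiet", item["clone_url"], target])
    return commands


if __name__ == "__main__":
    # Hand-written stand-in for a decoded API response (hypothetical repository).
    sample = {
        "items": [
            {
                "full_name": "example-org/example-repo",
                "clone_url": "https://github.com/example-org/example-repo.git",
            }
        ]
    }
    print(build_search_url("java", 100))
    for command in clone_commands(sample):
        print(" ".join(command))
```

The sketch only builds the URL and the command lists, so it can be inspected without network access; fetching the URL and passing each command to `subprocess.run` would complete the helper.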

Paper Structure (8 sections, 4 figures)

This paper contains 8 sections and 4 figures.

Figures (4)

  • Figure 1: Overview of the general mining software repositories workflow
  • Figure 2: Overview of the Boidae architecture
  • Figure 3: Counting the number of annotations per project
  • Figure 4: Task execution times as the number of maps increases (Dyer, Nguyen, Rajan, and Nguyen, 2013)