PyPackIT: Automated Research Software Engineering for Scientific Python Applications on GitHub

Armin Ariamajd, Raquel López-Ríos de Castro, Andrea Volkamer

TL;DR

This paper presents PyPackIT, an open-source, cloud-based tool that automates research software engineering for scientific Python projects on GitHub. By integrating a centralized control center, a build-ready Python package skeleton, automated testing, documentation, licensing, and continuous integration/deployment pipelines, PyPackIT enforces FAIR and Open Science principles throughout the software life cycle. The framework supports automated issue management, version control with a SemVer-based branching model, and continuous maintenance to sustain long-term usability, reproducibility, and interoperability. Its cloud-native, containerized, and IaC-driven architecture reduces upfront setup burden, enabling scientists to focus on scientific development while ensuring robust, reusable, and well-documented software across indexing repositories and publication workflows.

Abstract

The increasing importance of Computational Science and Engineering has highlighted the need for high-quality scientific software. However, research software development is often hindered by limited funding, time, staffing, and technical resources. To address these challenges, we introduce PyPackIT, a cloud-based automation tool designed to streamline research software engineering in accordance with FAIR (Findable, Accessible, Interoperable, and Reusable) and Open Science principles. PyPackIT is user-friendly, ready-to-use software that enables scientists to focus on the scientific aspects of their projects while automating repetitive tasks and enforcing best practices throughout the software development life cycle. Using modern Continuous software engineering and DevOps methodologies, PyPackIT offers a robust project infrastructure including a build-ready Python package skeleton, a fully operational documentation and test suite, and a control center for dynamic project management and customization. PyPackIT integrates seamlessly with GitHub's version control system, issue tracker, and pull-based model to establish a fully automated software development workflow. Exploiting GitHub Actions, PyPackIT provides a cloud-native Agile development environment using containerization, Configuration-as-Code, and Continuous Integration, Deployment, Testing, Refactoring, and Maintenance pipelines. PyPackIT is an open-source software suite that seamlessly integrates with both new and existing projects via a public GitHub repository template at https://github.com/repodynamics/pypackit.

Paper Structure

This paper contains 26 sections, 4 figures, 8 tables.

Figures (4)

  • Figure 1: PyPackIT's software development workflow. Labeled arrows represent manual tasks performed by users: Report, Design, Commit, and Review. All other activities are automated and fall into four main categories spanning different stages of the software development life cycle: Issue Management, Version Control, Continuous Integration and Deployment, and Continuous Maintenance, Refactoring, and Testing.
  • Figure 2: Default homepage of the project documentation website generated by PyPackIT. Circled numbers mark dynamic elements that are automatically updated according to control center configurations: 1) logo; 2) links to external resources on GitHub and other indexing repositories; 3) abstract; 4) highlights; 5) license and copyright; and 6) links to important website pages.
  • Figure 3: Schematic overview of Semantic Versioning. Version numbers follow the X.Y.Z format, where X, Y, and Z represent major, minor, and patch numbers, respectively. The major number is incremented for backward-incompatible changes, the minor number for backward-compatible changes, and the patch number for bug fixes. For each release, one of these components is incremented by 1, while the ones to its right are reset to 0. The public API is introduced in version 1.0.0, while major version zero (0.y.z) is for initial development and signals an unstable API.
  • Figure 4: PyPackIT's version control strategy is demonstrated with an example starting from final release version 1.0.0. Two individual features (issue ticket numbers 1 and 2) are simultaneously implemented in separate development branches, where each iteration is published as a developmental release with a unique version number (green: release segment, orange: prerelease segment, and red: developmental release segment). Note the incorporation of issue ticket numbers into the prerelease segments. Feature 2, demonstrating a short release cycle, is merged directly into the release branch as a final release, while Feature 1 undergoes a prerelease phase with further refinements published as post-releases. After alpha, beta, and release candidate stages, it is merged back into the original release branch as a final release. The final version is automatically redetermined during merging so that 1.1.0rc1.postN is released as final version 1.2.0, since 1.1.0a2.devN was already released as 1.1.0.
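
The increment rule described in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustration of Semantic Versioning as stated there, not PyPackIT's own implementation; the names `Version` and `bump` are hypothetical, and prerelease, post-release, and developmental segments (Figure 4) are deliberately left out.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A plain X.Y.Z release version (hypothetical helper type)."""

    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def bump(version: Version, change: str) -> Version:
    """Increment one component by 1 and reset those to its right to 0.

    `change` is an assumed label for the kind of change:
    "major" for backward-incompatible changes,
    "minor" for backward-compatible changes,
    "patch" for bug fixes.
    """
    if change == "major":
        return Version(version.major + 1, 0, 0)
    if change == "minor":
        return Version(version.major, version.minor + 1, 0)
    if change == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change!r}")


# Final release numbers from the Figure 4 example: two features merged
# one after the other, each counted as a backward-compatible change.
start = Version(1, 0, 0)
after_feature_2 = bump(start, "minor")            # 1.1.0
after_feature_1 = bump(after_feature_2, "minor")  # 1.2.0
print(start, after_feature_2, after_feature_1)
```

Because each bump is computed from the latest final release at merge time, the second feature to be merged receives 1.2.0 even though its prereleases were numbered against 1.1.0, which matches the redetermination described in the Figure 4 caption.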