
Ten simple rules for training scientists to make better software

Kit Gallagher, Richard Creswell, Ben Lambert, Martin Robinson, Chon Lok Lei, Gary R. Mirams, David J. Gavaghan

TL;DR

These guidelines are directed towards students who have some programming literacy but little formal training in software development or engineering, as is typical of early doctoral students, and we believe they should form a key part of postgraduate training schemes more generally in the life sciences.

Abstract

Computational methods and associated software implementations are central to every field of scientific investigation. Modern biological research, particularly within systems biology, has relied heavily on the development of software tools to process and organize increasingly large datasets, simulate complex mechanistic models, analyze and manage data, and visualize and organize outputs. However, developing high-quality research software requires scientists to acquire a host of software development skills, and teaching these skills to students is challenging. Growing importance has been placed on ensuring reproducibility and good development practices in computational research, but less attention has been devoted to identifying the teaching strategies that effectively nurture the complex skillset researchers need to produce high-quality software, which increasingly underpins both academic and industrial biomedical research. Recent articles in the Ten Simple Rules collection have discussed the teaching of foundational computer science and coding techniques to biology students. We advance this discussion by describing specific steps for effectively teaching the skills scientists need to develop sustainable software packages that are fit for (re-)use in academic research or more widely. Although our advice is likely to be applicable to all students and researchers hoping to improve their software development skills, our guidelines are directed towards students who have some programming literacy but little formal training in software development or engineering, as is typical of early doctoral students. These practices are also applicable outside of doctoral training environments, and we believe they should form a key part of postgraduate training schemes more generally in the life sciences.

Paper Structure

This paper contains 13 sections and 7 figures.

Figures (7)

  • Figure 1: While sustainable approaches to software development can require more effort than a quick-and-dirty coding style initially, they quickly become time-saving over medium-to-long time scales.
  • Figure 2: Code snippets A., B., and C. are all valid Python and implement identical functionality. However, they vary in readability: A. is difficult to understand for anyone except the original author; B. is much improved, but perhaps suffers in readability due to verbose commenting; C., however, should be directly interpretable on account of meaningful variable names and sensible object-oriented architecture.
  • Figure 3: Version control tools such as Git, together with hosting platforms such as GitHub, make it easy for developers to keep track of changes in a software codebase.
  • Figure 4: To keep students of all interests and backgrounds excited, we recommend software projects that are oriented towards appealing scientific subjects (e.g., biology, epidemiology, chemistry), rather than projects that dwell merely on more foundational computer science or algorithmic tasks.
  • Figure 5: While all year-long projects should produce a complete and usable product, these can then be further extended by subsequent cohorts. Such extensions should not be trivial 'bolt-ons', but rather fundamentally extend the functionality of the software output, which may require students to modify/refactor existing elements of the codebase.
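The three readability styles contrasted in Figure 2 can be sketched as follows. The paper's actual snippets are not reproduced here: the task (an exponential population-growth calculation) and every name below (`f`, `f_commented`, `ExponentialGrowthModel`) are hypothetical choices made only to illustrate the styles, not the authors' code.

```python
import math


# Style A (hypothetical): correct, but the terse names hide what the code means.
def f(a, b, c):
    return [a * math.exp(b * t) for t in range(c)]


# Style B (hypothetical): readable, but the comments restate every line.
def f_commented(a, b, c):
    # a is the initial population
    # b is the growth rate
    # c is the number of time steps
    result = []  # create an empty list
    for t in range(c):  # loop over the time steps
        result.append(a * math.exp(b * t))  # population at time t
    return result  # return the list of populations


# Style C (hypothetical): meaningful names and a small class make the intent clear.
class ExponentialGrowthModel:
    """Population N(t) = N0 * exp(r * t) for a constant growth rate r."""

    def __init__(self, initial_population: float, growth_rate: float) -> None:
        self.initial_population = initial_population
        self.growth_rate = growth_rate

    def population_at(self, time: float) -> float:
        """Return the population size at the given time."""
        return self.initial_population * math.exp(self.growth_rate * time)

    def simulate(self, n_steps: int) -> list[float]:
        """Return populations at integer times 0, 1, ..., n_steps - 1."""
        return [self.population_at(t) for t in range(n_steps)]
```

All three versions return the same values; for example, `ExponentialGrowthModel(initial_population=100.0, growth_rate=0.1).simulate(5)` matches `f(100.0, 0.1, 5)`. Only style C tells a new reader, without any comments, that the numbers are population sizes over time.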