Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects

Han Wang, Sijia Yu, Chunyang Chen, Burak Turhan, Xiaodong Zhu

TL;DR

The study builds a mapping taxonomy between unit tests and faults in DL projects, and finds that unit-tested DL projects correlate positively with open-source project metrics and have a higher pull request acceptance rate.

Abstract

Deep Learning (DL) models have advanced rapidly, with a focus on achieving high performance by testing model accuracy and robustness. However, it is unclear whether DL projects, as software systems, are thoroughly tested or functionally correct, given the need to treat and test them like other software systems. Therefore, we empirically study the unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: 1) unit-tested DL projects correlate positively with open-source project metrics and have a higher acceptance rate of pull requests, 2) 68% of the sampled DL projects are not unit tested at all, and 3) the layers and utilities (utils) of DL models have the most unit tests. Based on these findings and previous research outcomes, we build a mapping taxonomy between unit tests and faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to the community by raising awareness of the importance of unit testing in DL projects and encouraging further research in this area.

Paper Structure (9 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: An example code for a loss function with its unit test code and a flowchart showing the units and their interactions in DL projects.
  • Figure 2: Data collection pipelines for the research questions.
  • Figure 3: A comparison of GitHub metrics between unit-tested and untested open-source DL projects.
  • Figure 4: A comparison of GitHub issue label categories of unit-tested and untested DL projects (in percentage).
  • Figure 5: The distribution of open-source DL projects by the rate of files associated with tests.
  • ...and 1 more figures