How good are my search strings? Reflections on using an existing review as a quasi-gold standard

Huynh Khanh Vi Tran, Jürgen Börstler, Nauman Bin Ali, Michael Unterkalmsteiner

TL;DR

The paper investigates how to assess the quality of search strings in systematic literature studies by using a quasi-gold standard derived from an existing SLS. Through a comparative analysis of two tertiary studies (TAQ and ST), it reveals gaps and biases in QGS construction and the limitations of relying solely on recall/precision for validation. It then proposes extended guidelines that add an automated-search-analysis step and emphasizes QGS desirability characteristics (relevance, size, diversity) to improve search completeness. The work offers practical guidance for constructing more reliable search strategies in evidence-based software engineering and calls for broader validation across topics.

Abstract

Background: Systematic literature studies (SLS) have become a core research methodology in Evidence-based Software Engineering (EBSE). Search completeness, i.e., finding all relevant papers on the topic of interest, has been recognized as one of the most commonly discussed validity issues of SLSs. Aim: This study aims to raise awareness of the issues related to search string construction and to search validation using a quasi-gold standard (QGS). Furthermore, we aim to provide guidelines for search string validation. Method: We use a recently completed tertiary study as a case and complement our findings with observations from other researchers studying and advancing EBSE. Results: We found that assessing QGS quality has received little attention in the literature, and that the validation of automated searches in SLSs could be improved. Hence, we propose to extend the current search validation approach with an additional step that analyzes the automated search validation results, and we provide recommendations for QGS construction. Conclusion: In this paper, we report on new issues that could affect search completeness in SLSs. Furthermore, the proposed guideline and recommendations could help researchers implement a more reliable search strategy in their SLSs.
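To make the recall/precision validation mentioned above concrete, the following is a minimal sketch of how an automated search result could be checked against a QGS. It is not the authors' tooling: the function name `validate_search`, the paper identifiers, and the representation of both sets as lists of identifiers are assumptions made purely for illustration. Precision is computed against the QGS only, so it is a lower bound on true precision, since retrieved papers outside the QGS may still be relevant.

```python
def validate_search(retrieved, qgs):
    """Compare an automated search result with a quasi-gold standard (QGS).

    Both arguments are iterables of paper identifiers (hypothetical format).
    Returns (recall, precision, missed), where `missed` lists the QGS papers
    that the search string failed to retrieve.
    """
    retrieved = set(retrieved)
    qgs = set(qgs)
    hits = retrieved & qgs  # QGS papers found by the search

    # Recall: share of the QGS that the search retrieved.
    recall = len(hits) / len(qgs) if qgs else 0.0
    # QGS-based precision: share of retrieved papers that are in the QGS.
    precision = len(hits) / len(retrieved) if retrieved else 0.0

    missed = sorted(qgs - retrieved)
    return recall, precision, missed


# Illustrative data only; these identifiers do not come from the paper.
recall, precision, missed = validate_search(
    retrieved=["p1", "p2", "p3", "p4"],
    qgs=["p2", "p4", "p5"],
)
print(f"recall={recall:.2f} precision={precision:.2f} missed={missed}")
```

Listing the missed QGS papers, rather than reporting the two ratios alone, is in the spirit of the additional analysis step the abstract proposes: each missed paper can be inspected to see which search terms would have been needed to retrieve it.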

Paper Structure (19 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Overview of the search steps in the tertiary study on test artifact quality (TAQ study) [tran2021assessing].
  • Figure 2: Overlaps between the three searches in the tertiary study on test artifact quality (TAQ study) [tran2021assessing]. The red box illustrates the distribution of the selected papers among the searches, and the numbers in parentheses show the number of papers belonging to the QGS.
  • Figure 3: Comparison of the search terms used in the search strings of the two tertiary studies, the TAQ study [tran2021assessing] and the ST study [garousi_systematic_2016].
  • Figure 4: Overlaps between the first and third searches and the 58 SLR/SMS papers from the initial set of papers in the ST study [garousi_systematic_2016]. The red box illustrates the distribution of the QGS papers.