How good are my search strings? Reflections on using an existing review as a quasi-gold standard

Huynh Khanh Vi Tran; Jürgen Börstler; Nauman Bin Ali; Michael Unterkalmsteiner


TL;DR

The paper investigates how to assess the quality of search strings in systematic literature studies (SLSs) by using a quasi-gold standard (QGS) derived from an existing SLS. Through a comparative analysis of two tertiary studies, the TAQ study on test artifact quality and the ST study, it reveals gaps and biases in QGS construction and the limitations of relying solely on recall and precision for search validation. It then proposes extended guidelines that add a step for analyzing the results of automated search validation, and it emphasizes desirable QGS characteristics (relevance, size, diversity) to improve search completeness. The work offers practical guidance for constructing more reliable search strategies in evidence-based software engineering and calls for broader validation across topics.
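Validating a search against a QGS usually boils down to two ratios: the share of QGS papers the automated search retrieved (recall with respect to the QGS, often called quasi-sensitivity) and the share of retrieved papers that turn out to be relevant (precision). The sketch below is a minimal illustration assuming papers are identified by hypothetical string IDs such as DOIs; the function names are illustrative, not taken from the paper.

```python
"""Sketch: evaluating an automated search against a quasi-gold standard (QGS).

Assumption: papers are identified by hypothetical string IDs (e.g. DOIs)."""


def quasi_sensitivity(retrieved: set[str], qgs: set[str]) -> float:
    """Share of QGS papers that the automated search retrieved (recall w.r.t. the QGS)."""
    if not qgs:
        raise ValueError("QGS must not be empty")
    return len(retrieved & qgs) / len(qgs)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of retrieved papers that were judged relevant after selection."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


if __name__ == "__main__":
    qgs = {"p1", "p2", "p3", "p4"}                    # hypothetical QGS
    retrieved = {"p1", "p2", "p3", "x1", "x2", "x3"}  # hypothetical search hits
    relevant = {"p1", "p2", "p3", "x1"}               # hits kept after selection
    print(f"quasi-sensitivity = {quasi_sensitivity(retrieved, qgs):.2f}")  # 0.75
    print(f"precision = {precision(retrieved, relevant):.2f}")  # 0.67
```

A high quasi-sensitivity only says the search finds what the QGS contains; if the QGS itself is small or biased, that number can overstate search completeness, which is the concern the paper raises.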

Abstract

Background: Systematic literature studies (SLSs) have become a core research methodology in Evidence-based Software Engineering (EBSE). Search completeness, i.e., finding all relevant papers on the topic of interest, has been recognized as one of the most commonly discussed validity issues of SLSs. Aim: This study aims to raise awareness of the issues related to search string construction and to search validation using a quasi-gold standard (QGS). Furthermore, we aim to provide guidelines for search string validation. Method: We use a recently completed tertiary study as a case and complement our findings with observations from other researchers studying and advancing EBSE. Results: We found that the issue of assessing QGS quality has received little attention in the literature, and that the validation of automated searches in SLSs could be improved. Hence, we propose extending the current search validation approach with an additional step that analyzes the results of automated search validation, and we provide recommendations for QGS construction. Conclusion: In this paper, we report on new issues that could affect search completeness in SLSs. Furthermore, the proposed guideline and recommendations could help researchers implement a more reliable search strategy in their SLSs.
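One way to picture an extra analysis step on the validation results is to look at each QGS paper the automated search missed and ask which part of the search string failed to match it. The sketch below is a hypothetical illustration of that idea, not the procedure defined in the paper: it assumes the search string is a conjunction of OR-ed term groups, that a trailing `*` is a prefix wildcard, and that only titles/abstracts are matched. All names and example data are invented for illustration.

```python
"""Hypothetical sketch: diagnosing why QGS papers were missed by a search.

Assumptions (not from the paper): the search string is an AND of OR-ed term
groups, a trailing '*' is a prefix wildcard, and only title/abstract text
is matched."""

import re


def matches(term: str, text: str) -> bool:
    """Case-insensitive whole-word match; a trailing '*' acts as a prefix wildcard."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def unmatched_groups(text: str, groups: dict[str, list[str]]) -> list[str]:
    """Names of AND-ed term groups for which none of the OR-ed terms occur in text."""
    return [name for name, terms in groups.items()
            if not any(matches(t, text) for t in terms)]


def analyse_misses(qgs_texts: dict[str, str], retrieved: set[str],
                   groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each missed QGS paper to the term groups that explain the miss."""
    return {pid: unmatched_groups(text, groups)
            for pid, text in qgs_texts.items() if pid not in retrieved}


if __name__ == "__main__":
    # Hypothetical string: (test*) AND (quality OR smell*) AND (review OR mapping)
    groups = {
        "topic": ["test*"],
        "quality": ["quality", "smell*"],
        "method": ["review", "mapping"],
    }
    qgs_texts = {
        "p1": "A systematic review of test smells",
        "p2": "A tertiary study of test code maintainability",
    }
    print(analyse_misses(qgs_texts, retrieved={"p1"}, groups=groups))
    # {'p2': ['quality', 'method']}
```

In this toy case the missed paper suggests candidate terms such as "maintainability" or "tertiary" that the hypothetical string lacks; in practice, full-text indexing and database-specific query syntax would change what counts as a match.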


Paper Structure

This paper contains 19 sections, 4 figures, and 3 tables.

Guidelines for search validation
Related work
Analysis of using another SLS as QGS
Search process
Search performance evaluation using a QGS
Findings
The first search and the ST study's search
The third search and the ST study's search
The first search and the third search
Discussion
Issues in search string construction
Issues related to using Quasi-gold standards
Recommendations for QGS construction and search validation
QGS desirable characteristics
QGS construction
(4 further sections not listed)

Figures (4)

Figure 1: Overview of the search steps in the tertiary study on test artifact quality (TAQ study) [tran2021assessing].
Figure 2: Overlaps between the three searches in the tertiary study on test artifact quality (TAQ study) [tran2021assessing]. The red box illustrates the distribution of the selected papers among the searches, and the numbers in parentheses show how many papers belong to the QGS.
Figure 3: Comparison of the search terms used in the search strings of the two tertiary studies, the TAQ study [tran2021assessing] and the ST study [garousi_systematic_2016].
Figure 4: Overlaps between the first and third searches and the 58 SLR/SMS papers from the initial set of papers in the ST study [garousi_systematic_2016]. The red box illustrates the distribution of the QGS papers.
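Overlap diagrams like those in Figures 2 and 4 partition papers by the exact combination of searches that retrieved them, and then count how many QGS papers fall into each region. The following is a minimal sketch of that bookkeeping, assuming each search result is a set of hypothetical paper IDs; the search names and data are invented and do not reproduce the paper's numbers.

```python
"""Sketch: Venn-style overlap regions between searches, with QGS counts.

Assumption: each search result is a set of hypothetical paper IDs."""


def overlap_regions(searches: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Group papers by the exact set of searches that retrieved them."""
    membership: dict[str, set[str]] = {}
    for name, papers in searches.items():
        for paper in papers:
            membership.setdefault(paper, set()).add(name)
    regions: dict[frozenset[str], set[str]] = {}
    for paper, names in membership.items():
        regions.setdefault(frozenset(names), set()).add(paper)
    return regions


if __name__ == "__main__":
    searches = {                       # hypothetical search results
        "first": {"a", "b", "c"},
        "third": {"b", "c", "d"},
        "ST": {"c", "e"},
    }
    qgs = {"b", "c", "e"}              # hypothetical QGS
    regions = overlap_regions(searches)
    # Print single-search regions first, then larger intersections.
    for key in sorted(regions, key=lambda k: (len(k), sorted(k))):
        papers = regions[key]
        label = " & ".join(sorted(key))
        print(f"{label}: {len(papers)} paper(s), {len(papers & qgs)} in QGS")
```

Keying regions by `frozenset` keeps each paper in exactly one region, so the region sizes add up to the number of distinct papers across all searches.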
