Analysis and Validation of Image Search Engines in Histopathology

Isaiah Lahr; Saghir Alfasly; Peyman Nejat; Jibran Khan; Luke Kottom; Vaishnavi Kumbhar; Areej Alsaafin; Abubakr Shafique; Sobhan Hemati; Ghazal Alabtah; Nneka Comfere; Dennis Murphee; Aaron Mangold; Saba Yasir; Chady Meroueh; Lisa Boardman; Vijay H. Shah; Joaquin J. Garcia; H. R. Tizhoosh

TL;DR

Four search methods, namely bag of visual words (BoVW), Yottixel, SISH, and RetCCL, along with some of their potential variants, are analyzed and validated. Some engines, such as BoVW, are efficient and fast but suffer from low accuracy.

Abstract

Searching for similar images in archives of histology and histopathology images is a crucial task that may aid in patient matching for various purposes, ranging from triaging and diagnosis to prognosis and prediction. Whole slide images (WSIs) are highly detailed digital representations of tissue specimens mounted on glass slides. Matching WSIs to WSIs can serve as the critical method for patient matching. In this paper, we report an extensive analysis and validation of four search methods, namely bag of visual words (BoVW), Yottixel, SISH, and RetCCL, together with some of their potential variants. We analyze their algorithms and structures and assess their performance. For this evaluation, we used four internal datasets (1,269 patients) and three public datasets (1,207 patients), totaling more than 200,000 patches from 38 different classes/subtypes across five primary sites. Some search engines, such as BoVW, are notably efficient and fast but suffer from low accuracy. Conversely, engines such as Yottixel are efficient and fast while providing moderately accurate results. Recent proposals, including SISH, are inefficient and yield inconsistent outcomes, while alternatives such as RetCCL prove inadequate in both accuracy and efficiency. Further research is imperative to address both accuracy and minimal storage requirements in histopathological image search.

Paper Structure

This paper contains 29 sections, 1 equation, 9 figures, and 12 tables.

Introduction
Image Search in Histopathology
BoVW
Yottixel
SISH
RetCCL
Methods
Experimental Setup
Implementation
Pre-Trained Networks
Datasets
Internal Datasets
Public Datasets
Analysis and Results
Analysis of Algorithmic Structure
...and 14 more sections

Figures (9)

Figure 1: BoVW indexing pipeline. Using a dictionary of visual words, the BoVW approach represents a WSI as a histogram of visual words. Although various handcrafted features have been used in the literature, deep features are more likely to represent visual words effectively.
Figure 2: Yottixel indexing pipeline. Yottixel introduced two novel components: mosaic generation and barcoding. The mosaic is the result of a two-stage clustering that selects a small set of patches in an unsupervised fashion. Barcoding is the binarization of deep feature vectors to generate a set of barcodes. Yottixel does not offer a trained network for feature extraction and uses a pre-trained network as a placeholder; any deep network can be used to provide deep features for the mosaic.
Figure 3: SISH indexing pipeline. SISH uses the entire Yottixel chain (pink part) and adds an autoencoder and a codebook as an additional indexing scheme (blue part). SISH then uses a tree to match patches. As the authors report (sish), this addition to Yottixel does not yield good results, so they post-process the search results with a separate ranking scheme.
Figure 4: RetCCL indexing pipeline. RetCCL uses Yottixel's mosaic but replaces color histograms with deep features from the CCL network, a custom-trained network (blue blocks). It then uses cosine similarity to compare patches. RetCCL uses the post-search SISH ranking algorithm.
Figure 5: Thumbnails of two sample WSIs from the breast dataset: (left) invasive breast carcinoma of no special type, (right) intraductal papilloma with columnar cell lesions.
...and 4 more figures
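The patch-level operations named in the captions of Figures 1, 2, and 4 (a histogram of visual words, barcoding of deep features, and cosine similarity between patches) can be illustrated with a minimal NumPy sketch. It is not the code of BoVW, Yottixel, or RetCCL as evaluated in the paper: the codebook is assumed to come from an offline clustering such as k-means, the barcode uses the sign of consecutive feature differences as one possible binarization, and all function names are hypothetical.

```python
import numpy as np


def bovw_histogram(patch_features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Represent one WSI as a normalized histogram of visual words.

    patch_features: (n_patches, d) deep features of the WSI's patches.
    codebook: (k, d) visual words; assumed to be learned offline, e.g. by k-means.
    """
    # Squared Euclidean distance from every patch to every visual word: (n_patches, k).
    dists = ((patch_features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)  # index of the nearest visual word for each patch
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()  # normalize so WSIs with different patch counts are comparable


def barcode_from_diff(features: np.ndarray) -> np.ndarray:
    """Hypothetical binarization: bit i is 1 if feature i+1 exceeds feature i."""
    return (np.diff(features, axis=-1) > 0).astype(np.uint8)


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two barcodes of equal length."""
    return int(np.count_nonzero(a != b))


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two real-valued feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(8, 16))  # 8 hypothetical visual words, 16-D features
    patches = rng.normal(size=(50, 16))  # 50 hypothetical patch features of one WSI
    print(bovw_histogram(patches, codebook))

    q = barcode_from_diff(np.array([0.1, 0.5, 0.3, 0.9]))  # -> [1 0 1]
    c = barcode_from_diff(np.array([0.2, 0.1, 0.4, 0.3]))  # -> [0 1 0]
    print(hamming_distance(q, c))  # 3
    print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
```

Once packed into bits (for example with `np.packbits`), a barcode needs one bit per feature dimension instead of 32 or 64 bits per float, which illustrates why binary codes are attractive for the storage concerns raised in the abstract, at the possible cost of the accuracy that real-valued comparisons such as cosine similarity retain.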