
Interpreting COVID Lateral Flow Tests' Results with Foundation Models

Stuti Pandey, Josh Myers-Dean, Jarek Reynolds, Danna Gurari

TL;DR

This work tackles automated interpretation of COVID LFT images with modern foundation vision-language models. It introduces LFT-Grounding, a dataset with groundings for the LFT and its nested result window, and conducts a zero-shot benchmark of eight VLMs across prompts and grounding outputs. The findings show that most models struggle to correctly identify test type, extract results, and localize small, nested windows, with GPT-4V showing the strongest caption-based performance and grounding methods lagging in IoU, especially for nested regions. The dataset and analyses aim to spur progress toward accessible, scalable LFT interpretation and can extend to other rapid diagnostic tests, improving both individual accessibility and public-health monitoring.
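To illustrate why IoU tends to be harsher on small, nested regions, the following minimal sketch computes IoU for axis-aligned bounding boxes and applies the same localization error to a large box and a small one. The box coordinates, the `shift` helper, and the 20-pixel offset are hypothetical values chosen for illustration; they are not taken from LFT-Grounding, and the paper's own evaluation may compute IoU differently (for example, on segmentation masks rather than boxes).

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shift(box, dx, dy):
    """Translate a box by (dx, dy) to simulate a localization error."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


# Hypothetical ground-truth boxes, in pixels (not from the dataset).
test_box = (100, 100, 500, 300)    # whole test: 400 x 200
window_box = (250, 170, 350, 230)  # nested result window: 100 x 60

# The same 20-pixel error in x and y is applied to both predictions.
test_iou = box_iou(test_box, shift(test_box, 20, 20))
window_iou = box_iou(window_box, shift(window_box, 20, 20))

print(f"test IoU:   {test_iou:.3f}")
print(f"window IoU: {window_iou:.3f}")
```

Under these assumed numbers, an identical pixel offset leaves the larger box with a much higher IoU than the nested window, which is one reason a fixed IoU threshold is harder to meet for small regions.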

Abstract

Lateral flow tests (LFTs) enable rapid, low-cost testing for health conditions including Covid, pregnancy, HIV, and malaria. Automated readers of LFT results can yield many benefits, including empowering blind people to independently learn about their health and accelerating data entry for large-scale monitoring (e.g., for pandemics such as Covid) by using only a single photograph per LFT. Accordingly, we explore the abilities of modern foundation vision-language models (VLMs) in interpreting such tests. To enable this analysis, we first create a new labeled dataset with hierarchical segmentations of each LFT and its nested test result window. We call this dataset LFT-Grounding. Next, we benchmark eight modern VLMs in zero-shot settings for analyzing these images. We demonstrate that current VLMs frequently fail to correctly identify the type of LFT, interpret the test results, locate the nested result window of the LFT, and recognize LFTs when they are partially obfuscated. To facilitate community-wide progress towards automated LFT reading, we publicly release our dataset at https://iamstuti.github.io/lft_grounding_foundation_models/.

Paper Structure (21 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Examples from our dataset of images showing COVID-19 LFTs with positive results (first row) and negative results (second row). We introduce segmentations of each LFT test (indicated in purple) and its test result window (indicated in orange).
  • Figure 2: Boxplots for each of the four metrics used to analyze LFT-Grounding. Each boxplot shows the overall results alongside fine-grained results for positive and negative tests. (a) image coverage; (b) boundary complexity; (c) test coverage; (d) NAR. The line in each box represents the median, the bottom and top of each box represent the 25th and 75th percentiles respectively, whiskers extend to the most extreme data points not considered outliers, and circles represent outliers. (Res.=Result window; Over.=Overall; Pos.=Positive; Neg.=Negative; NAR=Normalized aspect ratio)
  • Figure 3: Example of captions generated by the models when the prompts notify them of the Covid test's location.
  • Figure 4: Examples when CogVLM did not generate bounding box predictions for both the Covid test and its result window.
  • Figure 5: Examples of ground-truth (purple overlay) and GLaMM predictions (orange overlay) for locating the Covid test as well as its nested result window.
  • ...and 1 more figure