ACR: A Benchmark for Automatic Cohort Retrieval

Dung Ngoc Thai, Victor Ardulov, Jose Ulises Mena, Simran Tiwari, Gleb Erofeev, Ramy Eskander, Karim Tarabishy, Ravi B Parikh, Wael Salloum

TL;DR

This work defines Automatic Cohort Retrieval (ACR) to extract longitudinal, multi-document patient cohorts from unstructured EMRs at clinical scale. It contrasts three baselines—Retriever-only, Retrieve-then-read, and a neuro-symbolic Hypercube system—using a new oncology-centric benchmark comprising 113 queries and 1,436 de-identified patient records. Results show that neuro-symbolic Hypercube offers superior F1 and lower hallucination tendencies compared with pure LLM-based approaches, while online reader-based methods improve precision at substantial cost. The study provides an evaluation framework including longitudinal reasoning, set-theoretic consistency, and hallucination metrics, highlighting the value of integrating domain knowledge and offline reasoning for practical, scalable ACR in healthcare.

Abstract

Identifying patient cohorts is fundamental to numerous healthcare tasks, including clinical trial recruitment and retrospective studies. Current cohort retrieval methods in healthcare organizations rely on automated queries of structured data combined with manual curation, which are time-consuming, labor-intensive, and often yield low-quality results. Recent advancements in large language models (LLMs) and information retrieval (IR) offer promising avenues to revolutionize these systems. Major challenges include managing extensive eligibility criteria and handling the longitudinal nature of unstructured Electronic Medical Records (EMRs) while ensuring that the solution remains cost-effective for real-world application. This paper introduces a new task, Automatic Cohort Retrieval (ACR), and evaluates the performance of LLMs and commercial, domain-specific neuro-symbolic approaches. We provide a benchmark task, a query dataset, an EMR dataset, and an evaluation framework. Our findings underscore the necessity for efficient, high-quality ACR systems capable of longitudinal reasoning across extensive patient databases.

Paper Structure (50 sections, 6 figures, 7 tables)

This paper contains 50 sections, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Longitudinal data challenges in Cohort Retrieval: This example shows a patient medical journey depicting 3 related facts scattered in 3 documents written over the years. Cohort retrieval systems must possess longitudinal reasoning capacities to accurately answer the user query shown above.
  • Figure 2: Architecture of a generic ACR system: Given a query, large-scale reasoning is conducted over numerous patients with longitudinal EMRs. This involves text-based reasoning on documents or chunks, followed by longitudinal reasoning over time.
  • Figure 3: F1-Score stratified by complexity determined by (a) experts, (b) cohort size, and (c) document count per patient ($N_d$).
  • Figure 4: A plot of hallucination ratio against gold cohort sizes for all queries of the Broad, Narrow, and Sparse types. It helps identify queries where models over-hallucinate, retrieving too many unqualified patients relative to actual qualified ones.
  • Figure 5: Hallucination tendencies on every Zero-Result query (x-axis), measured as false-positive counts (y-axis).
  • ...and 1 more figure
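
The figure captions above refer to F1-score, a hallucination ratio, and false-positive counts on Zero-Result queries. As a rough illustration of how such set-based cohort metrics could be computed, here is a minimal Python sketch. The function name `cohort_scores` and the patient IDs are hypothetical, and the hallucination ratio is assumed here to be false positives divided by gold cohort size; the exact definition is not given in this summary and may differ in the paper.

```python
from typing import Dict, Optional, Set


def cohort_scores(predicted: Set[str], gold: Set[str]) -> Dict[str, Optional[float]]:
    """Compare a retrieved cohort against a gold cohort (both sets of patient IDs)."""
    tp = len(predicted & gold)   # qualified patients that were retrieved
    fp = len(predicted - gold)   # retrieved patients that do not qualify
    fn = len(gold - predicted)   # qualified patients that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # ASSUMPTION: hallucination ratio = false positives / gold cohort size.
    # It is left undefined (None) when the gold cohort is empty; for such
    # Zero-Result queries the raw false-positive count is reported instead.
    hallucination_ratio = fp / len(gold) if gold else None

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positives": float(fp),
        "hallucination_ratio": hallucination_ratio,
    }


# Hypothetical example: 3 patients retrieved, 4 patients in the gold cohort.
scores = cohort_scores({"p1", "p2", "p3"}, {"p2", "p3", "p4", "p5"})

# Hypothetical Zero-Result query: the gold cohort is empty, so any
# retrieved patient counts as a false positive.
zero_result = cohort_scores({"p9"}, set())
```

Treating cohorts as sets keeps the comparison independent of retrieval order, which fits a task where the output is a patient cohort and not a ranked list.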