PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets

Jaeyoung Kim; Sihyeon Lee; Hyeon Jeon; Keon-Joo Lee; Hee-Joon Bae; Bohyoung Kim; Jinwook Seo

TL;DR

PhenoFlow tackles the challenge of analyzing large, irregular stroke datasets by introducing a human-LLM collaborative workflow in which an LLM acts as a data wrangler and clinicians supervise through visualizations and natural language queries. The system preserves patient privacy by using only metadata to generate cohort definitions, synthesis code, and visual inspections, while features like the slice-and-wrap visualization enable efficient discovery of recurring and abnormal blood pressure (BP) patterns. Through two case studies with neurologists, PhenoFlow demonstrates faster cohort construction, interpretable outputs, and the ability to identify clinically meaningful BP patterns and potential treatment-linked effects. The approach highlights the potential of integrating LLM-driven data wrangling with familiar visual encodings to reduce cognitive load and enhance data-driven clinical decision-making for acute ischemic stroke.

Abstract

Acute stroke demands prompt diagnosis and treatment to achieve optimal patient outcomes. However, the intricate and irregular nature of clinical data associated with acute stroke, particularly blood pressure (BP) measurements, presents substantial obstacles to effective visual analytics and decision-making. Through a year-long collaboration with experienced neurologists, we developed PhenoFlow, a visual analytics system that leverages collaboration between humans and Large Language Models (LLMs) to analyze the extensive and complex data of acute ischemic stroke patients. PhenoFlow introduces a workflow in which the LLM serves as a data wrangler while neurologists explore and supervise its output using visualizations and natural language interactions. This approach enables neurologists to focus more on decision-making with reduced cognitive load. To protect sensitive patient information, PhenoFlow uses only metadata to make inferences and synthesize executable code, without accessing raw patient data. This ensures that the results are both reproducible and interpretable while maintaining patient privacy. The system incorporates a slice-and-wrap design that employs temporal folding to create an overlaid circular visualization. Combined with a linear bar graph, this design aids in exploring meaningful patterns within irregularly measured BP data. Through case studies, PhenoFlow has demonstrated its capability to support iterative analysis of extensive clinical datasets, reducing cognitive load and enabling neurologists to make well-informed decisions. Grounded in long-term collaboration with domain experts, our research demonstrates the potential of utilizing LLMs to tackle current challenges in data-driven clinical decision-making for acute ischemic stroke patients.

Paper Structure (31 sections, 4 figures)

Introduction
Related Work
Time-Oriented Data and Stroke Visualization
LLMs for Clinical Research
Background
Collaborator and Dataset Description
Acute Ischemic Stroke
Data Characteristics
Problem Definition and Design
Design Process
Current Analysis Workflow and Limitations
Domain Goals
Visual Analysis Tasks
Design Requirements
Design of PhenoFlow
...and 16 more sections

Figures (4)

Figure 1: The process of summarizing a patient's BP trajectory. (A) Each patient's BP measurements are irregularly spaced over time. (B) Temporal folding and abstraction are applied to summarize the BP trajectory. (C) The BP range is then mapped to a color channel, and (D) data density is mapped to opacity to reveal the frequency of the measurements.
Figure 2: The process of summarizing a patient's BP trajectory using the slice-and-wrap visualization technique. (A) Each patient's BP measurements are irregularly spaced over time. (B) Temporal folding and abstraction create fixed-size segments (e.g., 24 hours) from the irregular data. (C) A circular visualization is generated for each segment, mapping time to angle and BP value to radius. (D) Data points within each segment are connected using the Centripetal Catmull-Rom spline with opacity applied to the curves. Then, circular visualizations are superimposed to reveal recurring temporal patterns in the patient's BP trajectory.
Figure 3: Case Study I - (A) P1 began her exploration by defining a cohort using a natural language query. (B) Through small multiples in the Inspection View, she verified that the results obtained by the LLM data wrangler aligned with her requirements. (C) By iteratively refining the cohort with natural language filters, she identified (D) two patients of interest in the matrix. (E) By comparing the two patients' data in detail and utilizing the tooltip (i.e., urokinase) and event marker (i.e., Sym HT), she discovered a potential influencing factor for their disparate outcomes.
Figure 4: Case Study II - (A) After defining the target cohort, (B) experts found that the LLM data wrangler needed an additional field to meet the user's request. (C) After modifying their query, they identified a patient with sustained high BP for 8 days. (D) Examining the patient's BP trajectory using the slice-and-wrap visualization, they discovered a triangular pattern that potentially indicated the patient's stabilized condition.
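
The temporal folding and polar mapping described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `slice_and_wrap`, the input format of (hours since admission, BP) pairs, and the 60-240 mmHg normalization range are all assumptions made for illustration. The sketch covers only steps (B) and (C), folding irregular measurements into fixed 24-hour segments and mapping time to angle and BP to radius; the Centripetal Catmull-Rom spline and opacity overlay of step (D) are omitted.

```python
import math
from collections import defaultdict


def slice_and_wrap(measurements, period_hours=24.0, bp_range=(60.0, 240.0)):
    """Fold irregular (hours, bp) readings into fixed-length segments.

    Hypothetical helper: returns {segment_index: [(angle_rad, radius), ...]},
    where the angle encodes time within the segment and the radius encodes
    the BP value normalized to [0, 1] over the assumed bp_range.
    """
    lo, hi = bp_range
    segments = defaultdict(list)
    for hours, bp in sorted(measurements):
        # Temporal folding: which period the reading falls in.
        segment = int(hours // period_hours)
        # Time within the period maps to an angle on the circle.
        angle = 2 * math.pi * (hours % period_hours) / period_hours
        # BP maps to radius; values outside the assumed range are clipped.
        clipped = min(max(bp, lo), hi)
        radius = (clipped - lo) / (hi - lo)
        segments[segment].append((angle, radius))
    return dict(segments)


# Four irregularly spaced readings spanning two 24-hour segments.
readings = [(0.0, 150.0), (6.0, 240.0), (30.0, 60.0), (36.0, 150.0)]
folded = slice_and_wrap(readings)
```

Because readings at hour 6 and hour 30 share the same angle, drawing every segment on one polar axis superimposes them, which is what lets recurring daily patterns line up visually.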
