Almanac Copilot: Towards Autonomous Electronic Health Record Navigation

Cyril Zakka; Joseph Cho; Gracia Fahed; Rohan Shad; Michael Moor; Robyn Fong; Dhamanpreet Kaur; Vishnu Ravi; Oliver Aalami; Roxana Daneshjou; Akshay Chaudhari; William Hiesinger

TL;DR

Almanac Copilot targets clinician burnout by enabling autonomous navigation of electronic health records through a tool-using large language model. It combines a 33B-parameter instruction-tuned LLM, Matryoshka embeddings, and external tools to perform information retrieval, data manipulation, and alert surfacing within a privacy-preserving, locally executed EHR environment. On the EHR-QA benchmark (300 synthetic, physician-authored queries), Almanac Copilot achieves a 74% task-success rate with a mean score of 2.45 out of 3. The results highlight its potential to streamline workflows, while acknowledging hallucination risks and the need for further improvements toward Level 2 autonomy and multi-modal data handling. The work points to the practical value of clinically aligned AI agents in reducing cognitive load and improving efficiency in EMR use, with implications for safer and more scalable deployment in healthcare settings.
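A tool-using agent of this kind has to decide which tools to expose to the model for a given query. The sketch below illustrates one common way to do that, ranking tool descriptions by similarity to the query. It is an assumption for illustration, not the selection method used by Almanac Copilot: the tool names and descriptions in `TOOLS` are hypothetical, and the bag-of-words `embed` function is a toy stand-in for a learned embedding model.

```python
import math
from collections import Counter

# Hypothetical tool registry; names and descriptions are illustrative only.
TOOLS = {
    "fhir_search": "search patient records labs medications conditions",
    "fhir_create_order": "place order medication lab imaging",
    "browser": "lookup clinical guidelines references web",
    "calculator": "compute calculate arithmetic dose score",
}

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_tools(query, k=2):
    """Return the names of the k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(TOOLS, key=lambda name: cosine(q, embed(TOOLS[name])), reverse=True)
    return ranked[:k]

print(select_tools("calculate dose", k=1))
```

Restricting the model to a small, query-specific subset keeps the prompt short and reduces the chance of calling an irrelevant function, which is the usual motivation for this kind of pre-selection.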

Abstract

Clinicians spend large amounts of time on clinical documentation, and inefficiencies impact quality of care and increase clinician burnout. Despite the promise of electronic medical records (EMR), the transition from paper-based records has been negatively associated with clinician wellness, in part due to poor user experience, increased burden of documentation, and alert fatigue. In this study, we present Almanac Copilot, an autonomous agent capable of assisting clinicians with EMR-specific tasks such as information retrieval and order placement. On EHR-QA, a synthetic evaluation dataset of 300 common EHR queries based on real patient data, Almanac Copilot obtains a successful task completion rate of 74% (n = 221 tasks) with a mean score of 2.45 out of 3 (95% CI: 2.34-2.56). By automating routine tasks and streamlining the documentation process, our findings highlight the significant potential of autonomous agents to mitigate the cognitive load imposed on clinicians by current EMR systems.

Paper Structure (16 sections, 2 figures, 3 tables)

Introduction
Related Work
Methods
Architecture
Large Language Model
Embedding Model
Tools
EHR-QA Dataset Generation
Evaluation
Results
Discussion
Acknowledgments
Funding
Competing interests
Authors' contributions
...and 1 more section

Figures (2)

Figure 1: Overview of the Almanac Copilot Architecture. Upon receiving a query, the system dynamically selects a subset of APIs from a predetermined list of functions (i.e. FHIR functions, browser, calculator), optimizing the process to meet the specific requirements of the query.
Figure 2: Performance Evaluation of Almanac Copilot, ChatGPT-4, Claude 3 Opus and BioMistral on EHR-QA. (a) The stacked bar plot illustrates the frequency of scores obtained across 300 synthetic questions within the EHR-QA framework. (b) The heatmaps illustrate the models' performance in responding to the same dataset of questions. Each green square indicates a perfect score across all assessed metrics on the task in question. In contrast, red squares signal the lowest performance level. The scoring is sequential: subsequent correct actions are not credited if preceding steps are incorrect.
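The sequential scoring rule in the Figure 2 caption can be sketched as counting correct steps only up to the first failure. This is a minimal illustration under stated assumptions: the function name `sequential_score` is hypothetical, and each task is assumed to be graded as an ordered list of pass/fail steps (three of them, given that scores are reported out of 3). The paper's actual grading criteria are not reproduced here.

```python
def sequential_score(step_correct):
    """Count correct steps in order, stopping at the first incorrect one.

    step_correct: ordered list of booleans, one per graded step (assumed format).
    Later correct steps earn no credit once an earlier step has failed.
    """
    score = 0
    for ok in step_correct:
        if not ok:
            break
        score += 1
    return score

# A correct final step does not rescue a task whose second step failed.
print(sequential_score([True, False, True]))
```

Under this rule, a task scores 3 only if every step is correct, and a failure on the first step yields 0 regardless of what follows.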
