Retrieval Augmented Thought Process for Private Data Handling in Healthcare

Thomas Pouplin, Hao Sun, Samuel Holt, Mihaela van der Schaar

TL;DR

This work addresses privacy and data-staleness barriers to deploying LLMs in healthcare by grounding reasoning in external documents through the Retrieval-Augmented Thought Process (RATP). RATP formalizes open-book QA as a multi-step decision problem and solves it with Monte-Carlo Tree Search, using either a model-based score estimator or a self-critic to evaluate thoughts, while keeping LLMs frozen to protect private data. Empirical results on private EMRQA and EhrQA datasets show sizable accuracy gains over in-context RAG, with additional improvements when incorporating retrieved documents and robust exploration of the thought space. The framework delivers transparent, step-by-step reasoning traces and is readily generalizable, offering a privacy-preserving path to clinically grounded AI assistance.

Abstract

Large Language Models (LLMs) have demonstrated strong potential to assist both clinicians and the general public with their extensive medical knowledge. However, their application in healthcare is constrained by concerns about the privacy of the data used in training: security and ethical issues prevent private and personal information from being integrated. Moreover, although their capabilities can be enhanced with information retrieval to access up-to-date knowledge, current integrations of LLMs with information retrieval lack robustness to imperfect retrieval, which can hinder their effectiveness and even reduce overall performance. In this work, we address this challenge by introducing the Retrieval-Augmented Thought Process (RATP). Given access to external knowledge, RATP formulates the thought generation of LLMs as a multiple-step decision process. To optimise this thought process, RATP leverages Monte-Carlo Tree Search and learns a proxy reward function that permits cost-efficient inference. On a private dataset of electronic medical records, deliberately excluded from any LLM training set, RATP achieves 35% additional accuracy compared to in-context retrieval-augmented generation for the question-answering task.

Paper Structure (32 sections, 5 equations, 16 figures, 13 tables, 5 algorithms)


Figures (16)

  • Figure 1: Retrieval-Augmented Thought Process overview. ① The frozen LLM $l_{thought}$ gives an answer $\hat{y}$ to the question $x$ by using the extra context $s_T$. ② The thought process starts from the question $x$ and outputs the best thought found $s_t$ to help answer $x$. The actions $\{a_i\}$ are decided by the MCTS with feedback from the scoring model. This component is detailed in Figure 2. ③ The information retrieval system interacts with the thought process by answering its queries with retrieved documents $\{I_i\}$.
  • Figure 2: Modeling the thought process. Each thought is generated from previous thoughts and/or documents, effectively creating a graph. The planning policy controlling the construction of this graph is detailed in Figure 3.
  • Figure 3: One complete step from our MCTS decision process. It is divided into four functions, which are repeated until we find the answer or the thought process size limit is reached. The Selection, Expansion, Simulation, and Backpropagation functions are described in the MCTS section. Their associated algorithms can be found in the algorithms appendix.
  • Figure 4: Evolution of the accuracy and the number of LLM queries. When we increase the thought process size (i.e. the number of thoughts generated), the accuracy increases, but so does the number of LLM queries.
  • Figure 5: Examples of Unstructured Electronic Medical Records. For privacy reasons, we present simulated EMRs resembling the actual dataset.
  • ...and 11 more figures
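
The four MCTS functions named in the Figure 3 caption (Selection, Expansion, Simulation, Backpropagation) can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes that a thought is just a set of document ids, that `score` is a hypothetical stand-in for the scoring model, that the simulation step is reduced to a single `score` call, and that UCB1 is the selection rule. All names (`Node`, `search`, `relevant`, `max_thoughts`) are illustrative.

```python
import math


class Node:
    """One thought in the search tree (assumption: a set of document ids)."""

    def __init__(self, thought, parent=None):
        self.thought = thought
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # sum of rewards backpropagated through this node


def ucb1(node, c=1.4):
    """Standard UCB1 score; unvisited nodes are tried first."""
    if node.visits == 0:
        return math.inf
    exploit = node.value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore


def select(node, n_docs):
    """Selection: descend while the current node is fully expanded."""
    while node.children and len(node.children) == n_docs - len(node.thought):
        node = max(node.children, key=ucb1)
    return node


def expand(node, n_docs):
    """Expansion: add one new thought by folding in an unused document."""
    tried = {child.thought for child in node.children}
    for d in range(n_docs):
        candidate = node.thought | {d}
        if d not in node.thought and candidate not in tried:
            child = Node(candidate, parent=node)
            node.children.append(child)
            return child
    return node  # nothing left to add


def score(thought, relevant):
    """Hypothetical scoring model: reward relevant documents, penalise noise."""
    hits = len(thought & relevant) / len(relevant)
    noise = 0.1 * len(thought - relevant)
    return max(0.0, hits - noise)


def backpropagate(node, reward):
    """Backpropagation: update statistics from the new thought up to the root."""
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent


def search(n_docs, relevant, max_thoughts=20):
    """Repeat the four steps until a perfect score or the size limit is hit."""
    root = Node(frozenset())
    best, best_score, generated = root.thought, 0.0, 0
    for _ in range(max_thoughts):
        child = expand(select(root, n_docs), n_docs)
        reward = score(child.thought, relevant)  # simulation, reduced to scoring
        backpropagate(child, reward)
        generated += 1
        if reward > best_score:
            best, best_score = child.thought, reward
        if reward >= 1.0:
            break
    return best, best_score, generated


if __name__ == "__main__":
    print(search(n_docs=3, relevant=frozenset({0, 2})))
```

The `max_thoughts` cap mirrors the trade-off in the Figure 4 caption: a larger thought process explores more of the tree, and each generated thought costs one call to the scorer (here a cheap function, in practice a model query).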