Continually Self-Improving Language Models for Bariatric Surgery Question-Answering

Yash Kumar Atri, Thomas H Shin, Thomas Hartvigsen

TL;DR

This work introduces bRAGgen, an adaptive retrieval-augmented generation framework for bariatric surgery question answering that autonomously integrates up-to-date medical evidence when response confidence declines. Complementing it, bRAGq provides a large, expert-validated dataset of 1,302 postoperative bariatric questions for benchmarking domain-specific QA. Combining semantic caching, MDP-guided web retrieval, LoRA-enhanced generation, online learning, and safety constraints, bRAGgen outperforms state-of-the-art baselines in both expert and LLM-as-Judge evaluations, with superior factuality, relevance, and comprehensiveness. The approach promises scalable, evidence-based, patient-centric support from preoperative preparation to long-term postoperative bariatric care, with broader implications for continual learning in healthcare AI.
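Semantic caching, listed above among bRAGgen's components, can be read as reusing an earlier answer when a new question is semantically close to one already answered. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes questions have already been embedded as vectors, compares them by cosine similarity, and uses a hypothetical cutoff of 0.9. The names `SemanticCache`, `lookup`, and `store` are illustrative only.

```python
import math
from typing import Optional


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class SemanticCache:
    """Hypothetical semantic cache: return a stored answer when a new query
    embedding is at least `threshold`-similar to a previously answered one."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed cutoff, not a value from the paper
        self.entries: list[tuple[list[float], str]] = []

    def lookup(self, query_emb: list[float]) -> Optional[str]:
        """Return the answer of the most similar cached query, or None on a miss."""
        best_score, best_answer = -1.0, None
        for emb, answer in self.entries:
            score = cosine_similarity(query_emb, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, query_emb: list[float], answer: str) -> None:
        """Remember a (query embedding, answer) pair for future lookups."""
        self.entries.append((query_emb, answer))
```

On a miss (`lookup` returns `None`), a pipeline of this kind would fall through to generation and retrieval, then `store` the new answer. A linear scan is used only for clarity; a real deployment would more likely use an approximate nearest-neighbor index.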

Abstract

While bariatric and metabolic surgery (MBS) is considered the gold standard treatment for severe and morbid obesity, its therapeutic efficacy hinges on active, longitudinal engagement with multidisciplinary providers, including surgeons, dietitians/nutritionists, psychologists, and endocrinologists. This engagement spans the entire patient journey, from preoperative preparation to long-term postoperative management. However, the process is often hindered by healthcare disparities, such as logistical and access barriers, that impede patients' timely access to evidence-based, clinician-endorsed information. To address these gaps, we introduce bRAGgen, a novel adaptive retrieval-augmented generation (RAG)-based model that autonomously integrates real-time medical evidence when response confidence dips below dynamic thresholds. This self-updating architecture ensures that responses remain current and accurate, reducing the risk of misinformation. Additionally, we present bRAGq, a curated dataset of 1,302 bariatric surgery-related questions validated by an expert bariatric surgeon. bRAGq constitutes the first large-scale, domain-specific benchmark for comprehensive MBS care. In a two-phase evaluation, bRAGgen is benchmarked against state-of-the-art models using both large language model (LLM)-based metrics and expert surgeon review. Across all evaluation dimensions, bRAGgen demonstrates substantially superior performance in generating clinically accurate and relevant responses.
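The abstract says bRAGgen retrieves new evidence when response confidence dips below a dynamic threshold, but does not define either quantity here. One plausible reading is sketched below, with loudly labeled assumptions. Confidence is taken to be the geometric-mean token probability, that is, the exponential of the mean token log-probability. The threshold hypothetically tracks an exponential moving average of recent confidences minus a margin, bounded below by a floor. `DynamicThresholdGate` and all of its parameter values are illustrative and do not come from the paper.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability: exp of the mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


class DynamicThresholdGate:
    """Hypothetical retrieval gate; the update rule is an assumption, not bRAGgen's."""

    def __init__(self, initial_ema: float = 0.75, decay: float = 0.9,
                 margin: float = 0.05, floor: float = 0.5) -> None:
        self.ema = initial_ema  # running average of recent confidences
        self.decay = decay      # weight kept on the old average at each update
        self.margin = margin    # how far below the average counts as "low"
        self.floor = floor      # the threshold never drops below this value

    @property
    def threshold(self) -> float:
        return max(self.floor, self.ema - self.margin)

    def should_retrieve(self, confidence: float) -> bool:
        """Decide whether to trigger retrieval, then fold the score into the average."""
        retrieve = confidence < self.threshold
        self.ema = self.decay * self.ema + (1.0 - self.decay) * confidence
        return retrieve


# Usage: score a draft answer from its token log-probabilities (made-up values).
gate = DynamicThresholdGate()
draft_conf = sequence_confidence([math.log(0.9), math.log(0.4), math.log(0.6)])
if gate.should_retrieve(draft_conf):
    print("low confidence: fetch external evidence and regenerate")
```

Updating the average after the decision means each answer is judged against the recent history rather than against itself. The floor keeps a run of poor answers from dragging the threshold so low that retrieval is never triggered.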

Paper Structure

This paper contains 25 sections, 12 equations, 2 figures, and 10 tables.

Figures (2)

  • Figure 1: Architecture of the proposed method, bRAGgen. The system integrates large language models (e.g., Llama3) with real-time web retrieval. When confidence falls below the threshold ($\alpha$), the system automatically retrieves updated information from authoritative medical sources to improve response accuracy.
  • Figure 2: Exploratory Analysis of Model Editing Dynamics. (a) Distribution of changes in confidence scores post-edit, showing that most changes are modest and positive. (b) Frequency of search queries across external biomedical domains, with PubMed dominating. (c) Training loss progression across iterations, illustrating convergence patterns and volatility. (d) Distribution of total duration taken for each edit operation, highlighting that most edits are executed within 10-20 seconds.
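Figure 2 tracks model edits and training loss across iterations, and the TL;DR lists LoRA-enhanced generation among bRAGgen's components. The sketch below is a generic illustration of LoRA (low-rank adaptation), not of bRAGgen's actual configuration. It computes h = Wx + (lora_alpha / r)·B(Ax), where the pretrained weight W stays frozen and only the small matrices A (r × d_in) and B (d_out × r) would be trained. The scaling constant is named `lora_alpha` to avoid confusion with the confidence threshold α in Figure 1. Matrix sizes and values are toy assumptions.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: list[list[float]], A: list[list[float]], B: list[list[float]],
                 x: list[float], lora_alpha: float, r: int) -> list[float]:
    """LoRA-adapted linear layer: h = W x + (lora_alpha / r) * B (A x).

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    base = matvec(W, x)               # output of the frozen layer
    update = matvec(B, matvec(A, x))  # low-rank correction B (A x)
    scale = lora_alpha / r
    return [h + scale * u for h, u in zip(base, update)]
```

Because B is conventionally initialized to zeros, the adapted layer initially reproduces the frozen layer exactly. Training then changes only the r·(d_in + d_out) adapter parameters, which is what makes frequent, lightweight updates of this kind cheap relative to full fine-tuning.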