Identifying Evidence-Based Nudges in Biomedical Literature with Large Language Models

Jaydeep Chauhan; Mark Seidman; Pezhman Raeisian Parvari; Zhi Zheng; Zina Ben-Miled; Cristina Barboi; Andrew Gonzalez; Malaz Boustani

TL;DR

This work tackles the problem of finding evidence-based behavioral nudges in a massive biomedical literature corpus. It introduces a two-stage AI pipeline. The first stage uses hybrid filtering (keyword, TF-IDF, and semantic signals) to down-sample PubMed to about 81,000 articles. The second stage applies an LLM fine-tuned for scientific text to classify nudges and extract structured metadata in a single pass. The best-performing configuration achieves 72% recall and 0.67 F1. A precision-focused, high-trust variant reaches 100% precision but only 12% recall, so deployment can be tuned to the needs of each use case. The resulting nudge corpus is being integrated into Agile Nudge+ to ground AI-generated recommendations in peer-reviewed evidence, supporting retrieval-augmented generation and transparent decision support for health behavior interventions.
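The hybrid filtering stage can be illustrated with a minimal sketch. It assumes a simple additive score: an indicator for any keyword hit, a weighted TF-IDF cosine similarity between each abstract and a seed query, and a per-occurrence "nudge-term bonus." The term lists, the weights (`W_KEYWORD`, `W_COSINE`, `BONUS_PER_TERM`), the seed query, and the `threshold` are all hypothetical, and the semantic (embedding) signal is omitted. This is an illustration of the idea, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical term lists and weights; the paper does not specify them here.
KEYWORDS = {"nudge", "default", "reminder", "choice architecture"}
NUDGE_TERMS = {"nudge", "nudging", "choice architecture"}
W_KEYWORD, W_COSINE, BONUS_PER_TERM = 1.0, 2.0, 0.5


def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms that appear in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: cnt * idf[t] for t, cnt in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_scores(abstracts, seed_query):
    """Score each abstract: keyword presence + TF-IDF cosine + nudge-term bonus."""
    vecs = tfidf_vectors(abstracts + [seed_query])  # fit IDF on candidates and query
    query_vec = vecs[-1]
    scores = []
    for text, vec in zip(abstracts, vecs[:-1]):
        lowered = text.lower()
        has_keyword = any(kw in lowered for kw in KEYWORDS)
        bonus_hits = sum(lowered.count(term) for term in NUDGE_TERMS)
        scores.append(W_KEYWORD * has_keyword
                      + W_COSINE * cosine(vec, query_vec)
                      + BONUS_PER_TERM * bonus_hits)
    return scores


def filter_candidates(abstracts, seed_query, threshold=1.5):
    """Keep abstracts whose hybrid score meets the (hypothetical) threshold."""
    scores = hybrid_scores(abstracts, seed_query)
    return [a for a, s in zip(abstracts, scores) if s >= threshold]


abstracts = [
    "A default-enrollment nudge improved medication adherence.",
    "Crystal structure of a bacterial protein kinase.",
]
seed = "nudge choice architecture default reminder behavior"
print(filter_candidates(abstracts, seed))
# -> ['A default-enrollment nudge improved medication adherence.']
```

Because each signal is a separate additive term, the weights and the threshold can be tuned independently. At PubMed scale, a library vectorizer such as scikit-learn's `TfidfVectorizer` would typically replace the hand-rolled TF-IDF shown here.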

Abstract

We present a scalable, AI-powered system that identifies and extracts evidence-based behavioral nudges from unstructured biomedical literature. Nudges are subtle, non-coercive interventions that influence behavior without limiting choice, and they have shown strong effects on health outcomes such as medication adherence. However, identifying these interventions among PubMed's 8 million+ articles is a bottleneck. Our system uses a novel multi-stage pipeline. First, hybrid filtering (keywords, TF-IDF, cosine similarity, and a "nudge-term bonus") reduces the corpus to about 81,000 candidates. Second, OpenScholar (a quantized LLaMA 3.1 8B model) classifies each paper and extracts structured fields, such as nudge type and target behavior, in a single pass, with outputs validated against a JSON schema. We evaluated four configurations on a labeled test set (N=197). The best setup (Title/Abstract/Intro) achieved a 67.0% F1 score and 72.0% recall, making it well suited to discovery. A high-precision variant using self-consistency (7 randomized passes) achieved 100% precision at 12% recall, demonstrating a tunable trade-off for high-trust use cases. This system is being integrated into Agile Nudge+, a real-world platform, to ground LLM-generated interventions in peer-reviewed evidence. This work shows how interpretable, domain-specific retrieval pipelines can support evidence synthesis and personalized healthcare.
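The JSON-schema validation and self-consistency steps can be sketched together. This sketch assumes each randomized pass returns a JSON string with hypothetical fields `is_nudge`, `nudge_type`, and `target_behavior`. It uses a hand-rolled type check in place of a full JSON Schema validator, and its aggregation rule (accept a paper only when at least `min_agree` valid passes label it a nudge, defaulting to unanimity over 7 passes) is an assumption, not the rule reported by the authors. The LLM calls themselves are replaced by stand-in strings.

```python
import json
from collections import Counter

# Hypothetical output schema: field name -> required Python type.
REQUIRED_FIELDS = {"is_nudge": bool, "nudge_type": str, "target_behavior": str}


def parse_and_validate(raw):
    """Return the parsed record if it matches REQUIRED_FIELDS, else None."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            return None
    return record


def self_consistent_label(raw_outputs, min_agree=7):
    """Aggregate randomized passes; return a label only if enough valid passes agree."""
    valid = [r for r in map(parse_and_validate, raw_outputs) if r is not None]
    positives = [r for r in valid if r["is_nudge"]]
    if len(positives) < min_agree:
        return None  # abstain: too little agreement for a high-trust label
    # Report the most frequent extracted values among the agreeing passes.
    nudge_type = Counter(r["nudge_type"] for r in positives).most_common(1)[0][0]
    behavior = Counter(r["target_behavior"] for r in positives).most_common(1)[0][0]
    return {"is_nudge": True, "nudge_type": nudge_type, "target_behavior": behavior}


# Stand-in outputs; in a real pipeline each string would come from one LLM pass.
one_pass = json.dumps({"is_nudge": True, "nudge_type": "default",
                       "target_behavior": "medication adherence"})
passes = [one_pass] * 6 + ["not valid json"]
print(self_consistent_label(passes))  # -> None (only 6 valid positive passes)
print(self_consistent_label(passes, min_agree=4))
```

Under this hypothetical rule, raising `min_agree` makes the classifier abstain more often, which is one plausible way to trade recall for precision in the direction of the high-trust variant; lowering it moves back toward the discovery-oriented setting.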
