From Screens to Scenes: A Survey of Embodied AI in Healthcare

Yihao Liu, Xu Cao, Tingting Chen, Yankai Jiang, Junjie You, Minghua Wu, Xiaosong Wang, Mengling Feng, Yaochu Jin, Jintai Chen

TL;DR

This survey examines how Embodied AI (EmAI) can interact with the physical world to improve diagnosis, treatment, and care delivery. It reviews four foundational EmAI capabilities (multimodal perception, low-level actuation, high-level planning, and memory) and maps them to four healthcare domains: clinical intervention, daily care and companionship, infrastructure support, and biomedical research. The paper introduces a five-level framework of intelligence for EmAI systems, catalogs datasets and benchmarks, and discusses ethical, security, and interoperability challenges, outlining a path toward safe, generalizable, and patient-centered EmAI systems. Its practical value lies in guiding researchers and practitioners toward cohesive, end-to-end EmAI solutions that can augment clinicians, expand access, and accelerate medical innovation while addressing societal implications.

Abstract

Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and personalization. Powered by modern AI technologies such as multimodal large language models and world models, Embodied AI (EmAI) represents a transformative frontier, offering enhanced autonomy and the ability to interact with the physical world to address these challenges. As an interdisciplinary and rapidly evolving research domain, "EmAI in healthcare" spans diverse fields such as algorithms, robotics, and biomedicine. This complexity underscores the importance of timely reviews and analyses to track advancements, address challenges, and foster cross-disciplinary collaboration. In this paper, we provide a comprehensive overview of the "brain" of EmAI for healthcare, wherein we introduce foundational AI algorithms for perception, actuation, planning, and memory, and focus on presenting the healthcare applications spanning clinical interventions, daily care & companionship, infrastructure support, and biomedical research. Despite its promise, the development of EmAI for healthcare is hindered by critical challenges such as safety concerns, gaps between simulation platforms and real-world applications, the absence of standardized benchmarks, and uneven progress across interdisciplinary domains. We discuss the technical barriers and explore ethical considerations, offering a forward-looking perspective on the future of EmAI in healthcare. A hierarchical framework of intelligent levels for EmAI systems is also introduced to guide further development. By providing systematic insights, this work aims to inspire innovation and practical applications, paving the way for a new era of intelligent, patient-centered healthcare.

Paper Structure (47 sections, 11 figures, 3 tables)

Figures (11)

  • Figure 1: Overview of embodied AI in healthcare research. Top left: The proportion of healthcare-specific EmAI studies among all EmAI publications (as indexed by Google Scholar) from 2019 to 2024, reflecting the growing focus on healthcare within this field. Bottom left: Publication trends (2019–2024) for EmAI applications in healthcare across four key areas: Biomedical Research, Infrastructure Support, Daily Care & Companionship, and Clinical Intervention. The growth in each area reflects increasing recognition of EmAI's potential to address diverse healthcare challenges. Right: A keyword co-occurrence network generated using VOSviewer [bukar2023method] from Web of Science data spanning 2019 to 2024, showing core concepts and applications of EmAI in healthcare. Keywords extracted from article titles and abstracts are represented as nodes, with node size indicating frequency and links representing co-occurrence relationships. The network shows a strong trend of interdisciplinary collaboration among fields such as computer science, engineering, and robotics to build holistic EmAI systems for healthcare applications.
  • Figure 2: An overview of the functions of embodied AI's "brain". (a) The human brain comprises specialized regions responsible for perception (e.g., parietal and occipital lobes), action control (e.g., cerebellum), decision-making (e.g., frontal lobe), and memory (e.g., hippocampus). (b) Similarly, the "brain" of an embodied AI system is designed to emulate these functions, with interconnected modules for multimodal perception, decision-making, action control, and memory. These components work together, exchanging feedback and supporting adaptive behavior.
  • Figure 3: A detailed breakdown of EmAI core functionalities, with key approaches that support them.
  • Figure 4: Three main approaches to high-level planning, using surgical robots as examples. (a) Code-based LLM task planners select and execute pre-developed robotic skills (e.g., retraction, suturing, incision) for actuation. (b) Language-based LLM task planners generate detailed task-specific instructions (e.g., identify the incision, insert needle, tie knots) for dynamic planning and execution. (c) End-to-end embodied large models directly integrate planning and execution in a single model.
  • Figure 5: Embodied AI applications play critical roles in clinical intervention across the pre-intervention, in-intervention, and post-intervention phases. These systems enhance precision by performing essential functions such as diagnosis and planning, supporting surgical and therapeutic procedures, and facilitating recovery and health management. A system in one phase can supply prior information to, or carry out follow-up tasks for, systems in the other phases, which highlights their potential for integrated development. Unifying these capabilities within a single embodiment could yield a comprehensive EmAI system for clinical intervention, with the capacity to significantly improve patient care across the entire clinical intervention spectrum.
  • ...and 6 more figures
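
The code-based planning approach described in the Figure 4(a) caption, where a planner selects from pre-developed skills and the system executes them, can be illustrated with a minimal sketch. This is not the paper's implementation: the skill names are taken from the caption, but `SKILLS`, `stub_planner`, `execute`, and the task strings are hypothetical, and `stub_planner` is a fixed lookup table standing in for an actual LLM call.

```python
from typing import Callable, Dict, List

# Hypothetical skill library. Names follow the Figure 4(a) examples;
# each body only records its name instead of driving a real robot.
def retraction(log: List[str]) -> None:
    log.append("retraction")

def incision(log: List[str]) -> None:
    log.append("incision")

def suturing(log: List[str]) -> None:
    log.append("suturing")

SKILLS: Dict[str, Callable[[List[str]], None]] = {
    "retraction": retraction,
    "incision": incision,
    "suturing": suturing,
}

def stub_planner(task: str) -> List[str]:
    """Stand-in for an LLM planner (assumption): maps a task to skill names."""
    plans = {
        "open site": ["retraction", "incision"],
        "close wound": ["retraction", "suturing"],
    }
    return plans.get(task, [])

def execute(task: str, planner: Callable[[str], List[str]] = stub_planner) -> List[str]:
    """Run the planned skills in order, rejecting anything outside the library."""
    log: List[str] = []
    for name in planner(task):
        skill = SKILLS.get(name)
        if skill is None:
            # The planner may only choose from pre-developed skills.
            raise ValueError(f"planner requested unknown skill: {name}")
        skill(log)
    return log

if __name__ == "__main__":
    print(execute("close wound"))
```

Restricting the planner to a fixed registry is the design point of this sketch: the planner's output is validated against known skills before anything runs, which is one plausible reason a code-based approach might be preferred where safety matters.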