
OpenRC: An Open-Source Robotic Colonoscopy Framework for Multimodal Data Acquisition and Autonomy Research

Siddhartha Kapuria, Mohammad Rafiee Javazm, Naruhiko Ikoma, Joga Ivatury, Mohammad Ali Nasseri, Nassir Navab, Farshid Alambeigi

Abstract

Colorectal cancer screening critically depends on colonoscopy, yet existing platforms offer limited support for systematically studying the coupled dynamics of operator control, instrument motion, and visual feedback. This gap restricts reproducible closed-loop research in robotic colonoscopy, medical imaging, and emerging vision-language-action (VLA) learning paradigms. To address this challenge, we present OpenRC, an open-source modular robotic colonoscopy framework that retrofits conventional scopes while preserving the clinical workflow. The framework supports simultaneous recording of video, operator commands, actuation state, and distal tip pose. We experimentally validated motion consistency and quantified cross-modal latency across the sensing streams. Using this platform, we collected a multimodal dataset comprising 1,894 teleoperated episodes (~19 hours) across 10 structured task variations spanning routine navigation, failure events, and recovery behaviors. By unifying open hardware and an aligned multimodal dataset, OpenRC provides a reproducible foundation for research in multimodal robotic colonoscopy and surgical autonomy.
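
As a concrete illustration of how such simultaneous recording might be wired up in ROS 2 (on which the OpenRC software stack is built, per Figure 2c), the sketch below uses message_filters to deliver approximately time-aligned samples of the four streams. The topic names, message types, and the 20 ms matching window are assumptions for illustration, not OpenRC's actual interfaces.

```python
import rclpy
from rclpy.node import Node
from message_filters import Subscriber, ApproximateTimeSynchronizer
from sensor_msgs.msg import Image, JointState, Joy
from geometry_msgs.msg import PoseStamped

class MultimodalRecorder(Node):
    """Delivers approximately time-aligned tuples of the four streams.
    All topic names below are hypothetical placeholders."""

    def __init__(self):
        super().__init__("multimodal_recorder")
        video = Subscriber(self, Image, "/scope/image_raw")
        commands = Subscriber(self, Joy, "/operator/commands")
        state = Subscriber(self, JointState, "/actuation/state")
        tip = Subscriber(self, PoseStamped, "/em_tracker/tip_pose")
        # Match messages whose header stamps fall within a 20 ms window.
        self._sync = ApproximateTimeSynchronizer(
            [video, commands, state, tip], queue_size=30, slop=0.02)
        self._sync.registerCallback(self.on_sample)

    def on_sample(self, img, cmd, js, pose):
        # One aligned multimodal sample; a real recorder would append it
        # to an episode log (e.g., a rosbag or an HDF5 file).
        stamp = img.header.stamp
        self.get_logger().info(f"aligned sample @ {stamp.sec}.{stamp.nanosec:09d}")

def main():
    rclpy.init()
    node = MultimodalRecorder()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()

if __name__ == "__main__":
    main()
```

Approximate rather than exact time synchronization is the natural choice here, since the camera, actuation controller, and EM tracker sample at different rates and their header stamps will never coincide exactly.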

Paper Structure

This paper contains 9 sections, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Overview of the proposed OpenRC framework, including hardware interfaces, actuation, data acquisition, and sensing via EM tracking.
  • Figure 2: OpenRC components: (a) bending module, (b) feeding module, (c) ROS 2 graph showing data streams, and (d) experimental setup for data collection.
  • Figure 3: Characterization and synchronization results: (a) sinusoidal system-response characterization, and histograms of estimated residual lag distributions for (b) Operator Action vs. State and (c) State vs. Tip Position.
  • Figure 4: Example episode with synchronized multimodal recordings from the robotic colonoscopy dataset. From top to bottom: colonoscope video snapshots sampled from the same time axis as the data streams; operator control actions; distal tip pose; robot state (normalized to $[-1, 1]$ per axis for ease of visualization).
  • Figure 5: Episode-level characteristics of the OpenRC Dataset, showing distributions of (a) episode duration, (b) trajectory length, and (c) recorded tasks. Detailed task descriptions will be provided in the dataset repository.
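
The residual lag distributions summarized in Figure 3(b) and (c) could, for example, be obtained by cross-correlating pairs of resampled streams. The sketch below is a minimal illustration of that idea, not the paper's implementation; the function name estimate_lag, the fixed 100 Hz resampling rate, and the use of linear interpolation are all assumptions.

```python
import numpy as np

def estimate_lag(t_a, x_a, t_b, x_b, fs=100.0):
    """Estimate the lag (seconds) of signal b relative to signal a by
    resampling both onto a common uniform grid and locating the peak
    of their cross-correlation. Positive result: b lags behind a."""
    t0, t1 = max(t_a[0], t_b[0]), min(t_a[-1], t_b[-1])
    grid = np.arange(t0, t1, 1.0 / fs)            # shared time base
    a = np.interp(grid, t_a, x_a)
    b = np.interp(grid, t_b, x_b)
    a = (a - a.mean()) / (a.std() + 1e-12)        # zero-mean, unit variance
    b = (b - b.mean()) / (b.std() + 1e-12)
    corr = np.correlate(a, b, mode="full")
    lag_samples = (len(b) - 1) - int(np.argmax(corr))
    return lag_samples / fs
```

Applied per episode to a pair such as commanded vs. measured actuation signals, this yields one residual-lag estimate per episode; a histogram of those estimates corresponds to the kind of distribution Figure 3(b) and (c) report.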