Data-Centric Visual Development for Self-Driving Labs

Anbang Liu, Guanzhong Hu, Jiayi Wang, Ping Guo, Han Liu

TL;DR

This work tackles data scarcity for rare visual events in self-driving laboratories (SDLs) by introducing a bi-track data engine that fuses real, event-triggered capture with reference-conditioned, prompt-steered synthetic generation. The real track uses a fixed camera, quality gating, and lightweight prescreening to produce reliable labeled data with minimal human effort. The virtual track augments these data with synthesized images anchored to real lab conditions. On held-out real data, a model trained only on real images reaches 99.6% accuracy, and training on a unified, class-balanced mix of real and generated images sustains 99.4% accuracy while reducing data collection costs. The approach offers a scalable, data-centric pathway to robust vision feedback in SDL workflows and extends to other rare-event perception tasks in scientific imaging.
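The real-track quality gating and prescreening can be pictured as a two-stage filter over each captured frame. What follows is a minimal sketch under stated assumptions, not the authors' implementation. The `Frame` fields, the 40-pixel off-center tolerance, and the 0.95 confidence cutoff are all hypothetical, because the source names the stages but not their criteria. For simplicity, the sketch also makes every discard decision at the quality gate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Route(Enum):
    DISCARD = "discard"            # failed the quality gate
    AUTO_ACCEPT = "auto_accept"    # classifier is confident; label kept without review
    HUMAN_REVIEW = "human_review"  # borderline; sent to a brief human check


@dataclass
class Frame:
    # Hypothetical per-frame measurements; the source does not define these fields.
    tip_detected: bool     # was the pipette tip found in the image?
    tip_offset_px: float   # distance of the tip from the expected inspection position
    p_bubble: float        # prescreening classifier's probability of "bubble"


def quality_gate(frame: Frame, max_offset_px: float = 40.0) -> bool:
    """Reject frames that miss the tip or are off-center (tolerance is an assumption)."""
    return frame.tip_detected and frame.tip_offset_px <= max_offset_px


def triage(frame: Frame, confident: float = 0.95) -> Tuple[Route, Optional[str]]:
    """Route one frame; auto-accepted frames also receive a label."""
    if not quality_gate(frame):
        return Route.DISCARD, None
    if frame.p_bubble >= confident:
        return Route.AUTO_ACCEPT, "bubble"
    if frame.p_bubble <= 1.0 - confident:
        return Route.AUTO_ACCEPT, "no_bubble"
    return Route.HUMAN_REVIEW, None


# Usage with made-up measurements.
for frame in [Frame(True, 5.0, 0.99), Frame(True, 8.0, 0.02),
              Frame(True, 3.0, 0.60), Frame(False, 0.0, 0.50)]:
    route, label = triage(frame)
    print(route.value, label)
# auto_accept bubble / auto_accept no_bubble / human_review None / discard None
```

A symmetric confidence band keeps human effort focused on the ambiguous middle. Under this scheme, raising `confident` sends more frames to review in exchange for fewer automatic labeling errors.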

Abstract

Self-driving laboratories (SDLs) offer a promising path toward reducing the labor, time, and irreproducibility of workflows in the biological sciences. Their stringent precision requirements, however, demand highly robust models, and training such models requires large amounts of annotated data. Such data are difficult to obtain in routine practice, especially negative samples. In this work, we focus on pipetting, the most critical and precision-sensitive action in SDLs. To overcome the scarcity of training data, we build a hybrid pipeline that fuses real and virtual data generation. The real track adopts a human-in-the-loop scheme that couples automated acquisition with selective human verification, maximizing accuracy with minimal effort. The virtual track augments the real data with reference-conditioned, prompt-guided image generation, whose outputs are further screened and validated for reliability. Together, these two tracks yield a class-balanced dataset for training a robust detector of bubbles in the pipette tip. On a held-out real test set, a model trained entirely on automatically acquired real images reaches 99.6% accuracy, and mixing real and generated data during training sustains 99.4% accuracy while reducing collection and review load. Our approach offers a scalable and cost-effective strategy for supplying visual feedback data to SDL workflows and a practical solution to data scarcity in rare-event detection and broader vision tasks.
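One simple way to form the class-balanced training set described above is to draw a fixed number of images per class. Real images are taken first, and generated ones are used only to top up a class that runs short. This is a hedged sketch. The function name `build_balanced_set`, the `per_class` target, and the real-first policy are illustrative assumptions, since the source does not state its balancing rule.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image path, class label)


def build_balanced_set(real: Dict[str, List[str]],
                       synthetic: Dict[str, List[str]],
                       per_class: int,
                       seed: int = 0) -> List[Sample]:
    """Draw `per_class` images for every class: real first, generated only to fill gaps."""
    rng = random.Random(seed)
    dataset: List[Sample] = []
    for label in sorted(set(real) | set(synthetic)):
        reals = list(real.get(label, []))
        synths = list(synthetic.get(label, []))
        rng.shuffle(reals)
        rng.shuffle(synths)
        chosen = reals[:per_class]
        chosen += synths[:per_class - len(chosen)]  # top up a short class with generated images
        if len(chosen) < per_class:
            raise ValueError(f"class {label!r} has only {len(chosen)} images; need {per_class}")
        dataset.extend((path, label) for path in chosen)
    rng.shuffle(dataset)
    return dataset


# Usage with placeholder paths: bubble images are the scarce class in the real data.
real = {"no_bubble": [f"real/nb_{i}.png" for i in range(10)],
        "bubble": [f"real/b_{i}.png" for i in range(3)]}
synthetic = {"bubble": [f"synth/b_{i}.png" for i in range(20)]}
data = build_balanced_set(real, synthetic, per_class=8)
print(Counter(label for _, label in data))
```

Here the bubble class gets its 3 real images plus 5 generated ones, while the no-bubble class is filled entirely from real images. Changing the policy to a fixed real-to-generated ratio per class would be an equally plausible alternative.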

Paper Structure

This paper contains 15 sections, 12 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Real track workflow. After each aspiration, the robot holds the pipette tip at a fixed inspection position, and the camera takes a photo. A quick quality check removes bad frames (e.g., frames that are off-center or miss the tip). A lightweight classifier then screens the remaining frames: good frames are accepted automatically, borderline cases go to a brief human review, and only poor-quality frames are discarded. Both bubble and no-bubble images are kept, so the process yields a steady, labeled stream of high-quality real data with minimal supervision.
  • Figure 2: Virtual track workflow. Starting from a real reference tip image, we programmatically build prompts that fix the viewpoint and background but vary lab factors (liquid color, liquid level, and bubble count, size, and distribution) and specify the intended class (bubble vs. no-bubble). We batch-generate variations, apply the same quality gate as in the real track, enforce label consistency with the current classifier, and perform light human spot-checks. Bubble and no-bubble images that pass are standardized to $600{\times}1500$ and added to the synthetic set for mixed training.
  • Figure 3: Dataset examples.
  • Figure 4: ABLE Labs Notable liquid-handling robot and our fixed-camera setup from three viewpoints: (a) left eye level, (b) top-down, and (c) right eye level.
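The Figure 2 steps of building prompts programmatically and enforcing label consistency can be sketched as follows. This is a minimal illustration under loud assumptions. The factor values, the prompt wording, and the 0.9 agreement threshold are placeholders. The image generator, the quality gate, and the human spot-checks are left out because the source does not specify them.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator

# Hypothetical factor levels; the source names these factors but not their values.
LIQUID_COLORS = ["clear", "pale yellow", "light blue"]
LIQUID_LEVELS = ["low", "medium", "high"]
BUBBLE_COUNTS = ["one", "a few", "many"]
BUBBLE_SIZES = ["tiny", "medium-sized"]
BUBBLE_PLACES = ["near the tip opening", "along the tip wall", "at the liquid surface"]

# Fixed part of every prompt: keep the reference image's viewpoint and background.
FIXED = ("Using the reference photo of a pipette tip, keep the camera viewpoint, "
         "framing, lighting, and background unchanged.")


@dataclass(frozen=True)
class PromptSpec:
    label: str   # intended class: "bubble" or "no_bubble"
    prompt: str


def build_prompts() -> Iterator[PromptSpec]:
    """Yield one prompt per combination of lab factors, with the intended class stated."""
    for color, level in itertools.product(LIQUID_COLORS, LIQUID_LEVELS):
        liquid = f"The tip holds {color} liquid at a {level} fill level."
        yield PromptSpec("no_bubble", f"{FIXED} {liquid} The liquid contains no air bubbles.")
        for count, size, place in itertools.product(BUBBLE_COUNTS, BUBBLE_SIZES, BUBBLE_PLACES):
            yield PromptSpec("bubble", f"{FIXED} {liquid} Air bubbles: {count}, {size}, {place}.")


def label_consistent(intended: str, p_bubble: float, agree: float = 0.9) -> bool:
    """Keep a generated image only if the current classifier confidently agrees with its intended class."""
    if intended == "bubble":
        return p_bubble >= agree
    return p_bubble <= 1.0 - agree


specs = list(build_prompts())
print(len(specs), sum(s.label == "bubble" for s in specs))  # 171 162
```

This grid produces far more bubble prompts than no-bubble prompts. We assume that suits a setting where bubble images are the scarce class. Any remaining imbalance can be evened out when the real and synthetic sets are merged.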