Automated Genomic Interpretation via Concept Bottleneck Models for Medical Robotics
Zijun Li, Jinchang Zhang, Ming Zhang, Guoyu Lu
TL;DR
The paper tackles the challenge of translating genomic sequences into auditable, action-oriented decisions for medical robotics. It presents an end-to-end pipeline that maps DNA to CGR images, encodes them with a CNN, and routes predictions through an interpretable concept bottleneck, augmented by priors, distribution alignment, and calibration. A cost-aware recommendation layer closes the loop from genomic interpretation to robotic decision policies, yielding interpretable evidence, improved calibration, and favorable decision economics. The approach achieves state-of-the-art classification on HIV gag datasets, demonstrates strong concept fidelity (GC, CpG, CCC) with high correlations and AUROCs, and shows robust faithfulness and decision-layer performance. Collectively, this work establishes a practical, auditable framework for integrating genomic interpretation into automated medical robotics and genomic medicine workflows.
Abstract
We propose an automated genomic interpretation module that transforms raw DNA sequences into actionable, interpretable decisions suitable for integration into medical automation and robotic systems. Our framework combines Chaos Game Representation (CGR) with a Concept Bottleneck Model (CBM), constraining predictions to flow through biologically meaningful concepts such as GC content, CpG density, and k-mer motifs. To enhance reliability, we incorporate concept fidelity supervision, prior consistency alignment, KL distribution matching, and uncertainty calibration. Beyond accurate classification of HIV subtypes across both in-house and LANL datasets, our module delivers interpretable evidence that can be directly validated against biological priors. A cost-aware recommendation layer further translates predictive outputs into decision policies that balance accuracy, calibration, and clinical utility, reducing unnecessary retests and improving efficiency. Extensive experiments demonstrate that the proposed system achieves state-of-the-art classification performance, superior concept prediction fidelity, and more favorable cost-benefit trade-offs compared to existing baselines. By bridging the gap between interpretable genomic modeling and automated decision-making, this work establishes a reliable foundation for robotic and clinical automation in genomic medicine.
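To make the first stage of the pipeline concrete, the following is a minimal sketch of how a DNA sequence might be turned into a CGR count grid and how two of the named concepts (GC content and CpG density) can be computed directly from the sequence. It is illustrative only and is not the authors' implementation: the corner layout, the grid resolution `k`, the handling of ambiguous bases, the per-dinucleotide definition of CpG density, and the function names `cgr_image`, `gc_content`, and `cpg_density` are all assumptions.

```python
from typing import List

# One common CGR corner convention (assumed; the paper's layout may differ).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_image(seq: str, k: int = 3) -> List[List[int]]:
    """Return a 2^k x 2^k grid counting CGR points for the sequence."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous bases such as N are skipped
        cx, cy = CORNERS[base]
        # Chaos game step: move halfway towards the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


def gc_content(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (0.0 if none)."""
    bases = [b for b in seq.upper() if b in CORNERS]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def cpg_density(seq: str) -> float:
    """CG dinucleotides per dinucleotide position (assumed definition)."""
    s = seq.upper()
    if len(s) < 2:
        return 0.0
    hits = sum(1 for i in range(len(s) - 1) if s[i : i + 2] == "CG")
    return hits / (len(s) - 1)


if __name__ == "__main__":
    example = "ACGTTGCACGCG"  # toy sequence, not taken from any dataset
    print(cgr_image(example, k=2))
    print(gc_content(example), cpg_density(example))
```

In a concept bottleneck setup of the kind described above, values such as `gc_content` and `cpg_density` could serve as supervision targets for the concept layer, while the grid from `cgr_image` would be the input to the CNN encoder. How the paper normalizes the image or defines its motif concepts is not specified in this section.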
