IntelliCircos: A Data-driven and AI-powered Authoring Tool for Circos Plots
Mingyang Gu, Jiamin Zhu, Qipeng Wang, Fengjie Wang, Xiaolin Wen, Yong Wang, Min Zhu
TL;DR
Circos plots offer compact, multi-dimensional visualization of genomics data, but they are hard to design and implement. This work introduces IntelliCircos, an AI-powered interactive authoring tool. It combines a dataset of 4,396 circos plots with a Retrieval-Augmented Generation (RAG) pipeline to provide progressive design recommendations and configuration references, all within a five-module user interface. IntelliCircos pairs similar-sample retrieval with a GPT-4-based configuration generator and represents configurations as a directed acyclic graph (DAG) for intuitive browsing. As a result, users can specify designs in natural language, apply them automatically, and refine them easily. A user study with eight bioinformatics analysts indicates improved usability and efficiency in circos-plot authoring. The study also highlights the value of human-AI collaboration and design-pattern inference for genomics visualization. Overall, IntelliCircos supports end-to-end circos plotting by coupling data-driven design guidance with real-time implementation, giving researchers a practical way to produce publication-ready circos plots quickly.
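To make the retrieval step more concrete, the sketch below shows one way a "similar sample" lookup could rank stored plots before their configurations are passed to an LLM as context. This is a minimal sketch under stated assumptions, not the authors' method. It assumes each plot is described only by the set of track types it uses, and it scores candidates with Jaccard similarity, a swapped-in technique (the actual system may use text embeddings or richer features). The names `PlotSample`, `jaccard`, `retrieve_similar`, and `build_prompt`, as well as the example records, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlotSample:
    """Hypothetical record for one annotated circos plot (assumed schema)."""
    plot_id: str
    track_types: frozenset  # e.g. {"ideogram", "heatmap", "link"}
    config: str             # placeholder for the stored configuration text


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two track-type sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_similar(query: set, samples: list, k: int = 2) -> list:
    """Return the k samples whose track types overlap most with the query.

    Ties are broken by plot_id so the ranking is deterministic.
    """
    q = frozenset(query)
    ranked = sorted(samples, key=lambda s: (-jaccard(q, s.track_types), s.plot_id))
    return ranked[:k]


def build_prompt(request: str, examples: list) -> str:
    """Assemble a retrieval-augmented prompt: the request plus reference configs."""
    refs = "\n\n".join(f"# Reference {s.plot_id}\n{s.config}" for s in examples)
    return f"User request: {request}\n\nReference configurations:\n{refs}"


if __name__ == "__main__":
    # Toy, made-up records; not taken from the IntelliCircos dataset.
    dataset = [
        PlotSample("p1", frozenset({"ideogram", "heatmap", "link"}), "config-p1"),
        PlotSample("p2", frozenset({"ideogram", "scatter"}), "config-p2"),
        PlotSample("p3", frozenset({"ideogram", "histogram", "link"}), "config-p3"),
    ]
    top = retrieve_similar({"ideogram", "link", "heatmap"}, dataset, k=2)
    print([s.plot_id for s in top])  # ['p1', 'p3']
    print(build_prompt("Show copy-number changes as a heatmap", top))
```

Here `p1` matches all three requested track types (similarity 1.0) and `p3` shares two of four (0.5), so both rank above `p2` (0.25). A set-based score keeps the sketch dependency-free; an embedding model would be the more likely choice when queries are free-form natural language.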
Abstract
Genomics data is essential in biological and medical research, and bioinformatics analysts often create circos plots manually to analyze such data and extract valuable insights. However, creating circos plots is complex because it requires careful design of multiple track attributes and of the positional relationships between tracks. Analysts typically seek inspiration from existing circos plots and then iteratively adjust and refine their own plot until the design is satisfactory, which makes the process tedious and time-consuming. To address these challenges, we propose IntelliCircos, an AI-powered interactive authoring tool that streamlines the process from initial visual design to final implementation of circos plots. Specifically, we build a new dataset of 4,396 circos plots extracted from published papers, each labeled with corresponding annotations and configurations. Using this dataset, we identify track combination patterns and employ a Large Language Model (LLM) to provide domain-specific design recommendations and configuration references that guide the design of circos plots. We conduct a user study with 8 bioinformatics analysts to evaluate IntelliCircos, and the results demonstrate its usability and effectiveness in authoring circos plots.
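The abstract does not specify how track combination patterns are mined. A simple, hypothetical approach is to count how often pairs of track types appear in the same plot and then recommend the unused track types that co-occur most often with the tracks already chosen. The sketch below illustrates this co-occurrence idea. The functions `pair_counts` and `suggest_next` and the toy plot list are assumptions for illustration only and do not reproduce the paper's pattern analysis.

```python
from collections import Counter
from itertools import combinations


def pair_counts(plots):
    """Count how often each unordered pair of track types shares a plot."""
    counts = Counter()
    for tracks in plots:
        # Sort so ("heatmap", "link") and ("link", "heatmap") are the same key.
        for pair in combinations(sorted(set(tracks)), 2):
            counts[pair] += 1
    return counts


def suggest_next(current, counts, k=1):
    """Rank unused track types by total co-occurrence with the current tracks."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in current and b not in current:
            scores[b] += n
        elif b in current and a not in current:
            scores[a] += n
    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical track lists, one per annotated plot.
    plots = [
        ["ideogram", "heatmap", "link"],
        ["ideogram", "histogram", "link"],
        ["ideogram", "heatmap", "scatter"],
        ["ideogram", "link"],
    ]
    counts = pair_counts(plots)
    print(counts[("ideogram", "link")])                    # 3
    print(suggest_next({"ideogram"}, counts, k=2))         # ['link', 'heatmap']
    print(suggest_next({"ideogram", "link"}, counts, k=2))  # ['heatmap', 'histogram']
```

Pairwise counts are the simplest form of pattern mining; ordered sequences (for example, inner-to-outer track order) or frequent-itemset methods would capture more structure at the cost of more data.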
