IntelliCircos: A Data-driven and AI-powered Authoring Tool for Circos Plots

Mingyang Gu, Jiamin Zhu, Qipeng Wang, Fengjie Wang, Xiaolin Wen, Yong Wang, Min Zhu

TL;DR

Circos plots offer compact, multi-dimensional genomics visualization but are hard to design and implement. This work introduces IntelliCircos, an AI-powered interactive authoring tool that leverages a 4,396-plot circos dataset and a Retrieval-Augmented Generation pipeline to provide progressive design recommendations and a configuration reference, integrated into a five-module UI. By combining Similar Sample retrieval with a GPT-4 configuration generator, and representing configurations as a DAG for intuitive browsing, IntelliCircos enables natural-language design specification, automatic application, and easy refinement. A user study with eight bioinformatics analysts demonstrates enhanced usability and efficiency in circos-plot authoring, highlighting the value of human-AI collaboration and design-pattern inference for genomics visualization. Overall, IntelliCircos advances end-to-end circos plotting by coupling data-driven design guidance with real-time implementation, offering a practical platform for researchers to rapidly generate publication-ready circos visuals.

Abstract

Genomics data is essential in biological and medical domains, and bioinformatics analysts often manually create circos plots to analyze the data and extract valuable insights. However, creating circos plots is complex, as it requires careful design of multiple track attributes and the positional relationships between them. Analysts typically seek inspiration from existing circos plots and then iteratively adjust and refine their own plot to reach a satisfactory final design, which makes the process both tedious and time-intensive. To address these challenges, we propose IntelliCircos, an AI-powered interactive authoring tool that streamlines the process from initial visual design to the final implementation of circos plots. Specifically, we build a new dataset containing 4,396 circos plots with corresponding annotations and configurations, which are extracted and labeled from published papers. With the dataset, we further identify track combination patterns and use a Large Language Model (LLM) to provide domain-specific design recommendations and configuration references that guide the design of circos plots. We conduct a user study with 8 bioinformatics analysts to evaluate IntelliCircos, and the results demonstrate its usability and effectiveness in authoring circos plots.

Paper Structure

This paper contains 20 sections, 2 equations, 7 figures.

Figures (7)

  • Figure 1: (A) Structure of a Circos Plot. A circos plot consists of multiple nested rings surrounding a genomics axis, with each ring containing one or more tracks. (B) Example of a Circos Plot. The tracks (B1-B8) are arranged from outermost to innermost.
  • Figure 2: Label syntax of a circos plot configuration. A CIRCOS begins with the <start> terminator and ends with CIRCOS_END, containing one or multiple RING elements. A RING consists of TRACK elements, and RINGs are separated by the <split> terminator, ordered from outside to inside. A TRACK represents a single track within a circos plot. We classify text tracks, scale tracks, and other non-visual tracks under <others>, as our focus is on the design of visual elements.
  • Figure 3: Track Combination Analysis Results: (A) The number of rings in each circos plot, (B) The number of track types in each circos plot, (C) The number of tracks in each ring, (D) The number of tracks of each type, (E) The conditional probability of stacked relationship, and (F) The conditional probability of synthesized relationship.
  • Figure 4: Recommendation Workflow. The tool encodes the user query into a vector using LLaMa and searches the database for semantically similar samples. The retrieved samples, the user query, and the results of the track combination analysis in Section 4 are combined to form the prompt for GPT-4, which generates the recommendation results.
  • Figure 5: IntelliCircos is an interactive AI-powered authoring tool designed to facilitate the creation of circos plots. Its user interface comprises five interconnected components: the Recom Edit Panel (A) generates design recommendations from user needs expressed in natural language; the Circos Dashboard (B) renders the circos plot created by the user; the Configuration Panel (C) offers easy refinement of track configurations; the Reference Panel (D) lets users browse and analyze configurations of similar circos plots; and the Data Panel (E) lets users manage the source data.
  • ...and 2 more figures
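
The label syntax described in the Figure 2 caption can be illustrated with a small round-trip sketch. This is not the authors' implementation; it only shows how a plot, viewed as a list of rings ordered from outside to inside, might be flattened into a token sequence and recovered again. The `<start>`, `<split>`, and `<others>` tokens come from the caption. The end token `<end>` and the track tokens `<histogram>`, `<line>`, and `<link>` are assumptions made for illustration, as are the function names `encode` and `decode`.

```python
from typing import List

START = "<start>"   # start terminator named in the Figure 2 caption
SPLIT = "<split>"   # ring separator named in the Figure 2 caption
END = "<end>"       # assumption: the caption does not spell out the end token

Ring = List[str]    # one ring = the track tokens it contains


def encode(rings: List[Ring]) -> List[str]:
    """Flatten rings (outermost first) into a single token sequence."""
    if not rings or any(not ring for ring in rings):
        raise ValueError("need at least one ring, and no ring may be empty")
    tokens = [START]
    for i, ring in enumerate(rings):
        if i > 0:
            tokens.append(SPLIT)   # separator goes between rings only
        tokens.extend(ring)
    tokens.append(END)
    return tokens


def decode(tokens: List[str]) -> List[Ring]:
    """Recover the outside-to-inside list of rings from a token sequence."""
    if len(tokens) < 3 or tokens[0] != START or tokens[-1] != END:
        raise ValueError("sequence must be wrapped in start and end tokens")
    rings: List[Ring] = [[]]
    for tok in tokens[1:-1]:
        if tok == SPLIT:
            rings.append([])       # begin the next (more inner) ring
        else:
            rings[-1].append(tok)
    if any(not ring for ring in rings):
        raise ValueError("empty ring in sequence")
    return rings


# Hypothetical plot: three rings, the middle one holding two tracks.
example = [["<others>"], ["<histogram>", "<line>"], ["<link>"]]
label = encode(example)
print(" ".join(label))
```

Keeping rings as plain lists of track tokens makes the outside-to-inside ordering explicit, and `decode(encode(rings))` returns the original structure, which is convenient when checking labels for consistency.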