GENEVIC: GENetic data Exploration and Visualization via Intelligent interactive Console
Anindita Nath, Savannah Mwesigwa, Yulin Dai, Xiaoqian Jiang, Zhongming Zhao
TL;DR
GENEVIC is an AI-assisted interactive console that bridges genetic data generation and biomedical knowledge discovery by combining a PGS-centered ranking database with enrichment, network analysis, and literature search capabilities. Implemented as a Streamlit front-end backed by Azure OpenAI, it translates natural-language prompts into SQL and Python, enabling rapid exploration of prioritized variants, pathway context, and supporting literature across PubMed, Google Scholar, and arXiv. The system relies on a SQLite-based PGS rank database in which rankings are aggregated via the Dowdall method to produce an $MRR$-driven ranking, with ANNOVAR annotations and Enrichr/STRING integrations to enrich gene sets and build networks. Although presented as a functional prototype on a limited dataset, GENEVIC shows how domain-specific databases coupled with generative AI can streamline genetic research, democratize access to complex analyses, and pave the way for broader data sources, automated insights, and HIPAA-compliant deployment. The work highlights practical impact in prioritizing variants for complex diseases and offers a foundation for scalable, interactive genomics analytics tools.
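To make the ranking step concrete, the following is a minimal sketch of Dowdall-style aggregation, not the authors' implementation. It assumes that $MRR$ denotes the mean reciprocal rank, that each polygenic score contributes an ordered list of variants (best first), and that a variant missing from a list contributes zero. The function name `dowdall_mrr`, the source names, and the variant IDs are hypothetical.

```python
from collections import defaultdict


def dowdall_mrr(rankings):
    """Aggregate several ranked lists with the Dowdall rule.

    rankings: dict mapping a source name to a list of variant IDs,
              ordered from best (rank 1) to worst.
    Returns a list of (variant, mrr) pairs sorted by descending MRR.
    Assumption: a variant absent from a list contributes 0 for that list.
    """
    totals = defaultdict(float)
    for ordered in rankings.values():
        for position, variant in enumerate(ordered, start=1):
            # Dowdall rule: rank 1 earns 1 point, rank 2 earns 1/2, and so on.
            totals[variant] += 1.0 / position

    n_sources = len(rankings)
    # Dividing the summed reciprocal ranks by the number of lists gives the mean.
    scored = {variant: total / n_sources for variant, total in totals.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical example: three scores ranking three variants.
example_rankings = {
    "score_A": ["rs1", "rs2", "rs3"],
    "score_B": ["rs2", "rs1"],
    "score_C": ["rs2", "rs3", "rs1"],
}
ranked = dowdall_mrr(example_rankings)
for variant, mrr in ranked:
    print(f"{variant}\t{mrr:.4f}")
```

In this toy input, `rs2` ranks first because it tops two of the three lists. How GENEVIC handles ties and variants missing from a score is not stated here, so those choices in the sketch are assumptions.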
Abstract
Summary: The rapid growth of genetic data makes it challenging to extract useful knowledge efficiently. We introduce GENEVIC, an AI-driven chat framework that addresses this challenge by bridging the gap between genetic data generation and biomedical knowledge discovery. Leveraging generative AI, notably ChatGPT, it serves as a biologist's 'copilot'. It automates the analysis, retrieval, and visualization of customized domain-specific genetic information, and it integrates functionality to generate protein interaction networks, enrich gene sets, and search scientific literature from PubMed, Google Scholar, and arXiv, making it a comprehensive tool for biomedical research. In its pilot phase, GENEVIC is assessed using a curated database that ranks genetic variants associated with Alzheimer's disease, schizophrenia, and cognition based on their effect weights from the Polygenic Score Catalog, enabling researchers to prioritize genetic variants in complex diseases. GENEVIC is user-friendly and accessible without specialized training, is secured by Azure OpenAI's HIPAA-compliant infrastructure, and is evaluated for efficacy through real-time query testing. As a prototype, GENEVIC aims to advance genetic research and support informed biomedical decisions.
Availability and implementation: GENEVIC is publicly accessible at https://genevic-anath2024.streamlit.app. The underlying code is open-source and available via GitHub at https://github.com/anath2110/GENEVIC.git.
