A Conceptual Framework for Human-AI Collaborative Genome Annotation
Xiaomei Li, Alex Whan, Meredith McNeil, David Starns, Jessica Irons, Samuel C. Andrew, Rad Suchecki
TL;DR
This paper addresses two linked limitations: automated genome annotation (GA) is often inaccurate when used in isolation, and the manual curation needed to correct it is difficult to scale. It proposes HAICoGA, a conceptual framework that codifies a sustained human-AI collaborative workflow for GA, integrating seven key elements (humans, AI tools, data, goals, interfaces, environment, collaboration) and a bi-directional feedback loop to continuously improve annotations. Through a survey of LLM-based AI assistants in biology, the authors illustrate a path toward multi-agent systems that can manage GA tasks, with a vision for a manager-critiqued, hybrid workflow that includes specialized manual-curation agents. They also discuss significant challenges (architectural design, novel ML methods, multi-dimensional evaluation, and user-centered interfaces) and provide a roadmap for developing scalable, accurate GA in real-world settings.
Abstract
Genome annotation is essential for understanding the functional elements within genomes. While automated methods are indispensable for processing large-scale genomic data, they often face challenges in accurately predicting gene structures and functions. Consequently, manual curation by domain experts remains crucial for validating and refining these predictions. The complementary roles of automated tools and manual curation highlight the importance of integrating human expertise with AI capabilities to improve both the accuracy and efficiency of genome annotation. However, the manual curation process is inherently labor-intensive and time-consuming, making it difficult to scale for large datasets. To address these challenges, we propose a conceptual framework, Human-AI Collaborative Genome Annotation (HAICoGA), which leverages a synergistic partnership between humans and artificial intelligence to enhance human capabilities and accelerate the genome annotation process. Additionally, we explore the potential of integrating Large Language Models (LLMs) into this framework to support and augment specific tasks. Finally, we discuss emerging challenges and outline open research questions to guide further exploration in this area.
