Supporting Medicinal Chemists in Iterative Hypothesis Generation for Drug Target Identification
Youngseung Jeon, Christopher Hwang, Ziwen Li, Taylor Le Lievre, Jesus J. Campagna, Cohn Whitaker, Varghese John, Eunice Jun, Xiang Anthony Chen
TL;DR
The paper tackles inefficiencies in target identification by introducing HAPPIER, an integrated AI-driven interface that unifies physical/functional interactions, therapeutic impact, and docking potential within a single protein-protein interaction (PPI) graph. It combines retrieval-augmented generation and docking models to support divergent exploration and convergent validation, enabling iterative cycles of hypothesis generation. Evidence from formative and user studies shows that experts who engage in iterative cycles produce more high-quality hypotheses and hold them with greater confidence; the studies also yield design insights on information layout, domain knowledge integration, and human–AI collaboration. The work demonstrates practical potential to accelerate drug-target discovery and offers design principles for AI-enabled scientific tools across health domains.
Abstract
While drug discovery is vital for human health, the process remains inefficient. Medicinal chemists must navigate a vast protein space to identify target proteins that meet three criteria: physical and functional interactions, therapeutic impact, and docking potential. Prior approaches have provided fragmented support for each criterion, limiting the generation of promising hypotheses for wet-lab experiments. We present HAPPIER, an AI-powered tool that supports hypothesis generation with integrated multi-criteria support for target identification. HAPPIER enables medicinal chemists to 1) efficiently explore and verify proteins in a single integrated graph component showing multi-criteria satisfaction and 2) validate AI suggestions with domain knowledge. These capabilities facilitate iterative cycles of divergent and convergent thinking, which are essential for hypothesis generation. We evaluated HAPPIER with ten medicinal chemists and found that it increased both the number of high-confidence hypotheses and support for the iterative cycle. The evaluation further demonstrated a relationship between engaging in such cycles and confidence in outputs.
