Hunt Globally: Wide Search AI Agents for Drug Asset Scouting in Investing, Business Development, and Competitive Intelligence

Alisa Vinogradova, Vlad Vinogradov, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev, Shoman Kasbekar, Kong Nguyen, Dmitrii Radkevich, Roman Doronin, Andrey Doronichev

TL;DR

Many new drug assets now arise outside the US and are first disclosed in regional sources, which leaves asset scouting incomplete and prone to hallucination. The paper tackles this by introducing a completeness-first benchmark and the Bioptic Agent, a tree-based, multilingual, self-learning system. The framework combines regional mining, attribute enrichment, query-conditioned search, and LLM-grounded validation to surface comprehensive asset sets with provenance. Bioptic outperforms state-of-the-art deep-research and find-all baselines on a held-out benchmark, achieving an F1 of 0.797 and demonstrating the practical value of constraint-driven, evidence-backed exploration for BD/S&E. The work highlights the importance of discovering under-the-radar assets in global drug development and offers a scalable approach to reducing missed partnership and acquisition opportunities.

Abstract

Bio-pharmaceutical innovation has shifted: many new drug assets now originate outside the United States and are disclosed primarily via regional, non-English channels. Recent data suggest that over 85% of patent filings originate outside the U.S., with China accounting for nearly half of the global total. A growing share of scholarly output is also non-U.S. Industry estimates put China at 30% of global drug development, spanning 1,200+ novel candidates. In this high-stakes environment, failing to surface "under-the-radar" assets creates multi-billion-dollar risk for investors and business development teams, making asset scouting a coverage-critical competition where speed and completeness drive value. Yet today's Deep Research AI agents still lag human experts in achieving high-recall discovery across heterogeneous, multilingual sources without hallucination. We propose a benchmarking methodology for drug asset scouting and a tuned, tree-based self-learning Bioptic Agent aimed at complete, non-hallucinated scouting. We construct a challenging completeness benchmark using a multilingual multi-agent pipeline: complex user queries paired with ground-truth assets that are largely outside the U.S.-centric radar. To reflect the complexity of real deals, we collected screening queries from expert investors, BD, and VC professionals and used them as priors to conditionally generate benchmark queries. For grading, we use LLM-as-judge evaluation calibrated to expert opinions. On this benchmark, our Bioptic Agent achieves an F1 score of 79.7%, outperforming Claude Opus 4.6 (56.2%), Gemini 3 Pro + Deep Research (50.6%), OpenAI GPT-5.2 Pro (46.6%), Perplexity Deep Research (44.2%), and Exa Websets (26.9%). Performance improves steeply with additional compute, supporting the view that scouting quality scales with the compute budget.
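The F1 scores quoted above are the harmonic mean of precision and recall over the returned asset set. The snippet below is a minimal sketch of that arithmetic only. It assumes assets can be compared by exact identifier, whereas the paper grades matches with an LLM-as-judge; the function name and the asset identifiers are hypothetical.

```python
def precision_recall_f1(predicted, ground_truth):
    """Set-based precision, recall, and F1 for one query.

    Assumption: assets are matched by exact identifier. The paper instead
    uses LLM-as-judge grading, which this sketch does not reproduce.
    """
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 3 assets returned, 2 of them among 4 ground-truth assets.
found = {"asset-a", "asset-b", "asset-x"}
truth = {"asset-a", "asset-b", "asset-c", "asset-d"}
p, r, f1 = precision_recall_f1(found, truth)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, a scouting run with high precision but low recall (or the reverse) still scores poorly, which is why it suits a completeness-first benchmark.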

Paper Structure (22 sections, 5 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Quality-time tradeoff for asset scouting. y-axis: F1-score (harmonic mean of precision and recall; higher is better). x-axis: wall-clock time (log scale; larger indicates longer compute). DR stands for deep research; lang-free stands for no language parallelism.
  • Figure 2: Completeness Benchmark construction pipeline. Top (Assets Mining): the Regional News Miner Agent surfaces regional-stage drug assets from non-English sources; the Attributes Enrichment Agent validates and structures each asset; the Google Search Agent prioritizes under-the-radar assets via an English vs. origin-language discoverability check. Bottom (Query Generation): real investor queries are clustered by intent and distilled by the Template Generator Agent into intent2templates; conditioned on these templates, the Query Generation Agent produces benchmark queries paired with ground-truth (GT) assets, and the Query Validator Agent, along with human expert validators, ensures each query-GT pair is valid and investor-realistic.
  • Figure 3: Distribution of asset origin language and therapeutic areas in the benchmark test split. Left: proportion of assets by origin language. Right: proportion of therapeutic-area labels assigned across assets.
  • Figure 4: Benchmark query composition. Left: distribution of queries across difficulty tiers (Broad, Tight, Complex/multi-hop). Right: prevalence of high-level constraint categories across queries (multi-label). A single query can trigger multiple categories; therefore, a category is counted once per query if the query contains that type of constraint.
  • Figure 5: Example prompt specifications (intent $\times$ template) used for benchmark query generation. Each query group is defined by an intent, difficulty tier, and a template. Here G$i$ denotes the $i$-th query group.
  • ...and 2 more figures
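
The multi-label counting rule in the Figure 4 caption (a category is counted once per query, however many constraints of that type the query contains) can be sketched as follows. This is an illustration of the counting rule only, not the authors' code; the function name and category labels are hypothetical.

```python
from collections import Counter


def category_prevalence(query_categories):
    """Fraction of queries that contain each constraint category.

    Each query contributes at most one count per category, even if it
    carries several constraints of the same type.
    """
    counts = Counter()
    for categories in query_categories:
        counts.update(set(categories))  # de-duplicate within a query
    total = len(query_categories)
    return {cat: n / total for cat, n in counts.items()} if total else {}


# Hypothetical labels; the first query has two constraints of one category.
queries = [
    ["modality", "geography", "modality"],
    ["geography"],
    ["stage", "modality"],
]
for category, share in sorted(category_prevalence(queries).items()):
    print(f"{category}: {share:.2f}")
```

Since a query can fall into several categories, the resulting shares need not sum to 1.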