Rethinking Search: Making Domain Experts out of Dilettantes
Donald Metzler, Yi Tay, Dara Bahri, Marc Najork
TL;DR
The paper argues that information needs are best served by domain experts and that current approaches fall short: classical IR systems return references rather than answers, while pre-trained language models lack grounding and cannot cite the sources behind what they generate. It proposes a model-based information retrieval framework that replaces traditional indexing with a consolidated corpus model capable of retrieval, reasoning, and generation across multiple tasks, including domain-expert advice. Key ideas include corpus models that encode term-document and document-document relationships, multi-task learning, zero- and few-shot adaptation, and grounded, citation-backed responses that span multiple modalities and languages. If realized, this approach could yield scalable, interpretable, and verifiable domain-expert responses beyond what today’s search and QA systems provide, with implications for cross-domain information access. The work also identifies substantial research challenges in training, grounding, scalability, and safety that will require coordinated advances across the IR, NLP, and ML communities.
Abstract
When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead. Classical information retrieval systems do not answer information needs directly, but instead provide references to (hopefully authoritative) answers. Successful question answering systems offer a limited corpus created on-demand by human experts, which is neither timely nor scalable. Pre-trained language models, by contrast, are capable of directly generating prose that may be responsive to an information need, but at present they are dilettantes rather than domain experts -- they do not have a true understanding of the world, they are prone to hallucinating, and crucially they are incapable of justifying their utterances by referring to supporting documents in the corpus they were trained over. This paper examines how ideas from classical information retrieval and pre-trained language models can be synthesized and evolved into systems that truly deliver on the promise of domain expert advice.
