MODOC: A Modular Interface for Flexible Interlinking of Text Retrieval and Text Generation Functions

Yingqiang Gao, Jhony Prada, Nianlong Gu, Jessica Lam, Richard H. R. Hahnloser

TL;DR

MODOC integrates retrieval and generation in a modular UI to enable trustworthy scientific writing with verifiable content. It comprises five modules and supports retrieval tasks (discovery, text alignment, keyphrase extraction) and generation tasks (citation and conclusion sentences), with explicit separation of truth-seeking and creative steps. The paper outlines structured workflows (Retrieve and Cite, Generate and Check) to promote ethical use and reduce confabulation. The platform aims to alleviate cognitive load while enabling real-time verification across millions of documents.

Abstract

Large Language Models (LLMs) produce eloquent texts but often the content they generate needs to be verified. Traditional information retrieval systems can assist with this task, but most systems have not been designed with LLM-generated queries in mind. As such, there is a compelling need for integrated systems that provide both retrieval and generation functionality within a single user interface. We present MODOC, a modular user interface that leverages the capabilities of LLMs and provides assistance with detecting their confabulations, promoting integrity in scientific writing. MODOC represents a significant step forward in scientific writing assistance. Its modular architecture supports flexible functions for retrieving information and for writing and generating text in a single, user-friendly interface.

Paper Structure (30 sections, 18 figures, 1 table)

Figures (18)

  • Figure 1: Interlinked Retrieval and Generation functions in our proposed platform Modoc. By setting the scope for input and output, the power of LLMs can be maximized for joint retrieval and generation.
  • Figure 2: Overview of Modoc. This figure demonstrates the basic workflow Retrieve and Cite (described in detail in the Retrieve and Cite section), where the user retrieves the most relevant research papers using keywords and the actual content of the manuscript. The modularity of Modoc allows flexible configuration of many workflows, not only for literature search but also for verification of scientific claims and facts. The context within the Write module is taken from the work of zai2022goal.
  • Figure 3: Interaction between the retrieval and generation modules. Modoc allows flexible configuration of these modules for certain workflows.
  • Figure 4: Recall and Cite workflow: (a) Configuration of the Discovery function. The Discovery module takes as input the query from the Keywords module and the context (i.e. the claim) from the manuscript (Write module); (b) Required author actions to perform Recall and Cite in chronological order.
  • Figure 5: Discover and Cite workflow. The required author actions to perform this workflow are shown in chronological order.
  • ...and 13 more figures