Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of New Generation Search Engines

Ruochen Zhao, Xingxuan Li, Yew Ken Chia, Bosheng Ding, Lidong Bing

TL;DR

The paper investigates whether ChatGPT-like models can guarantee factual accuracy in search-enabled conversational systems. It analyzes public demonstrations of Microsoft's new Bing and Google's Bard, categorizes the errors into conflicts with sources, non-existent facts, and missing citations, and compares the transparency of the two systems. The demonstrations contain fabricated numbers, misattributed personal details, incorrect venue data, and other grounding failures; Bing offers more source links than Bard, but its grounding is still imperfect. The work emphasizes the need for verifiable grounding, explicit source transparency, and confidence reporting to build trust in AI-assisted search systems.

Abstract

Although large conversational AI models such as OpenAI's ChatGPT have demonstrated great potential, we question whether such models can guarantee factual accuracy. Recently, technology companies such as Microsoft and Google have announced new services which aim to combine search engines with conversational AI. However, we have found numerous mistakes in the public demonstrations that suggest we should not easily trust the factual claims of the AI models. Rather than criticizing specific models or companies, we hope to call on researchers and developers to improve AI models' transparency and factual correctness.

Paper Structure

This paper contains 10 sections and 14 figures.

Figures (14)

  • Figure 1: Summary of the Gap Inc. fiscal report by the new Bing in the press release.
  • Figure 2: Gap Inc. fiscal report excerpt on operating margins.
  • Figure 3: Gap Inc. fiscal report excerpt on operating margins.
  • Figure 4: Gap Inc. fiscal report on 2022 outlook.
  • Figure 5: The comparison table generated by the new Bing in the press release.
  • ...and 9 more figures