Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of New Generation Search Engines

Ruochen Zhao; Xingxuan Li; Yew Ken Chia; Bosheng Ding; Lidong Bing

TL;DR

The paper investigates whether ChatGPT-like models can guarantee factual accuracy in search-enabled conversational systems. It analyzes the public demonstrations of Microsoft's new Bing and Google's Bard, categorizes the errors into conflicts with sources, non-existent facts, and missing citations, and compares the transparency of the two systems. The demonstrations contain fabricated numbers, misattributed personal details, incorrect venue data, and other grounding failures; Bing offers more source links, but its grounding is still imperfect. The work emphasizes the need for verifiable grounding, explicit source transparency, and confidence reporting to build trust in AI-assisted search systems.

Abstract

Although large conversational AI models such as OpenAI's ChatGPT have demonstrated great potential, we question whether such models can guarantee factual accuracy. Recently, technology companies such as Microsoft and Google have announced new services which aim to combine search engines with conversational AI. However, we have found numerous mistakes in the public demonstrations that suggest we should not easily trust the factual claims of the AI models. Rather than criticizing specific models or companies, we hope to call on researchers and developers to improve AI models' transparency and factual correctness.

Paper Structure (10 sections, 14 figures)

Introduction
What Factual Mistakes did Microsoft's New Bing Demonstrate?
Fabricated numbers in the summary of financial reports: be careful when you trust the new Bing!
Top Japanese poet: secretly a rock singer?
Following Bing's nightclub recommendations? You could be facing a closed door.
Potential concerns in the limited new Bing demo.
What Factual Mistakes did Google's Bard Demonstrate?
How do Bing and Bard Compare?
How Can the Factual Limitations Be Addressed?
Conclusions

Figures (14)

Figure 1: Summary of the Gap Inc. fiscal report by the new Bing in press release.
Figure 2: Gap Inc. fiscal report excerpt on operating margins.
Figure 3: Gap Inc. fiscal report excerpt on operating margins.
Figure 4: Gap Inc. fiscal report on 2022 outlook.
Figure 5: The comparison table generated by the new Bing in press release.
...and 9 more figures
