
Comparing diversity, negativity, and stereotypes in Chinese-language AI technologies: an investigation of Baidu, Ernie and Qwen

Geng Liu, Carlo Alberto Bono, Francesco Pierri

TL;DR

This study addresses biases and stereotypes in Chinese-language AI by comparing Baidu’s search auto-completions with two leading LLMs, Ernie and Qwen, across 240 Chinese social groups. It collects over 30k outputs, applies sentiment analysis, and measures diversity using Jaccard similarity, synonym expansion, and embedding-based cosine similarity. The key findings are that the LLMs exhibit a greater diversity of embedded views than Baidu, that Baidu and Qwen generate negative content more often than Ernie, and that stereotypes are moderately prevalent in the language models. The work underscores the need for fairness and inclusivity in global AI systems and provides a methodological framework for cross-platform bias assessment in Chinese-language technologies.
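The three similarity measures named above can be illustrated with a minimal sketch. It assumes that each tool's output for one social group is reduced to a set of candidate words (for Jaccard similarity) or to embedding vectors (for cosine similarity), and that synonym expansion means adding each word's listed synonyms before comparing sets. The function names, the toy synonym table, and the example words are hypothetical placeholders, not the authors' code, data, or exact formulation.

```python
from math import sqrt


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def expand_with_synonyms(words: set[str], synonyms: dict[str, set[str]]) -> set[str]:
    """Return the word set plus every synonym listed for its members.

    `synonyms` is a hypothetical lookup table; any thesaurus could fill this role.
    """
    expanded = set(words)
    for word in words:
        expanded |= synonyms.get(word, set())
    return expanded


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical candidate words returned by two tools for the same social group.
search_words = {"hardworking", "rich", "stingy"}
llm_words = {"diligent", "rich", "friendly"}
toy_synonyms = {"hardworking": {"diligent"}, "diligent": {"hardworking"}}

raw_overlap = jaccard(search_words, llm_words)
expanded_overlap = jaccard(
    expand_with_synonyms(search_words, toy_synonyms),
    expand_with_synonyms(llm_words, toy_synonyms),
)
print(raw_overlap, expanded_overlap)
```

With these toy sets, only "rich" is shared, giving a raw Jaccard similarity of 1/5 = 0.2. After synonym expansion "hardworking" and "diligent" also match, raising it to 3/5 = 0.6. This shows why exact string overlap alone can understate how similar two tools' views are. Embedding-based cosine similarity goes a step further by scoring semantically related words that no synonym table links.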

Abstract

Large Language Models (LLMs) and search engines have the potential to perpetuate biases and stereotypes by amplifying existing prejudices in their training data and algorithmic processes, thereby influencing public perception and decision-making. While most work has focused on Western-centric AI technologies, we study Chinese-based tools by investigating social biases embedded in the major Chinese search engine, Baidu, and two leading LLMs, Ernie and Qwen. Leveraging a dataset of 240 social groups across 13 categories describing Chinese society, we collect over 30k views encoded in the aforementioned tools by prompting them for candidate words describing such groups. We find that language models exhibit a larger variety of embedded views compared to the search engine, although Baidu and Qwen generate negative content more often than Ernie. We also find a moderate prevalence of stereotypes embedded in the language models, many of which potentially promote offensive and derogatory views. Our work highlights the importance of promoting fairness and inclusivity in AI technologies with a global perspective.

Paper Structure (10 sections, 1 equation, 2 tables)