Social Evolution of Published Text and The Emergence of Artificial Intelligence Through Large Language Models and The Problem of Toxicity and Bias

Arifa Khan; P. Saravanan; S. K Venkatesan

TL;DR

This work surveys the social evolution of published text and the emergence of artificial intelligence through large language models, connecting historical text transmission to modern transformer-based NLP. It traces foundational ideas from Markovian text modeling and information theory to encoder–decoder architectures, the rapid growth in model scale, and emergent capabilities via prompts and RLHF, while critically addressing persistent toxicity, bias, memorization, and reasoning failures. The paper highlights bias in word embeddings, data-source limitations, and governance approaches like Constitutional AI, arguing that scale alone cannot ensure reliability or fairness. It calls for careful evaluation, ethical considerations, and ongoing research to harness generative AI responsibly across society and multiple modalities.

Abstract

We provide a bird's-eye view of the rapid developments in AI and deep learning that have led to the path-breaking emergence of AI in Large Language Models. The aim of this study is to place these developments in a pragmatic, broader historical and social perspective, without exaggeration but also without the pessimism that created the AI winter of the 1970s to 1990s. At the same time, as a warning to the overly optimistic, we point out the toxicity, bias, memorization, sycophancy, logical inconsistencies, and hallucinations that persist. We note that, just as this emergence of AI seems to occur at a threshold in the number of neural connections or weights, it has also been observed that the human brain, and especially the cortex, is nothing special or extraordinary but simply a scaled-up version of the primate brain, and that even human intelligence seems to be an emergent phenomenon of scale.

Paper Structure (17 sections, 3 equations, 4 figures, 6 tables)

Introduction
Markov chain, Shannon and the N-gram revolution
Chomsky's "colorless green ideas ..." and its refutation by Norvig
Neural Networks -- RNN, LSTM and CNN
WordNet, Word embeddings and Transfer learning
Bias in Corpus-based Word embeddings
Attention, the big transformers, GPU and the LLM
Emergence of Artificial Intelligence by scale?
Toxicity and bias mitigation
Constitutional AI
Memorization, sycophancy, broken chains of logic and hallucinations
Memorization
Sycophancy
Broken chains of logic
Hallucination
...and 2 more sections

Figures (4)

Figure 1: Google N-gram View of "intelligent man" versus "intelligent woman"
Figure 2: Transformer Architecture
Figure 3: Size of LLMs measured in terms of number of neural network weights (synaptic connections)
Figure 4: PaLM data model and its toxicity.
