Convergence of Outputs When Two Large Language Models Interact in a Multi-Agentic Setup

Aniruddha Maiti, Satya Nimmagadda, Kartha Veerya Jammuladinne, Niladri Sengupta, Ananya Jana

TL;DR

This study investigates how two independently trained large language models interact when they converse over many turns without external prompts. Using Mistral Nemo Base 2407 and Llama 2 13B hf, the authors run 25-turn exchanges that start from seed sentences and observe the resulting dynamics. They find that initially coherent dialogue often collapses into low-diversity repetition, with a substantial fraction of runs converging across several metrics. The work highlights stability limits in open multi-agent dialogue and suggests directions for interventions that maintain novelty.

Abstract

In this work, we report what happens when two large language models respond to each other for many turns without any outside input in a multi-agent setup. The setup begins with a short seed sentence. After that, each model reads the other's output and generates a response. This continues for a fixed number of steps. We used Mistral Nemo Base 2407 and Llama 2 13B hf. We observed that most conversations start coherently but later fall into repetition. In many runs, a short phrase appears and repeats across turns. Once repetition begins, both models tend to produce similar output rather than introducing a new direction in the conversation. This leads to a loop where the same or similar text is produced repeatedly. We describe this behavior as a form of convergence. It occurs even though the models are large, trained separately, and not given any prompt instructions. To study this behavior, we apply lexical and embedding-based metrics to measure how far the conversation drifts from the initial seed and how similar the outputs of the two models become as the conversation progresses.
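
The alternating setup described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: `run_exchange`, `toy_a`, and `toy_b` are hypothetical names, and the two toy functions stand in for real model calls (for example, to Mistral Nemo Base 2407 and Llama 2 13B hf), which are not shown here.

```python
from typing import Callable, List

# Hypothetical interface: a "model" is any function mapping input text to output text.
Generator = Callable[[str], str]


def run_exchange(seed: str, model_a: Generator, model_b: Generator,
                 num_turns: int) -> List[str]:
    """Alternate two generators; each one reads only the other's latest output."""
    transcript = [seed]
    speakers = (model_a, model_b)
    for turn in range(num_turns):
        reply = speakers[turn % 2](transcript[-1])
        transcript.append(reply)
    return transcript


# Toy stand-ins (assumptions for illustration, not real language models).
def toy_a(text: str) -> str:
    # Keeps only the last three words of what it reads.
    return " ".join(text.split()[-3:])


def toy_b(text: str) -> str:
    # Lowercases what it reads.
    return text.lower()


transcript = run_exchange("The weather is nice today", toy_a, toy_b, num_turns=6)
for i, line in enumerate(transcript):
    print(i, line)
```

Even these trivial stand-ins settle on a fixed phrase after the first turn, which gives a rough picture of the repetition loop the abstract describes. Real models would be substituted by wrapping their generation calls in functions with the same text-in, text-out shape.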

Paper Structure

This paper contains 17 sections, 4 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Experimental pipeline showing how two large language models interact over multiple turns. Each model alternately reads the other’s output and generates a new response, continuing until the defined number of steps is reached.
  • Figure 2: Example of convergence of the outputs of two models
  • Figure 3: Left: Cosine Distance; Right: Jaccard Distance
  • Figure 4: Left: BLEU score-based distance; Right: Coherence
  • Figure 5: t-SNE projection of sentence embeddings for Rounds 1–30, grouped in intervals of 5. Each point is a model output.
  • ...and 3 more figures
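
The Jaccard and cosine distances named in the Figure 3 caption can be illustrated with a minimal sketch. Two assumptions are made here that the source does not confirm: tokens are lowercase whitespace-separated words, and the cosine distance is computed on word-count vectors as a simple stand-in for the sentence embeddings the paper uses. The function names are hypothetical.

```python
import math
from collections import Counter


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over sets of lowercase whitespace tokens (assumed tokenization)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity of word-count vectors (stand-in for embedding vectors)."""
    counts_a, counts_b = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm = (math.sqrt(sum(v * v for v in counts_a.values()))
            * math.sqrt(sum(v * v for v in counts_b.values())))
    if norm == 0:
        return 1.0  # at least one text is empty; treat as maximally distant
    return 1.0 - dot / norm


print(jaccard_distance("the cat sat", "cat sat down"))  # 2 shared of 4 distinct words
print(cosine_distance("the cat sat", "a dog ran"))      # no shared words
```

Applied to consecutive turns of a conversation, values near 0 would indicate that the two outputs have become nearly identical, and values near 1 that they share little vocabulary.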