Cognitive Training for Language Models: Towards General Capabilities via Cross-Entropy Games

Clément Hongler, Franck Gabriel, Valentin Hartmann, Arthur Renard, Andrew Emil

Abstract

Defining a constructive process to build general capabilities for language models in an automatic manner is considered an open problem in artificial intelligence. Towards this, we consider the problem of building a curriculum of tasks that grows a model via relevant skill discovery. We provide a concrete framework for this task, using a family of tasks called cross-entropy games, which we postulate is universal in a suitable sense. We show that if it is possible to grow the curriculum for relevant skill discovery by iterating a greedy optimization algorithm, then, under natural assumptions, there is essentially only one possible meta-objective (up to a few hyperparameters). We call the resulting process cognitive training. We postulate that, given sufficiently capable language models as players and meta-samplers and sufficient training time, cognitive training provides a principled route to relevant skill discovery; hence, to the extent that general capabilities are achievable via greedy curriculum learning, cognitive training would be a solution.

Paper Structure (55 sections, 1 theorem, 17 equations)

Key Result

Theorem 1

A consistent internal meta-objective $\mathcal{I}$ (according to the principles outlined above) must be of the form … where $\delta\in\left[0,1\right]$ is a diversity hyper-parameter.

Theorems & Definitions (27)

  • Claim 3
  • Claim 4
  • Example 5: Inverse prompting and summarization
  • Example 6: Common explanations
  • Remark 7
  • Remark 8
  • Claim 9
  • Remark 10
  • Claim 11
  • Definition 12
  • ...and 17 more