Emergent properties with repeated examples

François Charton; Julia Kempe

TL;DR

For a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples, and two-set training provides faster learning and better performance.

Abstract

We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. On three mathematical problems (the greatest common divisor, modular multiplication, and matrix eigenvalues), we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training, which repeatedly uses a small random subset of examples alongside normal sampling on the rest of the training set, provides faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
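The two-set idea can be illustrated with a minimal Python sketch. It assumes a GCD task with uniformly random operands, and it assumes that each training sample is drawn from the small repeated subset with a fixed probability and otherwise from the rest of the training set. All function names and parameters here (`make_gcd_example`, `split_two_sets`, `two_set_batches`, `repeated_size`, `p_repeat`, `max_int`) are hypothetical illustrations, not the authors' code, and the sketch does not cover model training itself.

```python
import math
import random


def make_gcd_example(rng, max_int=10**6):
    """Hypothetical GCD example: two random operands and their gcd as the target."""
    a = rng.randint(1, max_int)
    b = rng.randint(1, max_int)
    return (a, b), math.gcd(a, b)


def build_training_set(data_budget, seed=0):
    """Generate a fixed training set of `data_budget` examples."""
    rng = random.Random(seed)
    return [make_gcd_example(rng) for _ in range(data_budget)]


def split_two_sets(train_set, repeated_size, seed=0):
    """Pick a small random subset to be repeated; the remainder is the 'normal' set."""
    rng = random.Random(seed)
    idx = list(range(len(train_set)))
    rng.shuffle(idx)
    repeated = [train_set[i] for i in idx[:repeated_size]]
    rest = [train_set[i] for i in idx[repeated_size:]]
    return repeated, rest


def two_set_batches(repeated, rest, batch_size, num_steps, p_repeat, seed=0):
    """Yield `num_steps` batches. Assumption: each sample comes from the repeated
    subset with probability `p_repeat`, otherwise uniformly from the rest."""
    rng = random.Random(seed)
    for _ in range(num_steps):
        batch = []
        for _ in range(batch_size):
            pool = repeated if rng.random() < p_repeat else rest
            batch.append(rng.choice(pool))
        yield batch


def expected_repetitions(num_steps, batch_size, set_size, fraction=1.0):
    """Average number of times each example of a set of size `set_size` is seen
    when `fraction` of all training samples are drawn from that set."""
    return num_steps * batch_size * fraction / set_size
```

For example, `expected_repetitions(1000, 64, 6400)` gives `10.0`: with a fixed budget of training steps, shrinking the training set raises the number of times each example is repeated, which is the trade-off between repetition and data diversity that the abstract describes.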

