A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?

Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi

TL;DR

The paper investigates how synthetic pre-training data translates to real-task performance and proposes a scaling law that predicts fine-tuning error from the pre-training data size $n$ and the fine-tuning data size $s$. Grounded in neural tangent kernel theory, the law $L(n,s)=\delta(\gamma+n^{-\alpha})s^{-\beta}$ (and its simplified form $L(n,s)\approx D n^{-\alpha}+C$) captures two interacting effects: pre-training convergence at rate $\alpha$, and a transfer gap $C$ that sets a floor on the error which additional pre-training data cannot lower. The authors validate the law across multiple syn2real task pairs, model sizes, and data complexities, and provide a practical framework for deciding whether to scale up the pre-training data or to change the synthetic data generation so as to reduce $C$. The study also shows that larger models reduce the transfer gap and that data complexity shapes both pre-training efficiency and transfer potential, which offers guidance for synthetic data design and transfer learning strategies.
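
The simplified form can be read off the full law by holding the fine-tuning size $s$ fixed. The short derivation below uses only the two formulas quoted above; the identification of $D$ and $C$ is inferred from those formulas rather than quoted from the paper.

$$
L(n,s) = \delta\,(\gamma + n^{-\alpha})\,s^{-\beta}
       = \underbrace{\delta\, s^{-\beta}}_{D}\; n^{-\alpha} + \underbrace{\delta\,\gamma\, s^{-\beta}}_{C}
       = D\, n^{-\alpha} + C .
$$

Under this reading, the error tends to $C$ as $n \to \infty$, so $C$ is the floor that more pre-training data cannot remove, while for $\beta > 0$ both $D$ and $C$ shrink as the fine-tuning size $s$ grows.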

Abstract

Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks. The most significant advantage of using synthetic images is that the ground-truth labels are automatically available, enabling unlimited expansion of the data size without human cost. However, synthetic data may have a huge domain gap, in which case increasing the data size does not improve the performance. How can we know that? In this study, we derive a simple scaling law that predicts the performance from the amount of pre-training data. By estimating the parameters of the law, we can judge whether we should increase the data or change the setting of image synthesis. Further, we analyze the theory of transfer learning by considering learning dynamics and confirm that the derived generalization bound is consistent with our empirical findings. We empirically validated our scaling law on various experimental settings of benchmark tasks, model sizes, and complexities of synthetic images.
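
To make "estimating the parameters of the law" concrete, here is a minimal sketch of fitting the simplified law $L(n) \approx D n^{-\alpha} + C$ to measured fine-tuning errors. It is not the authors' fitting procedure: the grid search over $\alpha$, the function name `fit_simple_law`, and the data points are all assumptions made for illustration. For a fixed $\alpha$ the model is linear in $(D, C)$, so those two values come from ordinary least squares.

```python
def fit_simple_law(ns, errors, alphas=None):
    """Fit L(n) ~ D * n**(-alpha) + C by grid search over alpha.

    Hypothetical helper for illustration, not the paper's procedure.
    For each candidate alpha, D and C are the closed-form least-squares
    slope and intercept of the errors against n**(-alpha).
    """
    if alphas is None:
        alphas = [0.01 * k for k in range(1, 301)]  # candidates 0.01 .. 3.00
    m = len(ns)
    y_mean = sum(errors) / m
    best = None
    for alpha in alphas:
        xs = [n ** (-alpha) for n in ns]
        x_mean = sum(xs) / m
        sxx = sum((x - x_mean) ** 2 for x in xs)
        if sxx == 0.0:
            continue  # degenerate: all transformed sizes are identical
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, errors))
        D = sxy / sxx
        C = y_mean - D * x_mean
        sse = sum((D * x + C - y) ** 2 for x, y in zip(xs, errors))
        if best is None or sse < best[0]:
            best = (sse, alpha, D, C)
    _, alpha, D, C = best
    return alpha, D, C


# Made-up, noise-free measurements generated from alpha=0.5, D=2.0, C=0.3.
sizes = [1_000, 2_000, 4_000, 8_000, 16_000, 32_000, 64_000]
errs = [2.0 * n ** -0.5 + 0.3 for n in sizes]

alpha, D, C = fit_simple_law(sizes, errs)
print(f"alpha={alpha:.2f}, D={D:.2f}, C={C:.2f}")

# Extrapolate: error if the pre-training set were 10x the largest run.
n_big = 10 * sizes[-1]
predicted = D * n_big ** (-alpha) + C
print(f"predicted error at n={n_big}: {predicted:.4f} (floor C={C:.4f})")
```

On this synthetic input the fit should recover values close to the generating parameters. With real measurements, comparing the predicted error with the estimated floor $C$ gives the decision described above: if the remaining term $D n^{-\alpha}$ is already small relative to $C$, more synthetic data will help little, and changing the image synthesis setting is the more promising direction.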

Paper Structure

This paper contains 47 sections, 3 theorems, 60 equations, 14 figures, 1 table.

Key Result

Theorem 1

Let $\hat{f}_{n,s}(x)$ be a model of width $M$ pre-trained on $n$ samples $\{(x_1, y_1), \dots, (x_n, y_n)\}$ and fine-tuned on $s$ samples $\{(x'_1, y'_1), \dots, (x'_s, y'_s)\}$, where the inputs $x, x' \sim p(x)$ are i.i.d. draws from the input distribution $p(x)$, $y=\phi_0(x)$, and $y'=\varphi(x') = \phi_0(x'$ [...] $\varepsilon_M$ and $c_M$ can be arbitrarily small for large $M$; $A_0$ and $A_1$ are constants; the [...]

Figures (14)

  • Figure 1: Empirical results of syn2real transfer for different tasks. We conducted four pre-training tasks: object detection (objdet), semantic segmentation (semseg), multi-label classification (mulclass), surface normal estimation (normal), and three fine-tuning tasks for benchmark datasets: object detection for MS-COCO, semantic segmentation for ADE20K, and single-label classification (sinclass) for ImageNet. The y-axis indicates the test error for each fine-tuning task. Dots indicate empirical results and dashed lines indicate the fitted curves of the simplified scaling law $L(n) \approx D n^{-\alpha} + C$. For more details, see the paper's section on cross-task experiments.
  • Figure 2: Scaling curves with different (a) fine-tuning size and (b) pre-training size.
  • Figure 3: Pre-training scenarios.
  • Figure 4: Effect of model size. Best viewed in color. Left: The scaling curves for the mulclass$\to$sinclass and objdet$\to$objdet cases. The meanings of dots and lines are the same as those in Figure 1. Right: The estimated transfer gap $C$ (y-axis) versus the model size (x-axis) on a log-log scale. The dots are estimated values, and the lines are linear fits to them.
  • Figure 5: Results of million-scale pre-training. The models were pre-trained on the objdet task and fine-tuned on the objdet and sinclass tasks.
  • ...and 9 more figures

Theorems & Definitions (3)

  • Theorem 1: Informal
  • Theorem 2
  • Proposition 3