Evolutionary Retrofitting

Mathurin Videau, Mariia Zameshina, Alessandro Leite, Laurent Najman, Marc Schoenauer, Olivier Teytaud

TL;DR

AfterLearnER introduces a gradient-free retrofitting framework that tunes, after training, a small set of model parameters (the $\aleph$-parameters) using possibly non-differentiable feedback computed on a subset of the validation set. It operates in offline and online modes and relies on black-box optimizers (e.g., Nevergrad's NGOpt) to minimize arbitrary $\aleph$-losses from only a few dozen to a few hundred scalar feedback values, with no gradient backpropagation. A theoretical analysis bounds the risk of overfitting, also accounting for multiple independent runs and parallelism, and experiments on depth sensing, speech re-synthesis, Doom reinforcement learning, code translation, 3D GANs, and latent diffusion models (LDMs) show robust improvements over strong baselines with small feedback budgets. The approach sits between hyperparameter optimization (HPO), test-time adaptation, and reinforcement learning from human feedback (RLHF), offering a versatile, training-agnostic way to align model outputs with non-differentiable or human-centric objectives, with practical anytime behavior.
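To make the offline mode concrete, here is a minimal Python sketch of the loop described above. It assumes a (1+1) evolution strategy with a one-fifth-style step-size rule as a simple stand-in for NGOpt (it is not what Nevergrad would select). The frozen "model", the validation data, and the names `one_plus_one_es` and `aleph_loss` are hypothetical; the point is that the optimizer only ever sees one scalar per evaluation, so a step-function error rate, whose gradient is zero almost everywhere, is still a usable $\aleph$-loss.

```python
import random
from typing import Callable, List, Sequence, Tuple


def one_plus_one_es(
    loss: Callable[[List[float]], float],
    x0: Sequence[float],
    budget: int,
    sigma: float = 0.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize a black-box loss with a (1+1) evolution strategy.

    Uses exactly `budget` loss evaluations and never needs a gradient.
    (Stand-in for NGOpt; an assumption for illustration only.)
    """
    rng = random.Random(seed)
    best = list(x0)
    best_loss = loss(best)  # evaluation 1: the un-retrofitted baseline
    for _ in range(budget - 1):
        candidate = [xi + sigma * rng.gauss(0.0, 1.0) for xi in best]
        cand_loss = loss(candidate)
        if cand_loss <= best_loss:  # accept ties to drift across plateaus
            best, best_loss = candidate, cand_loss
            sigma *= 1.5  # success: enlarge the step
        else:
            sigma *= 1.5 ** -0.25  # failure: shrink; stationary at 1/5 successes
    return best, best_loss


# Hypothetical frozen model: score(x) = x, plus one aleph-parameter (a bias).
# Hypothetical validation subset: the true boundary is at 0.7, not 0.
xs = [i / 10 for i in range(-20, 21)]
labels = [1 if x > 0.7 else 0 for x in xs]
n_evals = 0


def aleph_loss(aleph: List[float]) -> float:
    """Error rate on the validation subset: piecewise constant, non-differentiable."""
    global n_evals
    n_evals += 1
    preds = [1 if x + aleph[0] > 0 else 0 for x in xs]
    return sum(p != y for p, y in zip(preds, labels)) / len(xs)


baseline = aleph_loss([0.0])
n_evals = 0  # count only the optimizer's evaluations
aleph, retrofitted = one_plus_one_es(aleph_loss, [0.0], budget=50)
print(f"baseline error {baseline:.3f} -> retrofitted {retrofitted:.3f} "
      f"with {n_evals} scalar evaluations")
```

Because the search starts from the trained model's current parameters and keeps the best point seen, the retrofitted validation loss can never be worse than the baseline's; whether that gain transfers to unseen data is exactly the overfitting question the paper's theoretical analysis addresses.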

Abstract

AfterLearnER (After Learning Evolutionary Retrofitting) applies evolutionary optimization to refine fully trained machine learning models: a carefully chosen set of parameters or hyperparameters of the model is optimized with respect to an actual, exact, and hence possibly non-differentiable error signal, computed on a subset of the standard validation set. The effectiveness of AfterLearnER is demonstrated on non-differentiable signals such as threshold-based criteria in depth sensing, the word error rate in speech re-synthesis, the number of kills per life in Doom, computational accuracy or BLEU in code translation, image quality in 3D generative adversarial networks (GANs), and user feedback in image generation with Latent Diffusion Models (LDMs). This retrofitting can be done after training, or dynamically at inference time by taking user feedback into account. The advantages of AfterLearnER are its versatility, its ability to use non-differentiable feedback, including human evaluations (i.e., no gradient is needed), its limited overfitting, supported by a theoretical study, and its anytime behavior. Last but not least, AfterLearnER requires only a small amount of feedback, i.e., a few dozen to a few hundred scalars, compared to the tens of thousands needed in most related published works.
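Two of the error signals listed above fit in a few lines, which also shows why they are non-differentiable: both count discrete events. The sketch below assumes the usual definitions, namely the depth threshold accuracy $\delta_i$ (the fraction of pixels with $\max(\hat d/d,\, d/\hat d) < 1.25^i$), turned into a loss as $1-\delta_i$, and the word error rate as the word-level edit distance divided by the reference length. The paper's exact variants may differ, and the function names are hypothetical.

```python
from typing import Sequence


def delta_threshold_loss(pred: Sequence[float], gt: Sequence[float], i: int = 1) -> float:
    """1 - delta_i: fraction of pixels whose ratio error is NOT below 1.25**i.

    Assumes strictly positive depths in both `pred` and `gt`.
    """
    thr = 1.25 ** i
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < thr)
    return 1.0 - hits / len(gt)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words.

    Assumes a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (free if words match)
            )
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2 edits over 6 reference words
print(delta_threshold_loss([1.0, 1.5, 3.0], [1.0, 1.0, 3.0], i=1))
```

Small changes to a model's outputs leave both values unchanged until a pixel ratio or a word crosses a boundary, so their gradients are zero almost everywhere; a black-box optimizer, by contrast, only needs the returned scalar.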

Paper Structure

This paper contains 90 sections, 5 equations, 14 figures, 8 tables, and 6 algorithms.

Figures (14)

  • Figure 1: AfterLearnER vs. classical ML. Top: standard gradient-based training (e.g., backpropagation) and hyperparameter tuning (the outer loop). Bottom: the two modes of AfterLearnER. Left: in the offline mode, some parameters of the trained model (termed $\aleph$-parameters, see text) are retrofitted once and for all before test time, as in Sections sec:un, sec:deux, sec:sept, and sec:huit; the output of AfterLearnER is an optimized model. Right: in the online mode, the $\aleph$-parameters can also include some model inputs in the latent space, and the objective can then be dynamic (the loss, user feedback, or a surrogate model), as in Sections sec:quatre, sec:cinq, and sec:six; the output of AfterLearnER is then an improved output for the given input.
  • Figure 2: Depth sensing: average (over 60 runs) $\aleph$-losses on the test set $\mathcal{V}'$ of the small and hybrid models (the baselines) and of the model optimized by AfterLearnER on Threshold 1 (left), 2 (middle), or 3 (right), for different values of $k$ (x-axis), with $b=50$.
  • Figure 3: Results of AfterLearnER on several problems from unsuptran with BLEU or accuracy $\aleph$-losses (lower is better). Each curve is the average over 3 runs of AfterLearnER with $k=1$ (i.e., one internal run) and the set $O$ restricted to a single optimization method. The horizontal line is the baseline, i.e., the model without retrofitting. If we exclude Zero (a baseline run used for validation, which does not modify the baseline) and SQP (which is not suited to this dimensionality), AfterLearnER is successful in 31 out of 32 independent cases, with a p-value $\leq 10^{-8}$ (see Section sec:pvalue).
  • Figure 4: Example of evolution: unmodified cat (bottom) generated by EG3D-cats and its evolutionary counterpart (top). Left: the ears are significantly improved; right: the neck is improved.
  • Figure 5: Example of evolution: unmodified cat (bottom) generated by EG3D-cats and its evolutionary counterpart (top). In both cases, the mouth has been significantly modified, though it is still not perfect.
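The p-value quoted for Figure 3 can be sanity-checked with a one-sided sign test. This is a sketch assuming that, under the null hypothesis, each of the 32 independent cases is equally likely to beat the baseline or not; the exact test described in Section sec:pvalue may differ. The probability of at least 31 successes is then $\left(\binom{32}{31} + \binom{32}{32}\right)/2^{32} = 33/2^{32}$:

```python
from math import comb


def sign_test_p_value(successes: int, trials: int) -> float:
    """One-sided P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    tail = sum(comb(trials, s) for s in range(successes, trials + 1))
    return tail / 2 ** trials


p = sign_test_p_value(31, 32)
print(f"p = {p:.3e}")  # prints "p = 7.683e-09"
```

Under this assumption the p-value is about $7.7 \times 10^{-9}$, consistent with the stated bound of $10^{-8}$.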