Training Across Reservoirs: Using Numerical Differentiation To Couple Trainable Networks With Black-Box Reservoirs

Andrew Clark, Jack Moursounidis, Osmaan Rasouli, William Gan, Cooper Doyle, Anna Leontjeva

TL;DR

This work tackles gradient evaluation when objective functions are not analytically differentiable by introducing Bounded Numerical Differentiation (BOND), a perturbation-based zeroth-order method that estimates partial derivatives at the inputs to a black-box reservoir. By caching a differentiable read-in network and backpropagating through the known parts of the computational graph, BOND recovers gradient-sign information with reduced perturbation complexity, approaching first-order convergence without expanding the trainable parameter count. Empirical results on the California House Price Dataset (CHPD) and CIFAR-100 show that embedding fixed black-box modules via architectures such as Network-Frozen-Network (NFN) and Network-Echo-Network (NEN) can improve performance, with Parallel reservoir configurations offering the strongest gains; however, BOND incurs computational overhead and requires further theoretical grounding. Overall, the paper demonstrates the feasibility of integrating analogue or black-box components into trainable architectures to scale model capacity and explore hybrid digital-physical computation, while outlining directions for speedups and rigorous convergence analysis.

Abstract

We introduce Bounded Numerical Differentiation (BOND), a perturbative method for estimating partial derivatives across network structures with inaccessible computational graphs. BOND demonstrates improved accuracy and scalability over existing perturbative methods, enabling new explorations of trainable architectures that integrate black-box functions. We observe that these black-box functions, realized in our experiments as fixed, untrained networks, can enhance model performance without increasing the number of trainable parameters. This improvement is achieved without extensive optimization of the architecture or properties of the black-box function itself. Our findings highlight the potential of leveraging fixed, non-trainable modules to expand model capacity, suggesting a path toward combining analogue and digital devices as a mechanism for scaling networks.

Paper Structure

This paper contains 34 sections, 24 equations, 10 figures, 2 tables, 4 algorithms.

Figures (10)

  • Figure 1: Diagram of network structure used for LEN and LFN
  • Figure 2: Diagram of network structure used for NEN and NFN
  • Figure 3: The performance of BOND compared to existing numerical methods (SPSA) and auto-differentiation (AD) on the California House Price Dataset.
  • Figure 4: The performance of architectures with (NFN) and without (no reservoir) a black-box function, with varying gradient estimation methods, on the California House Price Dataset.
  • Figure 5: The performance of architectures with (NFN) and without (no reservoir) black-box components on CIFAR-100.
  • ...and 5 more figures