Circuit design in biology and machine learning. I. Random networks and dimensional reduction

Steven A. Frank

TL;DR

This paper investigates how biological circuits relate to machine learning circuits by examining two core design themes: randomly connected reservoir networks and dimensional reduction with internal models. It shows that random reservoirs can store temporal information and enable effective prediction, offering a plausible route for the emergence of perception–response links and guiding subsequent refinement by natural selection. It then develops dimensional reduction through encoder-like architectures and internal models, including a concrete biochemical circuit for trend prediction, illustrating how simple networks can infer environmental dynamics and predict their direction. The work argues that biology and machine learning share structural solutions, such as memory through recurrence and compact representations via hourglass-like pathways, and it proposes a framework for leveraging machine learning to generate testable hypotheses about biological circuit design and evolution.

Abstract

A biological circuit is a neural or biochemical cascade, taking inputs and producing outputs. How have biological circuits learned to solve environmental challenges over the history of life? The answer certainly follows Dobzhansky's famous quote that "nothing in biology makes sense except in the light of evolution." But that quote leaves out the mechanistic basis by which natural selection's trial-and-error learning happens, which is exactly what we have to understand. How does the learning process that designs biological circuits actually work? How much insight can we gain about the form and function of biological circuits by studying the processes that have made those circuits? Because life's circuits must often solve the same problems as those faced by machine learning, such as environmental tracking, homeostatic control, dimensional reduction, or classification, we can begin by considering how machine learning designs computational circuits to solve problems. We can then ask: How much insight do those computational circuits provide about the design of biological circuits? How much does biology differ from computers in the particular circuit designs that it uses to solve problems? This article steps through two classic machine learning models to set the foundation for analyzing broad questions about the design of biological circuits. One insight is the surprising power of randomly connected networks. Another is the central role of internal models of the environment embedded within biological circuits, illustrated by a model of dimensional reduction and trend prediction. Overall, many challenges in biology have machine learning analogs, suggesting hypotheses about how biology's circuits are designed.

Paper Structure (25 sections, 5 equations, 7 figures)

Introduction
Induction by comparative example
Overview
Reservoir computing
How do random networks store information?
Prediction and response
Biological insights from reservoirs
Recurrence, memory, state, and dimension
Form and function, discovery, and refinement
Origin of traits: perception and response
Prokaryotes, eukaryotes, and multicellularity
Precise traits from sloppy components
Network architecture
Neurobiology
Dimensional reduction
...and 10 more sections

Figures (7)

Figure 1: Reservoir networks provide information about past inputs. The simple examples here illustrate the process. The intensity of shading for each node reflects its value, with lower values having darker shading. (a) The five nodes in this example create a circular network [rodan10minimum]. In each timestep, each node passes its value, $v$, transformed by $\tanh(v/2)$, to the next node, in which $\tanh$ is the hyperbolic tangent function. That function maps its input to the range $(-1,1)$, keeping the values in the network bounded to a useful set. The node at the top provides the external input into the network, with input value transformed by $\tanh$. In this example, the inputs $(0.1,0.3,0.5,0.7,0.9)$ rise linearly over time. As the inputs proceed from left to right, the network values encode information about the input sequence. The final state on the right has declining values around the circle when starting from the top node, associated with the rising inputs over time. (b) Network dynamics for flat inputs, each of $0.7$. In the final state, nodal values decline more slowly. (c) Network dynamics for falling inputs, which are the reverse order of the inputs in (a). In the final state, nodal values are relatively flat, balancing the decline of input values against the decline in values transmitted around the circle. (d) A reservoir network with ten bidirectionally connected nodes. Each circle of nodes shows a different instance of the same network. The weights connecting the nodes are random, and the weights connecting each input to each node are also random. The same network connectivity is used for each instance. The input pattern for an instance is described at its center. In this case, each input sequence has 11 values. The illustrations show the final state after all inputs. The conclusion is that a network's final state reflects its input pattern, providing the system with information about the history of inputs.
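The ring dynamics of panels (a)-(c) can be sketched in a few lines of Python. This is a minimal illustration, not the article's Julia code. It assumes that all nodes start at zero and that the top node adds the $\tanh$-transformed input to the value arriving from the last node. The caption specifies neither detail, and the second makes no difference over five steps from a zero start. The name `run_ring` is hypothetical.

```python
import math

def run_ring(inputs, n_nodes=5):
    """Pass values around a one-directional ring, one step per input."""
    state = [0.0] * n_nodes  # assumption: the network starts at zero
    for u in inputs:
        new_state = [0.0] * n_nodes
        # Top node: tanh of the external input, plus the value handed on
        # by the last node (assumed way of closing the ring).
        new_state[0] = math.tanh(u) + math.tanh(state[-1] / 2)
        # Every other node receives tanh(v/2) from its predecessor.
        for i in range(1, n_nodes):
            new_state[i] = math.tanh(state[i - 1] / 2)
        state = new_state
    return state

rising = [0.1, 0.3, 0.5, 0.7, 0.9]
finals = {
    "rising": run_ring(rising),
    "flat": run_ring([0.7] * 5),
    "falling": run_ring(rising[::-1]),
}
for name, final in finals.items():
    print(name, [round(v, 3) for v in final])
```

Reading each printed list from the top node around the circle gives the final-state profile for that input pattern, so the three lists can be compared directly with the right-hand states in panels (a)-(c).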
Figure 2: Match between actual (blue) environmental inputs and predicted (gold) inputs generated by a reservoir network. See text for description of each plot. Briefly, (a) the match over 100 nondimensional time units, (b) magnification of a short time sequence, and (c) gold curve shifted 2 units to the left. I created the reservoir networks with the Julia programming language package ReservoirComputing.jl [martinuzzi22reservoircomputing.jl]. Each network has 20 nodes. The connectivity matrix was randomly generated with a $0.6$ expected frequency of zeros. The connectivity weights were normalized to a matrix spectral radius of $1.0$, which means that the nodal values tend neither to increase nor decrease over time. The input is $u(t) = \sin(t)+\sin(0.51t)+\sin(0.22t)$. The total input sequence occurred over time units $t\in[1,300]$ in increments of $\Delta t=0.05$. In the training period during the first 200 time units, I fit a regression model on the nodal values, $\bm{\mathrm{x}}$ in eqn \ref{eq:rnn}, to predict the input values at 2 time units into the future, which is $2/\Delta t=40$ sequence increments. To fit the prediction model, I used lasso regression with an L1 regularization cost of $0.0003$. That linear cost on the magnitude of the parameters prevents large parameter values and reduces to zero those parameters that contribute little to the predicted fit. In this case, the fitted regression used 7 of the 20 nodal values, a significant reduction in dimensionality. Reducing the L1 cost to zero used all 20 nodal values in the fitted regression and gave a nearly perfect fit. The plots show the results for the inputs that were not used during the fitting procedure. The freely available Julia computer code provides full details about assumptions and methods for all figures in this article [frank24circuit-code].
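The pipeline in this caption can be sketched with NumPy. This is not the article's ReservoirComputing.jl code, and several details are assumptions. The state update uses the common echo-state form $x_{t+1}=\tanh(W x_t + w_{\mathrm{in}} u_t)$, because eqn `eq:rnn` is not reproduced here. The input weights are drawn uniformly on $[-0.1, 0.1]$. The readout uses ridge regression rather than the lasso regression described above, so it does not reproduce the selection of 7 of 20 nodes.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed
n, dt, horizon = 20, 0.05, 40            # 40 increments = 2 time units

t = np.arange(1.0, 300.0, dt)
u = np.sin(t) + np.sin(0.51 * t) + np.sin(0.22 * t)

# Random connectivity with an expected 0.6 frequency of zeros,
# rescaled so that the spectral radius is 1.0.
W = rng.normal(size=(n, n))
W[rng.random((n, n)) < 0.6] = 0.0
W /= np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-0.1, 0.1, size=n)    # assumed input scaling

# Drive the reservoir and record the nodal values at each increment.
x = np.zeros(n)
states = np.empty((len(u), n))
for k, uk in enumerate(u):
    x = np.tanh(W @ x + w_in * uk)       # assumed update form
    states[k] = x

# Linear readout: nodal values at time t predict the input at t + 2.
X = np.hstack([states[:-horizon], np.ones((len(u) - horizon, 1))])
y = u[horizon:]
n_train = int(200 / dt)                  # first 200 time units

lam = 1e-6                               # ridge penalty, swapped in for L1
A = X[:n_train].T @ X[:n_train] + lam * np.eye(n + 1)
beta = np.linalg.solve(A, X[:n_train].T @ y[:n_train])

pred = X[n_train:] @ beta                # inputs not used in fitting
rmse = np.sqrt(np.mean((pred - y[n_train:]) ** 2))
print("held-out RMSE:", rmse)
```

Only the readout weights `beta` are fitted. The random matrices `W` and `w_in` stay fixed, which is the defining feature of a reservoir.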
Figure 3: Encoder circuit prediction for the direction of change in a sequence of observations. The input sequence is calculated by starting with a random walk, $\textrm{d} \tilde{u} = \sigma\textrm{d} W$, at $\tilde{u}_0=0$ with $\sigma=0.2$, in which $\textrm{d} W$ is a Wiener process that samples a normal random variable with standard deviation of $\sqrt{\textrm{d} t}$, with $\textrm{d} t=0.01$ in all examples. The sequence values are normalized to $[0,1]$ by affine transformation, yielding $\hat{u}$. The sequence is then replaced by its exponential moving average, $u_t = \beta \hat{u}_t + (1-\beta) u_{t-\Delta t}$, sampled at discrete time points, and with $u_0=\hat{u}_0$ and $\beta=0.2$. Figure \ref{fig:encoder_net} shows the circuit architecture. The 12 network parameters in the Fig. \ref{fig:encoder_net} circuit were adjusted to reduce the loss function by the Adam learning algorithm with learning rate $0.005$ applied to 25,000 randomly generated sequences. The examples in this section analyze 300 time units with 10 sample points per time unit for a total of 3000 sample points. After optimization, new sequences were used to test performance, as follows. (a) Input sequence. (b) The fitness for each prediction is the absolute value of the difference between the current and prior observation multiplied by $-1$ if the prediction about direction is incorrect. Cumulative fitness is proportional to the sum over all predictions up to the current time point. (c,d) A magnified view of a time interval from the plots above. (e--h) Similar plots for a second input sample.
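The input construction and the fitness score in this caption can be sketched as follows. This is an illustration under stated assumptions, not the article's code. The function names are hypothetical, the cumulative fitness is computed with a proportionality constant of one, and the predictor shown (repeat the last observed direction) merely stands in for the optimized circuit.

```python
import math
import random

def make_sequence(n_points, sigma=0.2, dt=0.01, beta=0.2, rng=None):
    """Random walk -> affine rescale to [0, 1] -> exponential moving average."""
    rng = rng or random.Random()
    walk = [0.0]
    for _ in range(n_points - 1):
        walk.append(walk[-1] + sigma * rng.gauss(0.0, math.sqrt(dt)))
    lo, hi = min(walk), max(walk)
    u_hat = [(w - lo) / (hi - lo) for w in walk]
    u = [u_hat[0]]                       # u_0 = u_hat_0
    for v in u_hat[1:]:
        u.append(beta * v + (1 - beta) * u[-1])
    return u

def cumulative_fitness(u, predicted_up):
    """predicted_up[t] is the prediction that u[t] > u[t-1]."""
    total, curve = 0.0, []
    for t in range(1, len(u)):
        change = u[t] - u[t - 1]
        correct = (change > 0) == predicted_up[t]
        total += abs(change) if correct else -abs(change)
        curve.append(total)
    return curve

u = make_sequence(3000, rng=random.Random(1))
# Stand-in predictor: the next change repeats the last observed direction.
persist = [False, False] + [u[t - 1] > u[t - 2] for t in range(2, len(u))]
print("final cumulative fitness:", cumulative_fitness(u, persist)[-1])
```

Large changes dominate the score, so a predictor is rewarded mainly for calling the direction correctly when the input moves substantially.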
Figure 4: Encoder dimensional reduction network for predicting sequence trend. In this example, the input is a sequence of values. The recurrent memory layer is described by eqn \ref{eq:rnn}, allowing the network to calculate information about current and past inputs and retain that information in the internal states, $x_i$. Output from this layer is then passed as input to the dimension reduction layer. That layer sums an affine transformation of each input to produce a one-dimensional output passed to the final layer. The final layer applies to its input an affine transformation followed by a sigmoid function, transforming inputs on $(-\infty,\infty)$ to the final output on $(0,1)$. The output, $\rho$, is taken as a prediction of the direction of change of the next input value relative to the current input value. An actual positive change in the data corresponds to a target of $\kappa=1$, and a negative change corresponds to a target of $\kappa=0$. The distance between the prediction, $\rho$, and its target, $\kappa$, is the cross-entropy loss function, $-\kappa\log\rho-(1-\kappa)\log(1-\rho)$. Figure \ref{fig:enc_dyn} shows an application of this circuit architecture to a particular example.
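A forward pass through the three layers, together with the cross-entropy loss, can be sketched as below. The recurrent update is an assumed standard form, $x \leftarrow \tanh(Wx + w_{\mathrm{in}}u)$, because eqn `eq:rnn` is not reproduced here. The two-node memory layer, the parameter names, and all parameter values are hypothetical placeholders rather than the optimized circuit, and no training step is shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(rho, kappa, eps=1e-12):
    """-kappa*log(rho) - (1-kappa)*log(1-rho), clipped for numerical safety."""
    rho = min(max(rho, eps), 1.0 - eps)
    return -kappa * math.log(rho) - (1.0 - kappa) * math.log(1.0 - rho)

def forward(u_seq, p):
    """Return the prediction rho after each input in the sequence."""
    n = len(p["w_in"])
    x = [0.0] * n                        # internal states x_i
    rhos = []
    for u in u_seq:
        # Recurrent memory layer (assumed form of the update).
        x = [math.tanh(sum(p["W"][i][j] * x[j] for j in range(n))
                       + p["w_in"][i] * u) for i in range(n)]
        # Dimension reduction: sum an affine transform of each state.
        z = sum(p["a"][i] * x[i] + p["b"][i] for i in range(n))
        # Final layer: affine transform followed by a sigmoid.
        rhos.append(sigmoid(p["c"] * z + p["d"]))
    return rhos

def mean_loss(u_seq, p):
    """Average cross-entropy against the actual direction of the next change."""
    rhos = forward(u_seq, p)
    losses = []
    for t in range(len(u_seq) - 1):
        kappa = 1.0 if u_seq[t + 1] > u_seq[t] else 0.0
        losses.append(cross_entropy(rhos[t], kappa))
    return sum(losses) / len(losses)

params = {                               # placeholder values, not fitted
    "W": [[0.5, -0.3], [0.2, 0.4]], "w_in": [1.0, -1.0],
    "a": [0.8, -0.6], "b": [0.0, 0.1], "c": 2.0, "d": -0.1,
}
demo = [0.50, 0.52, 0.55, 0.54, 0.51, 0.53]
print("predictions:", [round(r, 3) for r in forward(demo, params)])
print("mean loss:", mean_loss(demo, params))
```

An optimizer such as Adam would adjust the entries of `params` to lower `mean_loss` averaged over many generated sequences.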
Figure 5: Circuit accuracy in predicting the direction of input change. Random input sequences were generated as described in Fig. \ref{fig:enc_dyn}. For a particular generated sequence with $100{,}000$ sample points, the direction of change between two inputs predicted the direction of change for the next input with frequency $0.795$, which estimates the maximum accuracy that could be achieved on that sequence. For each of $10{,}000$ novel sequences with $3000$ sample points, I calculated the deviation between the optimized circuit's frequency of correct predictions and the maximum estimated accuracy. The histogram shows the distribution of those deviations. The median, mean, and standard deviation refer to that distribution of deviations.
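The benchmark frequency of $0.795$ can be checked with a short simulation. The sketch below is not the article's code; the function name and seed are hypothetical. The closed-form comparison is an added check and does not come from the source. It relies on two facts. First, the moving average makes successive changes follow $\Delta u_t = (1-\beta)\Delta u_{t-1} + \beta\Delta\hat{u}_t$, giving a lag-one correlation of $1-\beta=0.8$. Second, two zero-mean jointly normal variables with correlation $r$ share a sign with probability $1/2 + \arcsin(r)/\pi$.

```python
import math
import random

def persistence_accuracy(n_points, sigma=0.2, dt=0.01, beta=0.2, seed=0):
    """Frequency with which the last direction of change matches the next one."""
    rng = random.Random(seed)
    walk = 0.0
    u_prev = 0.0                         # u_0 equals the starting walk value
    changes = []
    for _ in range(n_points - 1):
        walk += sigma * rng.gauss(0.0, math.sqrt(dt))
        # The affine rescaling to [0, 1] is skipped: a rescaling with
        # positive slope leaves the sign of every change unaltered.
        u_next = beta * walk + (1 - beta) * u_prev
        changes.append(u_next - u_prev)
        u_prev = u_next
    hits = sum((a > 0) == (b > 0) for a, b in zip(changes, changes[1:]))
    return hits / (len(changes) - 1)

beta = 0.2
acc = persistence_accuracy(100_000, beta=beta)
theory = 0.5 + math.asin(1 - beta) / math.pi   # about 0.795
print("simulated:", acc, "closed form:", theory)
```

Under these assumptions the benchmark depends only on $\beta$, not on $\sigma$ or $\textrm{d}t$, because rescaling the walk does not change the signs of its increments.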
...and 2 more figures
