Category Archives: neuroscience

Workshop talk – Methods for scaling neural computation

A couple of months ago I gave a talk at the Neuro-Inspired Computational Elements (NICE) workshop, about the use of cortical microcircuits for adaptation in prediction and control problems. The talk was recorded, and here is a link: Ignore the fire alarm that gets pulled in the first minute; the actual talk starts at around the 1 minute mark.

There were a lot of very interesting talks in general, which are also available online here:

I’ll be writing up some posts on the subject matter of my talk in the next couple months, explaining the methods in more detail and providing some solid examples with code. Hope you find the video interesting!


Likelihood calculus paper series review part 3 – Distributed control of uncertain systems using superpositions of linear operators

The third (and final, at the moment) paper in the likelihood calculus series from Dr. Terence Sanger is Distributed control of uncertain systems using superpositions of linear operators. Carrying the torch for the series right along, here Dr. Sanger continues investigating the development of an effective, general method of controlling systems operating under uncertainty. This is the paper that delivers on all the promises of building a controller out of a system described by the stochastic differential operators we’ve been learning about in the previous papers. In addition to describing the theory, there are examples of system simulation with code provided, which is a wonderful, and sadly uncommon, thing in academic papers, so I’m excited. We’ll go through a comparison of Bayes’ rule and Markov processes (described by our stochastic differential equations), go quickly over the stochastic differential operator description, and then dive into the control of systems. The examples and code run-through I’m going to have to save for another post, though, just to keep the size of this post reasonable.

The form our system model equation will take is the very general

dx = f(x)dt + \sum g_i(x)u_i dt + \sum h_i(x, u_i)dB_i,

where f(x) represents the environment dynamics, previously also referred to as the unforced or passive dynamics of the system, g_i(x) describes how the control signal u_i affects the system state x, h_i(x, u_i) describes the state and control dependent additive noise, and dB_i is a set of independent noise processes such as Brownian motion. Our control goal is to find a set of functions of time, u_i(t), such that the dynamics of our system behave according to a desired set of dynamics that we have specified.

In prevalent methods in control theory, uncertainty is a difficult problem, and often can only be effectively handled with a number of simplifications, such as assuming linear systems with additive Gaussian noise. In biological systems that we want to model, however, uncertainty is ubiquitous. There is noise on outgoing and incoming signals, there are unobserved controllers simultaneously exerting influence over the body, complicated and often unmodeled highly non-linear dynamics of the system and its interactions with the environment, etc. In the brain especially, the effect of unobserved controllers is a particular problem. Multiple areas of the brain will simultaneously be sending out control signals to the body, and the results of these signals tend to be only partially known, or known only after a delay, to the other areas. So for modeling, we need an effective means of controlling distributed systems operating under uncertainty. And that’s exactly what this paper presents: ‘a mathematical framework that allows modeling of the effect of actions deep within a system on the stochastic behaviour of the global system.’ Importantly, the stochastic differential operators that Dr. Sanger uses to do this are linear operators, which opens up a whole world of linear methods to us.

Bayes’ rule and Markov processes

Bayesian estimation is often used in sensory processing to model the effects of state uncertainty, combining prior knowledge of state with a measurement update. Because we’re dealing with modeling various types of system uncertainty, it seems like a good idea to consider Bayes’ rule. Here, Dr. Sanger shows that Bayes’ rule is in fact insufficient in this application, and Markov processes must be used. There are a number of neat insights that come from this comparison, so it’s worth going through.

Let’s start by writing Bayes’ rule and Markov processes using matrix equations. Bayes’ rule is the familiar equation

p(x|y) = \frac{p(y|x)}{p(y)}p(x),

where p(x) represents our probability density or distribution, so p(x) \geq 0 and \sum_i p(x = i) = 1. This equation maps a prior density p(x) to the posterior density p(x|y). Essentially, this tells us the effect a given measurement of y has on the probability of being in state x. To write this in matrix notation, we assume that x takes on a finite number of states, so p(x) is a vector, which gives

p_x' = Ap_x,

where p_x and p_x' are the prior and posterior distributions, respectively, and A is a diagonal matrix with elements A_{ii} = \frac{p(y|x=i)}{p(y)}.

Now, the matrix equation for a discrete-time, finite-state Markov process is written

p_x(t+1) = Mp_x(t).

So where in Bayes’ rule the matrix (our linear operator) A maps the prior distribution into the posterior distribution, in Markov processes the linear operator M maps the probability density of the state at time t into the density at time t+1. The differences come in through the form of the linear operators, the major one being that A is a diagonal matrix, while there is no such restriction for M. The implication of this is that in Bayes’ rule the update of a given state x depends only on the prior likelihood of being in that state, whereas in Markov processes the likelihood of a given state at time t+1 can depend on the probability of being in other states at time t. The off-diagonal elements of M allow us to represent the probability of state transition, which is critical for capturing the behavior of dynamical systems. This is the reason why Bayes’ method is insufficient for our current application.
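To make the diagonal versus off-diagonal distinction concrete, here is a minimal NumPy sketch. The three-state system, the likelihood values, and the transition matrix M are all made-up numbers for illustration and do not come from the paper.

```python
import numpy as np

# Hypothetical 3-state example; every number here is invented for illustration.
p_prior = np.array([0.5, 0.3, 0.2])       # p(x)
likelihood = np.array([0.9, 0.5, 0.1])    # p(y | x = i) for one observed y

# Bayes' rule as a diagonal operator: A_ii = p(y | x = i) / p(y)
p_y = likelihood @ p_prior                # p(y) = sum_i p(y | x = i) p(x = i)
A = np.diag(likelihood / p_y)
p_posterior = A @ p_prior

# Markov update: column j of M holds p(x(t+1) = i | x(t) = j), so columns sum to 1
M = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.3],
              [0.0, 0.1, 0.7]])
p_next = M @ p_prior

# Start with all probability in state 0 and apply each operator once
p_zero = np.array([1.0, 0.0, 0.0])
bayes_from_zero = np.diag(likelihood / (likelihood @ p_zero)) @ p_zero
markov_from_zero = M @ p_zero
```

With a diagonal operator, a state that starts with zero probability stays at zero (bayes_from_zero leaves states 1 and 2 empty), while the off-diagonal entries of M move probability into state 1 in a single step.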

Stochastic differential operators

This derivation of stochastic differential operators is by now relatively familiar ground, so I’ll be quick here. We start with a stochastic differential equation

dx = f(x)dt + g(x)dB,

where dB is our noise source, the differential of unit variance Brownian motion. The noise term introduces randomness into the state variable x, so we describe x with a probability density p(x) that evolves through time. This change of the probability density through time is captured by the Fokker-Planck partial differential equation

\frac{\partial}{\partial t}p(x,t) = - \frac{\partial}{\partial x}(f(x)p(x,t)) + \frac{1}{2} \frac{\partial^2}{\partial x^2}(g(x)g^T(x)p(x,t)),

which can be rewritten as

\dot{p} = \mathcal{L}p,

where \mathcal{L} is the linear operator

\mathcal{L} = - \frac{\partial}{\partial x}f(x) + \frac{1}{2} \frac{\partial^2}{\partial x^2}g(x)g^T(x).

\mathcal{L} is referred to as our stochastic differential operator. Because the Fokker-Planck equation is proven to preserve probability densities (non-negativity and sum to 1), applying \mathcal{L} to update our density will maintain its validity.

What is so exciting about these operators is that even though they describe nonlinear systems, they themselves are linear operators. What this means is that if we have two independent components of a system that affect its dynamics, described by \mathcal{L}_1 and \mathcal{L}_2, we can determine their combined effects on the overall system dynamics through a simple summation, i.e. \dot{p} = (\mathcal{L}_1 + \mathcal{L}_2)p.
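As a rough illustration of the superposition property, the sketch below discretizes the operator on a 1D grid and sums two of them. The discretization (upwind drift, nearest-neighbour diffusion, reflecting boundaries), the function name fokker_planck_generator, and the particular f and g are my own assumptions for the example, not the scheme used in the paper.

```python
import numpy as np

def fokker_planck_generator(f, g, x):
    """Finite-state approximation of L = -d/dx f + 1/2 d^2/dx^2 g^2 on grid x.

    L[i, j] is the rate of probability flow from state j into state i.
    Upwind drift plus nearest-neighbour diffusion; boundaries are reflecting.
    """
    n, dx = len(x), x[1] - x[0]
    L = np.zeros((n, n))
    for j in range(n):
        drift, diff = f(x[j]), 0.5 * g(x[j]) ** 2 / dx ** 2
        right = max(drift, 0.0) / dx + diff    # rate of hopping j -> j+1
        left = max(-drift, 0.0) / dx + diff    # rate of hopping j -> j-1
        if j + 1 < n:
            L[j + 1, j] += right
            L[j, j] -= right
        if j - 1 >= 0:
            L[j - 1, j] += left
            L[j, j] -= left
    return L

x = np.linspace(-2.0, 2.0, 41)
L1 = fokker_planck_generator(lambda s: -s, lambda s: 0.5, x)   # noisy pull toward 0
L2 = fokker_planck_generator(lambda s: 1.0, lambda s: 0.0, x)  # constant push to the right
L_sum = L1 + L2                                                # combined dynamics

# Start from a narrow bump around 0 and integrate p_dot = (L1 + L2) p with Euler steps
p = np.exp(-0.5 * (x / 0.3) ** 2)
p /= p.sum()
dt = 1e-3
for _ in range(2000):
    p = p + dt * (L_sum @ p)

mean_x = x @ p   # drifts toward the point where the pull and the push balance
```

Because every column of each matrix sums to zero, the same holds for their sum, so the density stays normalized while its mean settles where the two drifts cancel.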

Controlling with stochastic differential operators

Last time, we saw that control can be introduced by attaching a weighting term to the superposition of controller dynamics, giving

\dot{p} = \sum_i u_i \mathcal{L}_i p,

where \mathcal{L}_i is the stochastic differential operator of controller i, and u_i is the input control signal to that controller. In the context of a neural system, this equation describes a set of subsystems whose weighted dynamics give rise to the overall behavior of the system. By introducing our control signals u_i, we’ve made the dynamics of the overall system flexible. As mentioned in the previous review post, our control goal is to drive the system to behave according to a desired set of dynamics. Formally, we want to specify u_i such that the actual system dynamics, \hat{\mathcal{L}}, match some desired set of dynamics, \mathcal{L}^*. In equation form, we want u_i such that

\mathcal{L}^* \approx \hat{\mathcal{L}} = \sum_i u_i \mathcal{L}_i.

It’s worth noting also here that the resulting \hat{\mathcal{L}} still preserves the validity of densities that it is applied to.

How well can the system approximate a set of desired dynamics?

In this next section of the paper, Dr. Sanger talks about reworking the stochastic operators of a system into an orthogonal set, which can then be used to easily approximate a desired set of dynamics. It’s my guess that the motivation behind doing this is to see how close the given system is able to come to reproducing the desired dynamics. This is my guess because this exercise doesn’t really generate control information that can be used to directly control the system, unless we translate the weights calculated by doing this back into terms of the actual set of actions that we have available. But it can help you to understand what your system is capable of.

To do this, we’ll use Gram-Schmidt orthogonalization, which I describe in a recent post. To actually follow this orthogonalization process we’ll need to define an inner product and normalization operator appropriate for our task. A suitable inner product will be one that lets us compare the similarity between two of our operators, L_1 and L_2, in terms of their effects on an initial state probability density, p_0. So define

\langle L_1, L_2 \rangle_{p_0} = \langle L_1 p_0, L_2 p_0 \rangle = \langle \dot{p}_{L_1} , \dot{p}_{L_2} \rangle

for the discrete-space case, and similarly

\langle \mathcal{L}_1, \mathcal{L}_2 \rangle_{p_0} = \int (\mathcal{L}_1 p_0)(\mathcal{L}_2 p_0)dx = \int \dot{p}_{L_1} \dot{p}_{L_2} dx

in the continuous-state space. So this inner product calculates the change in probability density resulting from applying these two operators to this initial condition, and uses the amount by which they move the system in the same direction as the measure of similarity.

The induced norm that we’ll use is the 2-norm,

||L||_{p_0} = \frac{||L p_0||_2}{||p_0||_2}.

With the above inner product and normalization operators, we can now take our initial state, p_0, and create a new orthogonal set of stochastic differential operators that span the same space as the original set through the Gram-Schmidt orthogonalization method. Let’s denote the orthonormal basis set vectors as \Lambda_i. Now, to approximate a desired operator L^*, generate a set of weights, \alpha, over our orthonormal basis set using a standard inner product: \alpha_i = \langle L^*, \Lambda_i \rangle. Once we have the \alpha_i, the desired operator can be recreated (as best as possible given this basis set),

L^* \approx \hat{L} = \sum \alpha_i \Lambda_i.

This could then be used as a comparison measure as the best approximation to a desired set of dynamics that a given system can achieve with its set of operators.
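Here is a small sketch of that orthogonalization for the discrete-space case. The random operators, the uniform p_0, and the helper names are assumptions for illustration only. Note also that I normalize with the norm induced by the inner product, \sqrt{\langle L, L \rangle_{p_0}}, which differs from the norm written above by the constant factor ||p_0||_2.

```python
import numpy as np

def inner(La, Lb, p0):
    """<La, Lb>_{p0} = <La p0, Lb p0>: overlap of the density changes they cause."""
    return (La @ p0) @ (Lb @ p0)

def gram_schmidt(ops, p0, tol=1e-12):
    """Orthonormalize a list of operator matrices under the p0 inner product."""
    basis = []
    for L in ops:
        R = L.copy()
        for B in basis:
            R = R - inner(R, B, p0) * B      # remove the component along B
        norm = np.sqrt(inner(R, R, p0))
        if norm > tol:                       # skip operators that add nothing new at p0
            basis.append(R / norm)
    return basis

rng = np.random.default_rng(0)

def random_generator(n):
    """Random matrix with non-negative off-diagonals and columns summing to zero."""
    L = rng.random((n, n))
    np.fill_diagonal(L, 0.0)
    return L - np.diag(L.sum(axis=0))

n = 5
ops = [random_generator(n) for _ in range(3)]
p0 = np.full(n, 1.0 / n)
basis = gram_schmidt(ops, p0)

L_star = 0.7 * ops[0] + 0.2 * ops[2]             # a target that lies inside the span
alpha = [inner(L_star, B, p0) for B in basis]    # alpha_i = <L*, Lambda_i>
L_hat = sum(a * B for a, B in zip(alpha, basis))
```

For a target inside the span, L_hat reproduces the effect of L_star on p_0; for a target outside it, the residual L_star p_0 - L_hat p_0 gives a measure of how far short the system falls. As far as I can tell, the \Lambda_i themselves need not be valid stochastic operators, which fits with the point above about having to translate the weights back to the available actions.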

Calculating a control signal using feedback

Up to now, there’s been a lot of dancing around the control signal, including it in equations and discussing the types of control a neural system could plausibly implement. Here, finally, we actually get into how to go about generating this control signal. Let’s start with a basic case where we have the system

\dot{p} = (\mathcal{L}_1 + u\mathcal{L}_2)p,

where \mathcal{L}_1 describes the unforced/passive dynamics of the system, \mathcal{L}_2 describes the control-dependent dynamics, and u is our control signal.

Define a cost function V(x) that we wish to minimize. The expected value of this cost function at a given point in time is

E[V] = \int V(x)p(x)dx,

which can be read as the cost of each state weighted by the current likelihood of being in that state.

To reduce the cost over time, we want the derivative of our expected value with respect to time to be negative. Written in an equation, we want

\frac{d}{dt}E[V] = \int V(x)\dot{p}(x)dx < 0.

Note that we can sub in for \dot{p} to give

\frac{d}{dt}E[V] = \int V(x)[\mathcal{L}_1p(x) + u\mathcal{L}_2p(x)]dx.

Since our control is effected through u, at a given point in time where we have a fixed and known p(x,t), we can calculate the effect of our control signal on the change in expected value over time, \frac{d}{dt}E[V], by taking the partial derivative with respect to u. This gives

\frac{\partial}{\partial u}\left[\frac{d}{dt}E[V]\right] = \int V(x)\mathcal{L}_2p(x)dx,

which is intuitively read: The effect that the control signal has on the instantaneous change in expected value over time is equal to the change in probability of each state x weighted by the cost of that state. To reduce \frac{d}{dt}E[V], all we need to know now is the sign of the right-hand side of this equation, which tells us if we should increase or decrease u. Neat!

Although we only need to know the sign, it’s nice to include slope information that gives some idea of how far away the minimum might be. At this point, we can simply calculate our control signal in a gradient descent fashion, by setting u = - k \int V(x)\mathcal{L}_2p(x)dx. The standard gradient descent interpretation of this is that we’re calculating the effect that u has on our function \frac{d}{dt}E[V], and assuming that for some small range, k, our function is approximately linear. So we can follow the negative of the function’s slope at that point to find a new point on that function that evaluates to a smaller value.

This similarly extends to multiple controllers, where if there is a system of a large number of controllers, described by

\dot{p} = \sum_i u_i \mathcal{L}_i p,

then we can set

u_i = - k_i \int V(x)\mathcal{L}_ip(x)dx.

Dr. Sanger notes that, as mentioned before, neural systems will often not permit negative control signals, so where u_i < 0 we set u_i = 0. The value of u_i for a given controller is proportional to the ability of that controller to reduce the expected cost at that point in time. If all of the u_i = 0, then it is not possible for the available controllers to reduce the expected cost of the system.
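The feedback rule can be tried out on a toy discrete system. Everything specific in this sketch is an assumption of mine: the 31-state grid, the two shift operators standing in for controllers, the weak always-on diffusion, the quadratic cost, the gains k_i, and the forward Euler time step.

```python
import numpy as np

n = 31
x = np.arange(n, dtype=float)

def shift_generator(step):
    """Operator that moves probability one grid cell in direction `step` at unit rate."""
    L = np.zeros((n, n))
    for j in range(n):
        i = j + step
        if 0 <= i < n:
            L[i, j] += 1.0
            L[j, j] -= 1.0
    return L

L_passive = 0.05 * (shift_generator(+1) + shift_generator(-1))  # slow diffusion, always on
controllers = [shift_generator(+1), shift_generator(-1)]        # push right, push left
V = (x - 15.0) ** 2                                             # cost: squared distance from state 15
k = np.array([0.05, 0.05])                                      # hypothetical gains

p = np.zeros(n)
p[5] = 1.0                                                      # start concentrated at state 5
dt = 0.01
costs = []
for _ in range(5000):
    # u_i = -k_i * sum_x V(x) (L_i p)(x), with negative signals clipped to zero
    u = np.array([max(0.0, -ki * (V @ (Li @ p))) for ki, Li in zip(k, controllers)])
    L_total = L_passive + sum(ui * Li for ui, Li in zip(u, controllers))
    p = p + dt * (L_total @ p)
    costs.append(V @ p)
```

In this setup only the rightward controller is active at first, since it is the only one that lowers the expected cost, and its signal shrinks as the density approaches the low-cost region. The controllers here can only shift the whole density, so they cannot remove the spread that the diffusion adds.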

Comparing classical control and stochastic operator control

Now that we finally have a means of generating a control signal with our stochastic differential operators, let’s compare the structure of our stochastic operator control with classical control. Here’s a picture:


The most obvious differences are that a cost function V(x) has replaced the desired trajectory \dot{x} and the state x has been replaced by a probability density over the possible states, p(x). Additionally, the feedback in classical control is used to find the difference between the desired and actual state, which is then multiplied by a gain, to generate a corrective signal, i.e. u = k * (x^* - x), whereas in stochastic operator control the control signal is calculated as specified above, by following the gradient of the expected value of the cost function, i.e. u = -k \int V(x)\mathcal{L}_2p(x)dx.

Right away there is a crazy difference between our two control systems. In the classical control case, where we’re following a desired trajectory, several things are implied. First, we’ve somehow come up with a desired trajectory. Second, we’re necessarily assuming that regardless of what is actually going on in the system, this is the trajectory that we want to follow. This means that the system is not robust to changes in the dynamics, such as outside forces or perturbations applied during movement. In the stochastic operator control case, we’re not following a desired path; instead the system is looking to minimize the cost function at every point in time. This means that it doesn’t matter where we start from or where we’re going; the stochastic operator controller looks at the effect that each controller will have on the expected value of the cost function and generates a control signal accordingly. If the system is thrown off course to the target, it will recover by itself, making it far more robust than classical control. Additionally, we can easily change the cost function input to the system and see a change in the behaviour of the system, whereas in classical control a change in the cost function requires that we regenerate our desired trajectory \dot{x} before our controller will act appropriately.

While these are impressive points, it should also be pointed out that stochastic operator controllers are not the first to attack these issues. The robustness and behaviour business is similarly handled very well, for specific system forms (either linear or affine, meaning linear in terms of the dynamics of the control signal applied) and cost function forms (usually quadratic), by optimal feedback controllers. Optimal feedback controllers regenerate the desired trajectory online, based on system feedback. This leads to a far more robust control system than classical control provides. However, as mentioned, this is only for specific system and cost function forms. In stochastic operator control any type of cost function can be applied, and the controller dynamics described by L_i can be linear or nonlinear. This is a very important difference, making stochastic operator control far more powerful.

Additionally, stochastic operator controllers operate under uncertainty, by employing a probability density to generate a control signal. All in all, stochastic operator controllers provide an impressive and novel amount of flexibility in control.


Here, Dr. Sanger has taken stochastic differential operators, looked at their relation to Bayes’ rule, and developed a method for controlling uncertain systems robustly. This is done through the observation that these operators have linear properties that lets the effects of distributed controllers on a stochastic system be described through a simple superposition of terms. By introducing a cost function, the effect of each controller on the expected cost of the system at each point in time can be calculated, which can then be used to generate a control signal that robustly minimizes the cost of the system through time.

Stochastic differential operators can often be inferred from the problem description; their generation is something that I’ll examine more closely in the next post going through examples. Using these operators, time-varying expected costs associated with state-dependent cost functions can be calculated. Stochastic operator controllers introduce a significant amount more freedom in choice of cost function than has previously been available in control. Dr. Sanger notes that an important area for future research will be in the development of methods for optimal control of systems described by stochastic differential operators.

The downside of the stochastic operator controllers is that they are very computationally intensive, due to the fact that they must propagate a joint-density function forward in time at each timestep, rather than a single system state. One of the areas Dr. Sanger notes of particular importance for future work is the development of parallel algorithms, both for increasing simulation speed and examining possible neural implementations of such algorithms.

And finally, stochastic differential operators effect a paradigm shift from classical control in the way control is considered. Instead of driving the system to a certain target state, the goal is to have the system behave according to a desired set of dynamics. The effect of a controller is then the difference between the behavior of the system with and without the activity of the controller.

Comments and thoughts

This paper was particularly exciting because it discussed the calculation of the control signals for systems which we’ve described through the stochastic differential operators that have been developed through the last several papers. I admit confusion regarding the aside about developing an orthogonal equivalent set of operators; it seemed a bit of a red herring in the middle of the paper. I left out the example and code discussion from this post because it’s already very long, but I’m looking forward to working through them. Also worth pointing out is that I’ve been playing fast and loose moving back and forth between continuous and discrete, just in the interest of simplifying for understanding, but Dr. Sanger explicitly handles each case.

I’m excited to explore the potential applications and implementations of this technique in neural systems, especially in models of areas of the brain that perform a ‘look-ahead’ type function. The example that comes to mind is that of the rat reaching an intersection in a T-maze, where the neural activity recorded from place cells in the hippocampus shows the rat simulating the result of going left or going right. This seems a particularly apt application of these stochastic differential operators, as a sequence of actions and the resulting state can then be simulated and evaluated, providing that you have an accurate representation of the system dynamics in your stochastic operators.

To that end, I’m also very interested in possible means of learning stochastic differential operators for an action set. Internal models are an integral part of motor control system models, and this seems like a potentially plausible analogue. Additionally, for modeling biological systems, the complexity of dynamics is something that is often infeasible to determine analytically. All in all, I think this is a really exciting road for exploring the neural control of movement, and I’m looking forward to seeing where it leads.

Sanger, T. (2011). Distributed Control of Uncertain Systems Using Superpositions of Linear Operators. Neural Computation, 23(8), 1911-1934. DOI: 10.1162/NECO_a_00151

Likelihood calculus paper series review part 2 – Neuro-mechanical control using differential stochastic operators

The second paper put out by Dr. Terence Sanger in the likelihood calculus paper series is Neuro-mechanical control using differential stochastic operators. Building on the probabilistic representation of systems through differential stochastic operators presented in the last paper (Controlling variability, which I review here), Dr. Sanger starts exploring how one could effect control over a system whose dynamics are described in terms of these operators. Here, he specifically looks at driving a population of neurons described by differential stochastic operators to generate the desired system dynamics. Neural control of a system requires that several phenomena outside the realm of classical control theory be addressed, including the effects of variability in control due to stochastic firing, large partially unlabeled cooperative controllers, bandlimited control due to finite neural resources, and variation in the number of available neurons.

The function of a neuron in a control system can be completely described in terms of 1) the probability of it spiking due to the current system state, p(s=1|x), and 2) the effect of its response on the change in the state. Due to the inherent uncertainty in these systems, each individual neuron’s effect on the change in state is captured by a distribution, p(\dot{x}|s). And because the effect of each neuron is only a small part of a large dynamical system that includes the dynamics of the system being controlled and the effects of all the other neurons, these distributions tend to be very broad.

Rephrasing the above description, neurons are mapping a given state x to a change in state \dot{x}. Instead of using two conditional densities to describe this mapping, p(s|x) and p(\dot{x}|s), we can rewrite this more compactly as

p(\dot{x}|x) = p(\dot{x}|s=1)\;p(s=1|x) + p(\dot{x}|s=0)\;p(s=0|x),

which can be read as the probability of a change in state \dot{x} given the current state x is equal to the probability of that change in state occurring if the neuron spikes, p(\dot{x}|s=1), multiplied by the probability of that neuron spiking given the current state, p(s=1|x), plus the probability of that state occurring if the neuron doesn’t spike, p(\dot{x}|s=0), multiplied by the probability of that neuron not spiking, p(s=0|x).
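To see what this mixture looks like numerically, here is a minimal sketch. The sigmoidal spiking probability and the two Gaussian effect distributions are hypothetical choices of mine, picked only so there is something to plot or inspect.

```python
import numpy as np

xdot = np.linspace(-3.0, 3.0, 601)        # grid over possible changes in state
dxdot = xdot[1] - xdot[0]

def gaussian(z, mean, std):
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def p_spike(x):
    """Hypothetical p(s=1 | x): the neuron is more likely to fire for larger x."""
    return 1.0 / (1.0 + np.exp(-4.0 * (x - 0.5)))

p_xdot_spike = gaussian(xdot, 0.5, 0.5)       # p(xdot | s=1), assumed
p_xdot_no_spike = gaussian(xdot, 0.0, 0.5)    # p(xdot | s=0), assumed

def p_xdot_given_x(x):
    """p(xdot | x) = p(xdot | s=1) p(s=1 | x) + p(xdot | s=0) p(s=0 | x)."""
    ps = p_spike(x)
    return p_xdot_spike * ps + p_xdot_no_spike * (1.0 - ps)

density = p_xdot_given_x(1.0)
total = density.sum() * dxdot                 # should integrate to about 1
mean_xdot = (xdot * density).sum() * dxdot    # equals 0.5 * p(s=1 | x) in this setup
```

The expected change in state slides between the no-spike mean and the spike mean as p(s=1|x) goes from 0 to 1, which is all that the compact form is saying.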

Differential stochastic operators

We want to capture the mapping p(\dot{x}|x) in such a way that if we have a description of a bunch of controllers (such as neurons) and the change in system state they effect individually, we can combine them in a straightforward way to get the overall change in state resulting from all the controllers operating in parallel. To do this we can use the linear operators developed in the previous paper, which allow us to combine the effects of multiple components through simple summation to determine the overall change in system state. We’ll go over it again here, as I find reading through several different versions of an idea very helpful for solidifying understanding.

Let \mathcal{L} denote a class of linear operators that act on time-varying probability densities, p(x,t), such that \frac{\partial}{\partial t}p(x,t) = \mathcal{L}p(x,t). Because these operators need to preserve the properties of valid probability density (specifically that \int p(x)dx = 1 and p(x) \geq 0), for a given operator L where \dot{p}(x) = \int L(x,y) p(y) dy we require that:

  • 1) \int L(x,y)dx = 0 for all y,
  • 2) L(x,y) \geq 0 whenever x \neq y,

which respectively enforce the aforementioned constraints.

So, first things first. Let’s read out \dot{p}(x) = \int L(x,y) p(y) dy. This says that our change in the probability density, \dot{p}(x), is found by taking our function that tells us what the change in density is for system state x given our current state y, which is L(x,y), and weighting that by the probability of currently being in state y, which is p(y), then summing that all up, which is the integral.

Now the constraints. The first constraint reads out as the integral of the changes of the probability density at each point x for a given state y must be equal to 0. This means that the area of the probability density over the states after updating them is the same. So, assuming we start out with a valid density whose sum equals 1, we always have a density whose sum equals 1.

The second constraint reads out as the change in probability density for state x given a current state y must be greater than or equal to zero whenever x \neq y. This means the only time that the change in the probability density can be negative is if there is a probability of being in that state; it enforces that all p(x) \geq 0, because \dot{p} can’t be negative when p(x) is zero.

Dr. Sanger defines the linear operators that satisfy these two conditions to be “differential stochastic operators”. The discrete time versions are matrices, dubbed “difference stochastic operators”.
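For the finite-state case the two constraints are easy to check in code. This is a sketch under my own assumptions: the 3-state operator is invented, and I’m assuming the discrete-time counterpart can be obtained from a single Euler step, M = I + dt L, which gives a matrix with non-negative entries and columns summing to 1 as long as dt is small enough.

```python
import numpy as np

def is_differential_stochastic(L, tol=1e-9):
    """Check the two constraints on a finite-state operator L[x, y]."""
    columns_sum_to_zero = np.allclose(L.sum(axis=0), 0.0, atol=tol)    # constraint 1
    off_diagonal = L[~np.eye(L.shape[0], dtype=bool)]
    return bool(columns_sum_to_zero and (off_diagonal >= -tol).all())  # constraint 2

# Hypothetical 3-state operator: L[x, y] is the flow rate from state y into state x
L = np.array([[-0.5,  0.2,  0.0],
              [ 0.5, -0.6,  0.3],
              [ 0.0,  0.4, -0.3]])

dt = 0.1
M = np.eye(3) + dt * L       # one Euler step of p_dot = L p, written as a matrix

p = np.array([0.0, 1.0, 0.0])
for _ in range(100):
    p = M @ p                # the density stays non-negative and keeps summing to 1
```

The negative entries sit only on the diagonal, so probability can only leave a state in proportion to how much is already there, which is exactly the reading of the second constraint given above.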

Superposition of differential stochastic operators

Differential stochastic operators can be derived in different ways; here we’ll go through the derivation from the ‘master’ equation defining p(\dot{x}|x), and from a stochastic differential equation. They each have their insights, so it’s helpful to work through both.

Derivation from master equation

The equation for p(\dot{x}|x) written out above,

p(\dot{x}|x) = p(\dot{x}|s=1)\;p(s=1|x) + p(\dot{x}|s=0)\;p(s=0|x),

determines the probability flow on the state. By employing the Kramers-Moyal expansion we can capture this probability flow through a partial differential equation describing change in probability density. In other words, instead of capturing the time evolution of the system state with a probability density over possible changes in state, \dot{x}, we capture it through the change in probability of each state, \dot{p}(x). The Kramers-Moyal expansion looks like:

\frac{\partial}{\partial t}p(x,t) = -\frac{\partial}{\partial x}(D_1(x)p(x,t)) + \frac{1}{2}\frac{\partial^2}{\partial x^2}(D_2(x)p(x,t)) + ...,

where D_k(x) = E[(\dot{x})^k] = \int \dot{x}^k p(\dot{x}|x) d\dot{x}. Truncating this expansion at the first two terms we get the Fokker-Planck equation, where the first term describes the drift of the density, and the second term the diffusion. These two terms are sufficient for describing Gaussian conditional densities, which capture many physical phenomena. In the case where p(\dot{x}|x) does not have a Gaussian distribution, higher-order terms from the Kramers-Moyal expansion will need to be included.
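The coefficients D_k are just moments of the conditional density, so they can be estimated with a simple sum over a grid of \dot{x} values. In the sketch below the Gaussian p(\dot{x}|x) with mean 0.4 and standard deviation 0.7 is an arbitrary choice of mine for a single fixed x.

```python
import numpy as np

xdot = np.linspace(-6.0, 6.0, 2401)     # grid over possible changes in state
dxdot = xdot[1] - xdot[0]

def km_coefficient(p_xdot, k):
    """D_k = integral of xdot^k p(xdot | x) dxdot, approximated by a Riemann sum."""
    return np.sum(xdot ** k * p_xdot) * dxdot

# Hypothetical Gaussian conditional density at one particular state x
mean, std = 0.4, 0.7
p_xdot = np.exp(-0.5 * ((xdot - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

D1 = km_coefficient(p_xdot, 1)   # drift coefficient: the mean of xdot
D2 = km_coefficient(p_xdot, 2)   # second moment: variance plus mean squared
```

For this Gaussian, D1 comes out as the mean and D2 as std^2 + mean^2, i.e. the second raw moment rather than the variance.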

Now, imagine we have a simple system of two neurons, where the change in state is defined as \dot{x} = \dot{x}_1 + \dot{x}_2. If these neurons have conditionally independent variability, i.e. p(\dot{x}_1, \dot{x}_2 | x) = p(\dot{x}_1|x)p(\dot{x}_2|x), then we can sum the Kramers-Moyal expansion of each of these terms to describe the evolution of the overall system state:

\frac{\partial}{\partial t}p(x,t) = - \sum_i \frac{\partial}{\partial x}(D_{1i}(x)p(x,t)) + \frac{1}{2} \sum_i \frac{\partial^2}{\partial x^2}(D_{2i}(x)p(x,t)) + ...,

as long as the neurons have conditionally independent variability. This means that they can’t be connected (directly or indirectly) such that a spike in one neuron causes a spike in the other. While this might seem a poor assumption for modeling networks of spiking neurons, in large populations with many inputs, the effect of any single input neuron tends to be small enough that the assumption holds approximately.

We can rewrite the previous equation now, taking advantage of linearity of p(x,t) and the Kramers-Moyal coefficients, to get

\dot{p}(x,t) = \mathcal{L}p = \sum_i \mathcal{L}_i p,

which means that by describing neurons with the differential stochastic operators, \mathcal{L}_i, we can determine the cumulative effect on the dynamics of the system through simple summation. Which we all remember from the last paper, but hey, good to review.

Now, in the event that we want to model the effect of a group of strongly interconnected neurons, we can instead consider the effect of the group as a function of the 2^n possible firing patterns (spike or no spike from each of the neurons). So where before p(\dot{x}| x) was written in terms of the two cases s = 0 and s = 1, it would now be written:

p(\dot{x}|x) = \sum_{i=1}^{2^n} p(\dot{x}|x,i)p(i),

where each i is a different spike pattern. This group of neurons and their effect on the system dynamics is then considered as a single unit, and the differential stochastic operator describing them can then be combined with the operator from another group of neurons, provided that the two groups exhibit conditionally independent variability.

If there are no independent groups in the network then it’s fully connected and this is not for you; go home.

Derivation from a stochastic differential equation

For this derivation, imagine a set of controllers operating in parallel. Each controller has a stochastic differential equation that defines how a control signal input affects the system dynamics,

dx = f_i(x)dt + g_i(x)dB_i,

where f_i and g_i are arbitrary equations and dB_i are random functions of time, or noise terms. Let f_i(x)dt for i > 0 be the controller equations, and f_0(x)dt be the unforced, or passive, dynamics of the system, which define how the system behaves without controller input. We can write the equation for the whole system as

dx = f_0(x)dt + \sum_{i>0}u_i(f_i(x)dt + g_i(x)dB_i),

where u_i are constant or slowly varying control inputs. The reason we would choose to denote the unforced (passive) dynamics of the system as f_0 is because we can now define u_0 = 1, and rewrite the above equation as

dx = \sum_{i}u_i(f_i(x)dt + g_i(x)dB_i).

The corresponding Fokker-Planck equation for the evolution of the state probability density is

\frac{\partial}{\partial t}p(x,t) = - \sum_i u_i \frac{\partial}{\partial x}(f_i(x)p(x,t)) + \frac{1}{2} \sum_i u_i \frac{\partial^2}{\partial x^2}(g_i(x)p(x,t)).

Look familiar? We can rewrite this as a superposition of linear operators

\dot{p}(x,t) = \mathcal{L}p = \sum_i u_i \mathcal{L}_i p,

and there you go.
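As a sanity check on the superposition, here’s a rough Euler-Maruyama simulation of the combined equation in plain Python. The particular f_i, g_i, and u_i are illustrative choices of mine (a passive decay toward zero plus one controller pushing at a constant rate), and simulate is a hypothetical helper, not something from the paper:

```python
import math
import random

def simulate(fs, gs, us, x0, T, dt=0.01, seed=0):
    """Integrate dx = sum_i u_i (f_i(x) dt + g_i(x) dB_i) with small Euler-Maruyama steps."""
    rng = random.Random(seed)
    x = x0
    for _ in range(int(round(T / dt))):
        dx = 0.0
        for f, g, u in zip(fs, gs, us):
            dB = rng.gauss(0.0, math.sqrt(dt))  # independent noise increment per term
            dx += u * (f(x) * dt + g(x) * dB)
        x += dx
    return x

# i = 0: passive dynamics (decay toward zero), with u_0 = 1
# i = 1: a controller that pushes the state at a constant rate
fs = [lambda x: -x, lambda x: 1.0]
us = [1.0, 2.0]

no_noise = [lambda x: 0.0, lambda x: 0.0]
some_noise = [lambda x: 0.0, lambda x: 0.1]

x_quiet = simulate(fs, no_noise, us, x0=0.0, T=10.0)
x_noisy = simulate(fs, some_noise, us, x0=0.0, T=10.0)
```

With the noise switched off, the state settles where the passive decay balances the controller push, at x = u_1. With noise on, individual runs scatter around that point, and that scatter is what the Fokker-Planck description tracks as a density.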

Population model

So, now we can apply this superposition of differential stochastic operators to describe the effect of a population of neurons on a given system. Dr. Sanger lists several ways that this model can be controlled: 1) modifying the tuning curves of the neurons, which specify how they respond to stimulus; 2) modifying the output functions that determine the effect that a neuron has on the dynamics of the system; and 3) modifying the firing threshold of the neurons.

I found the difference between 1 and 3 can be a little confusing, so let’s look at an example tuning curve in 2D space to try to make this clear. Imagine a neuron sensitive to 2-dimensional input signals, and that its tuning curve looks like this:


If we’re changing the tuning curve, then how this neuron responds to its input stimulus will change. For example, in this diagram we show changing the tuning curve of a neuron:

Here, the neuron no longer responds to the same type of input that it responded to previously. We have made a qualitative change to the type of stimulus this neuron responds to.

If we change the firing threshold, however, then what we’re changing is when the neuron starts responding to stimulus that it is sensitive to.

Here we show the neuron becoming more and more sensitive to its stimulus, responding more strongly, sooner and sooner. So the type of signal that the neuron responds to isn’t changing, but rather when the neuron starts responding.

Alright, now that we’ve got that sorted out, let’s move on.
Tuning curves (1) and output functions (2) are both modifiable through learning, by changing the incoming and outgoing connection weights, respectively, but for controlling systems on the fly this is going to be too slow, i.e. slower than the speed at which the system moves. So what’s left is (3), modifying the firing threshold of the neurons. So the model then looks like:

where p(x) is projected into a population of neurons, each with a stochastic differential operator, and these operators sum together to generate \dot{p}(x). In this diagram, \lambda_i is the firing threshold of neuron i, and \lambda_i(x) denotes the modulation of the firing rate of neuron i as a function of the current system state. When the modulation is dependent on the system state we have a feedback, or closed-loop, control system. Dr. Sanger notes that in the case that \lambda_i is heavily dependent on x, modulating the firing threshold is indistinguishable from modifying the tuning curve, meaning that we can get some pretty powerful control out of this.


‘The theory of differential stochastic operators links the dynamics of individual neurons to the dynamics of a full neuro-mechanical system. The control system is a set of reflex elements whose gain is modulated in order to produce a desired dynamics of the overall system.’

This paper presents a very different formulation of control than classical control theory. Here, the goal is to modulate the dynamics of the system to closely match a desired set of dynamics that achieve some task, rather than to minimize the deviation from some prespecified trajectory through state space. Dr. Sanger notes that this description of control matches well to behavioral descriptions of neural control systems, where there are numerous subsystems and circuits that have reflexive responses to external stimuli which must be modulated to achieve desired behavior. The goal of control is to fully define the dynamics of the system’s reaction to the environment.

Comments and thoughts

What comes to mind first for me, in terms of using the modulation of reflex elements to effect a desired set of dynamics, is modeling the spinocerebellum. With a ton of projections directly to the spinal cord, and strong implications in the locomotor and balance systems, it seems like it would be a very good candidate for being modeled with this type of control. The idea being that the cerebellum is projecting modulatory values to the different spinal circuits (reflex networks and central pattern generators, for example) that specify how to respond to changes in the environment to maintain our balance or the rhythm of our walk. How we go about specifying exactly what those modulatory terms need to be is something that Dr. Sanger tackles in the last paper of this series, which I’ll be reviewing in the next couple of months. I’m looking forward to it.

On another note, in my lab we all work with the Neural Engineering Framework, in which populations of neurons are taken to represent vectors, and to perform transformations on these vectors to relay information and perform various functions. To this end, something that interests me about likelihood calculus is its application to populations of neurons representing vectors. Instead of finding p(\dot{x}|x) by summing the effects of all of the neurons in a population, or defining it in terms of the population spiking patterns, we’re looking at it in terms of the different vectors this population can represent, and the effect of that vector on the system. So we can still have spiking neuron based models, but we can do all the likelihood calculus business one level removed, simplifying the calculation and reducing the number of differential stochastic operators needed.

There are a ton of things going on in this paper, and lots to think about. At several points I deviated from the notation used in this paper because I found it unclear, but aside from that I’ve really enjoyed reading through it and writing it up. Very interesting work.
Sanger TD (2010). Neuro-mechanical control using differential stochastic operators. Conference proceedings : … Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, 2010, 4494-7 PMID: 21095779


Likelihood calculus paper series review part 1 – Controlling variability

Dr. Terry Sanger has a series of papers that have come out in the last few years describing what he has named ‘likelihood calculus’. The goal of these papers is to develop ‘a theory of optimal control for variable, uncertain, and noisy systems that nevertheless accomplish real-world tasks reliably.’ The idea being that successful performance can be thought of as modulating variance of movement, allocating resources to tightly control motions when required and allowing variability in task-irrelevant dimensions. To perform variability modulation, we first need a means of capturing mathematically how the features of a controller operating under uncertainty affect variability in system movement. Defining terms quickly: the features of a controller are the different components that produce signals resulting in movement; variability is taken here to be the trial-to-trial variation in movements; and uncertainty means that the available sensory feedback does not uniquely determine the true state of the world, where uncertainty can arise from noise on sensory feedback signals, unmodeled dynamics, and/or quantization of sensory feedback. To capture all this uncertainty and variability, probability theory will naturally be employed. In this post I will review the paper ‘Controlling variability’ (2010) by Dr. Sanger, which sets up the framework for describing the time course of uncertainty during movement.

Using probability in system representations

So, here’s a picture of the system our controller (the brain) is in:

There’s the input initial state x, and the output change in state, \dot{x}, which is generated as a combination of the unforced dynamics of the world and the control dynamics effected by the brain. But since we’re dealing with uncertainty and variability, we’re going to rewrite this such that given an initial state x, we get a probability distribution over potential changes in state, p(\dot{x}|x), which specifies the likelihood of each change in state \dot{x}, given our initial state probability distribution p(x). So in our system diagram, the world and the brain both define probability distributions over the possible changes in state, p_0(\dot{x}|x) and p_1(\dot{x}|x), respectively, which then combine to create the overall system dynamics p(\dot{x}|x). Redrawing our picture to incorporate the probabilities, we get:

One may ask: how do these probabilities combine? Good question! What we’d like to be able to do is combine them through simple linear operators, because they afford us massive simplifications in our calculations, but the combination of p_0(\dot{x}|x) and p_1(\dot{x}|x) isn’t as simple as summing and normalizing. The reason why can be a little tricky to tease out if you’re unfamiliar with probability, but will become clear with some thought. Consider what it means to go about combining the probabilities in this way. Basically, if you sum and normalize, then the result is saying that there is a 50% chance of doing what the brain says to do, and a 50% chance of doing what the world says to do, and doesn’t capture what is actually going to happen, which is an interaction between the effects of the brain and the world. A good example for thinking about this is rolling dice. If you roll each die individually, you have an equal chance of the result being each number 1-6, but if you roll two dice, the overall system probability of rolling numbers changes in a highly nonlinear fashion:

Suddenly there is a 0% chance of the result being a 1, and the probability of rolling a number increases in likelihood until you get to 7, at which point the likelihood decreases with ascending numbers; there is a nonlinear interaction at play that can’t be captured by summing the probabilities of rolling a number on each die individually and normalizing.
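The dice example is quick to check in code. Here’s a small plain-Python sketch (the function and variable names are mine) comparing the actual distribution of the total of two dice with what you’d get from summing the two single-die distributions and normalizing:

```python
def combine_sums(p, q):
    """Distribution of the total when one outcome is drawn from p and one from q."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

die = dict((face, 1.0 / 6) for face in range(1, 7))

# What actually happens when the two dice interact (their results add)
two_dice = combine_sums(die, die)

# Summing the two distributions and normalizing just gives back a single die
sum_and_normalize = dict((face, (die[face] + die[face]) / 2.0) for face in die)

most_likely_total = max(two_dice, key=two_dice.get)
```

The summed-and-normalized version still says a 1 is as likely as anything else, while the true distribution has no 1 at all and peaks at 7.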

So summing the probability distributions over the possible changes in state, \dot{x}, isn’t going to work, but is there another way to combine these through linear operators? The answer is yes, but it’s going to require us to undergo a bit of a paradigm shift and expand our minds. What if, instead of capturing the change in the system by looking at p(\dot{x}|x), the probability distribution over possible changes in state given the current state, we instead capture the dynamics of the system by defining \dot{p}(x), the change in the probability of states through time? Instead of describing how the system evolves through the likelihood of different state changes, the dynamics are captured by defining the change in likelihood of different states; we are capturing the effect of the brain and the world on the temporal evolution of state probability. Does that freak you out?

Reworking the problem

Hopefully you’re not too freaked out. Now that you’ve worked your head around this concept, let’s look at \dot{p}(x) a little more closely. Specifically, let’s look at the Kramers-Moyal expansion (for a one-dimensional system):

\frac{\partial p(x,t)}{\partial t} = \sum_{k=1}^{\infty} \left( - \frac{\partial}{\partial x} \right)^k \{a_k(x) p(x)\} / k!,

a_k(x) = \int \dot{x}^k p(\dot{x}|x) d\dot{x}.

As Dr. Sanger notes, this is a daunting equation, but it can be understood relatively easily. The left side, \frac{\partial p(x,t)}{\partial t} = \dot{p}_t(x), is the rate of change of probability at each point x at time t. The right side is just the Taylor series expansion. If we take the first two terms of the Taylor series expansion, we get:

\frac{\partial p(x,t)}{\partial t} = - a_1 \frac{\partial}{\partial x} p(x) + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} p(x),

where the first term describes how the probability drifts (or shifts / translates), a_1 being the average value of \dot{x} for each value of x. The second term relates the rate of diffusion, a_2 being the second moment of the speed \dot{x}, describing the amount of spread in different possible speeds, where greater variability in speed leads to an increased spread of the probability. This is the Fokker-Planck equation, which describes the evolution of a physical process with constant drift and diffusion. At this point we make the assumption that our probability distributions are all going to be in the form of Gaussians (for which the Fokker-Planck equation exactly describes evolution of the system through time, and can arguably act as a good approximation to neural control systems where movement is based on the average activity of populations of neurons).

As an example of this, think of a 1-dimensional system, and a Gaussian probability distribution describing what state the system is likely to be in. The first term a_1 is the average rate of change \dot{x} across the states x the system could be in. The probability shifts through state space as specified by a_1. Intuitively, if you have a distribution with mean around position 1 and the system velocity is 4, then after one unit of time the change in your probability distribution, p(x), should shift the mean to position 5. The second term, a_2, is a measure of how wide the range of different possible speeds \dot{x} is. The larger the range of possible values, the less certain we become about the location of the system as it moves forward in time; the greater the range of possible states the system might end up in at the next time step. This is reflected by the rate of diffusion of the probability distribution. If we know for sure the speed the system moved at (i.e. all possible states will move with a specific \dot{x}), then we simply translate the mean of the probability distribution. If however there’s uncertainty in the speed at which the system is moving, then the correct location (reflecting the actual system position) for the mean of the probability distribution could be one of a number of values. This is captured by increasing the width of (diffusing) the Gaussian.
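For the constant-coefficient, two-term equation above, a Gaussian stays Gaussian: its mean moves at rate a_1 and its variance grows at rate a_2. Here’s a tiny sketch of that bookkeeping, assuming constant a_1 and a_2 and treating a_2 as the diffusion coefficient from the truncated equation. The function name and the starting variance are made up for illustration:

```python
def propagate_gaussian(mean, var, a1, a2, t):
    """Mean and variance of a Gaussian after time t under constant drift a1 and diffusion a2."""
    return mean + a1 * t, var + a2 * t

# The example from the text: mean at position 1, velocity 4, one unit of time.
# With a2 = 0 the speed is known exactly, so the distribution only translates.
mean_certain, var_certain = propagate_gaussian(mean=1.0, var=0.5, a1=4.0, a2=0.0, t=1.0)

# With uncertainty in the speed (a2 > 0) the mean still shifts, but the Gaussian also widens.
mean_uncertain, var_uncertain = propagate_gaussian(mean=1.0, var=0.5, a1=4.0, a2=2.0, t=1.0)
```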

Linear operators

Importantly, the equations above are linear in terms of p(x). This means we can rearrange the above equation:

\frac{\partial p(x,t)}{\partial t} = \left( -a_1 \frac{\partial}{\partial x} + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} \right) p(x),

letting \mathcal{L} = \left( -a_1 \frac{\partial}{\partial x} + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} \right), we have

\frac{\partial p(x,t)}{\partial t} = \mathcal{L} p(x).

Now we can redraw our system above as



\mathcal{L} =  \mathcal{L}_0 +  \mathcal{L}_1,

which is the straightforward combination of the different contributions of each of the brain and the world to the overall system state probability. How cool is that?

Alright, calm down. Time to look at using these operators. Let’s assume that the overall system dynamics \mathcal{L} hold constant for some period of time (taking particular care to note that ‘constant dynamics’ does not mean that a single constant output is produced irrespective of the input state, but rather that a given input state x always produces the same result while the dynamics are held constant), and that we have discretized our representation of x (to be a range of some values, e.g. -100 to 100). Then we can find the state probability distribution at time T by calculating

p(x, T) = A^T p(x,0),

where A^T = e^{T\mathcal{L}}.
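Here’s a rough plain-Python sketch of what this looks like once x is discretized. Each operator becomes a finite-difference matrix, the ‘world’ and ‘brain’ matrices are added together, and e^{T\mathcal{L}} is approximated by many small Euler steps. The grid, the step size, and the choice of a diffusion-only world plus a drift-only brain are my assumptions for illustration, not something from the paper:

```python
import math

def make_operator(n, dx, a1, a2):
    """Finite-difference matrix for L = -a1 d/dx + (a2/2) d^2/dx^2 on an n-point grid."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = -a2 / dx ** 2
        if i > 0:
            L[i][i - 1] = a1 / (2 * dx) + a2 / (2 * dx ** 2)
        if i < n - 1:
            L[i][i + 1] = -a1 / (2 * dx) + a2 / (2 * dx ** 2)
    return L

def add_operators(L0, L1):
    """Superposition: L = L0 + L1, element by element."""
    return [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(L0, L1)]

def evolve(p, L, T, dt=0.1):
    """Approximate p(x, T) = e^{T L} p(x, 0) with repeated small Euler steps."""
    n = len(p)
    for _ in range(int(round(T / dt))):
        dp = [sum(L[i][j] * p[j] for j in range(n)) for i in range(n)]
        p = [p[i] + dt * dp[i] for i in range(n)]
    return p

def moments(xs, p):
    mean = sum(x * q for x, q in zip(xs, p))
    var = sum((x - mean) ** 2 * q for x, q in zip(xs, p))
    return mean, var

xs = list(range(-50, 51))  # discretized state space, dx = 1
p0 = [math.exp(-0.5 * ((x + 10) / 3.0) ** 2) for x in xs]
total = sum(p0)
p0 = [q / total for q in p0]  # Gaussian with mean -10, variance 9

L_world = make_operator(len(xs), 1.0, a1=0.0, a2=2.0)  # diffusion only
L_brain = make_operator(len(xs), 1.0, a1=1.0, a2=0.0)  # drift only
L_total = add_operators(L_world, L_brain)

pT = evolve(p0, L_total, T=10.0)
mean_T, var_T = moments(xs, pT)
```

After T = 10 the mean has drifted from -10 to about 0 and the variance has grown from 9 to somewhere near 9 + 2 * 10 = 29 (a touch less, because of the crude Euler stepping), which is what the drift and diffusion terms predict.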

When combining these \mathcal{L} operators, if we sum them, i.e. overall system dynamics

\mathcal{L} = \mathcal{L}_0 + \mathcal{L}_1,

and then apply them to the probability

\frac{\partial p(x,t)}{\partial t} = \mathcal{L}p(x)

this is saying that the dynamics provided by \mathcal{L}_0 and \mathcal{L}_1 are applied at the same time. But if we multiply the component dynamics operators,

\mathcal{L} = \mathcal{L}_1 \mathcal{L}_0

then when we apply them we have

\frac{\partial p(x,t)}{\partial t} = \mathcal{L}p(x) = \mathcal{L}_1 \mathcal{L}_0 p(x),

which is interpreted as applying \mathcal{L}_0 to the system, and then applying \mathcal{L}_1. Just basic algebra, but it allows us to apply simultaneously and sequentially the dynamics generated by our contributing system components (i.e. the brain and the world).

Capturing the effects of control

So now we have a representation of the system dynamics operating with variability and under uncertainty. We’re talking about building a tool for controlling these systems, though, so where does the control signal u fit in? The \mathcal{L} operator is made to be a function of the control signal, describing the probabilistic effect of the control signal given the possible initial states in the current state probability distribution p(x). Thus the change in state probability is now written

\frac{\partial p(x,t)}{\partial t} = \mathcal{L}(u)p(x).

Suppose that we drive a system with constant dynamics \mathcal{L}(u_1) for a period of time T_1, at which point we change the control signal and drive the system with constant dynamics \mathcal{L}(u_2) for another period of time T_2. The state of the system can now be calculated

p(x, T_1 + T_2) = e^{T_2\mathcal{L}(u_2)}e^{T_1\mathcal{L}(u_1)}p(x,0) = A_{u_2}^{T_2}A_{u_1}^{T_1}p(x,0)

using the sequential application of dynamics operators discussed above.


And that is the essence of the paper ‘Controlling variability’. There is an additional discussion about the relationship to Bayes’ rule, which I will save for another post, and an example, but this is plenty for this post.

The main point from this paper is that we shouldn’t be focusing on the values of states as the object of control, but rather the probability densities of states. By doing this, we can capture the uncertainty in systems and work towards devising an effective means of control. So, although the paper is called ‘Controlling variability’, the discussion of how to actually control variability is saved for later papers. All the same, I thought this was a very interesting paper, enjoyed working through it, and am looking forward to the rest of the series.

Sanger TD (2010). Controlling variability. Journal of motor behavior, 42 (6), 401-7 PMID: 21184358


Nengo model – Low pass derivative filter

To just get the code you can copy / paste from below or get the code from my github:

In the course of building models in Nengo, I recently needed a neural implementation of a low pass derivative filter. I scripted up a sub-network in Nengo that does this, and until we get the model database / repository up and running for Nengo scripts I’ll keep working through building these things here, because it goes over some basic methods that can be useful when you’re building up your models.

Basically there are three parts to this model: Derivative calculation, absolute value calculation, and an inhibitory projection with a threshold activation that projects to the output population. Here’s a picture:
Here’s the idea: The population input projects both directly to the output population and to a population that calculates the derivative of the sum across dimensions of the input signal. The derivative population passes on the derivative of the input signal to an absolute value calculating population, which passes the absolute value of the derivative on to a population that isn’t activated for values under a threshold level. This threshold population then projects very strong inhibition to the output population, so that when the absolute value of the derivative is above the threshold level, no output is projected, and otherwise the system just relays the input signal straight through. Here’s the code:

def make_dlowpass(name, neurons, dimensions, radius=10, tau_inhib=0.005, inhib_scale=10):

    dlowpass = nef.Network(name)

    dlowpass.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    output = dlowpass.make('output', neurons=dimensions*neurons, dimensions=dimensions) # create output relay
    # now we track the derivative of sum, and only let output relay the input
    # if the derivative is below a given threshold
    dlowpass.make('derivative', neurons=radius*neurons, dimensions=2, radius=radius) # create population to calculate the derivative
    dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) # set up recurrent connection
    dlowpass.add(make_abs_val(name='abs_val', neurons=neurons, dimensions=1, intercept=(.2,1))) # create a subnetwork to calculate the absolute value  

    # connect it up!
    dlowpass.connect('input', 'output') # set up communication channel
    dlowpass.connect('input', 'derivative', index_post=0)
    def sub(x):
        return [x[0] - x[1]]
    dlowpass.connect('derivative', 'abs_val.input', func=sub)
    # set up inhibitory matrix
    inhib_matrix = [[-inhib_scale]] * neurons * dimensions
    output.addTermination('inhibition', inhib_matrix, tau_inhib, False)
    dlowpass.connect('abs_val.output', output.getTermination('inhibition'))

    return dlowpass.network # hand back the underlying network so it can be added to a parent network

First off, this code takes advantage of the absolute value function that was written a couple of blog posts ago. You can either go check out that post, or use the copy of that function included in the complete code at the end. Aside from that, there are a lot of things going on that you won’t come across if you’re just coding up simple examples, so let’s look at them.

At the top, we’re assigning our output population a handle, which I try to avoid in general for neatness, since most of the time you can reference it simply by its name, 'output'. The reason that I assign it a handle here is because we’re going to be calling upon some Java API features that (to my knowledge) aren’t handled in the Python API yet, and although we could call up the output node as a Java object with dlowpass.get('output') using only its assigned name, it will just be cleaner in the end to have a handle for it. We’ll come back to this.

The next interesting thing is that when we create the derivative population, we set the number of neurons to radius*neurons, with radius=10. This is because in derivative we’re representing the sum of all of the input dimensions. How are all the input dimensions being summed up in derivative, you ask? Just below, in the call dlowpass.connect('input', 'derivative', index_post=0). When we connect up the input relay to derivative we set index_post=0, which means that all of the input dimensions are going to project to the same dimension of the derivative population. The default weight for this connection is 1, so dimension 0 of derivative is equal to the sum of 1 * value over each value in the input dimensions. Super.

But why do we set radius=10? This is because the radius parameter specifies the range of values represented by this population. The default is (-1,1), but when we specify radius, the new range of represented values becomes (-radius, radius). We’re making a bit of an assumption that this value won’t go outside of the range (-10,10) here, but that should be OK for most of the situations we’re going to come across. In the specific model I’m using this for it’s definitely the case, so that’s why I’ve set the default value to 10. And because we don’t want the accuracy in representation to decrease, we also scale up the number of neurons in this population to radius*neurons.

And there’s still more happening in this derivative population! In the call dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) I specify a recurrent connection that projects into a second dimension represented in derivative. So now what’s going to happen is that the sum of the input signals is projected into the first dimension of derivative, and through a recurrent connection the value of the sum of the input signals from time t-pstc will be represented in the second dimension of the derivative population.

To calculate the derivative then, it’s a simple matter of subtracting the previous signal from the current signal, which is what happens in the function sub, defined just before derivative is connected to abs_val. To implement this function, when connecting up derivative to abs_val, just set the parameter func=sub. Easy.

Now, when the abs_val population is made, we set intercept=(.2,1). This is the same trick we used in the previous absolute value function model, but it’s acting as a threshold here. Basically, this population won’t respond if the value being projected into it is between (-.2, .2).

So, up to this point, what we have is a summation of the input dimensions, the derivative being calculated and passed to an absolute value function, and this population only responds if the derivative is greater than .2.

The last part is hooking up this abs_val population to the output relay, to suppress output whenever it’s activated (i.e. when the derivative is greater than .2). This is where we need to pull in the Java API, and this is why we specified a handle for our output population. In the lines that build inhib_matrix and call addTermination, what’s going on is that instead of using the NEF neural compiler functionality to set up our connection weights to compute some function, we’re specifying them ourselves. And we’re specifying them to prevent the neurons in output from firing. Now, the activation of the neurons in abs_val reduces the voltage values being sent into the output population, inhibiting their activity.
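To see what the network is doing at the level of signals, with no neurons involved, here’s a rough plain-Python sketch of the same pipeline: sum the input dimensions, compare the sum to a low-pass filtered copy of itself, and zero the output whenever the absolute difference is over the threshold. This is an idealized approximation of the Nengo model rather than a replacement for it, and the function name, time step, and test signal are all made up for illustration:

```python
def dlowpass_filter(signal, dt=0.001, tau=0.1, threshold=0.2):
    """Relay each input vector unless the summed input is changing quickly.

    signal: list of input vectors, one per time step of length dt.
    tau: time constant of the filtered copy (plays the role of pstc=0.1).
    threshold: plays the role of the .2 intercept on the abs_val population.
    """
    filtered_sum = sum(signal[0])  # start settled, so there is no start-up transient
    output = []
    for x in signal:
        current_sum = sum(x)  # all dimensions project to one dimension
        filtered_sum += (dt / tau) * (current_sum - filtered_sum)  # lagged copy
        change = abs(current_sum - filtered_sum)  # sub, then absolute value
        if change > threshold:
            output.append([0.0] * len(x))  # strong inhibition: no output
        else:
            output.append(list(x))  # communication channel: relay the input
    return output

# Hold the input steady, then jump to a new value and hold it there
signal = [[0.1, 0.1, 0.1]] * 500 + [[0.5, 0.5, 0.5]] * 1000
filtered = dlowpass_filter(signal)
```

Right after the jump the output is clamped to zero, and once the lagged copy catches up with the new value the input is relayed through again.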

And that’s it! Here’s the complete code to run an example of this network (which can also be found on my github):

import nef

# constants / parameter setup etc
N = 50 # number of neurons
D = 3 # number of dimensions

def make_abs_val(name, neurons, dimensions, intercept=[0]):
    def mult_neg_one(x):
        return x[0] * -1 

    abs_val = nef.Network(name)

    abs_val.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    abs_val.make('output', neurons=1, dimensions=dimensions, mode='direct') # create output relay
    for d in range(dimensions): # create a positive and negative population for each dimension in the input signal
        abs_val.make('abs_pos%d'%d, neurons=neurons, dimensions=1, encoders=[[1]], intercept=intercept)
        abs_val.make('abs_neg%d'%d, neurons=neurons, dimensions=1, encoders=[[-1]], intercept=intercept)

        abs_val.connect('input', 'abs_pos%d'%d, index_pre=d)
        abs_val.connect('input', 'abs_neg%d'%d, index_pre=d)
        abs_val.connect('abs_pos%d'%d, 'output', index_post=d)
        abs_val.connect('abs_neg%d'%d, 'output', index_post=d, func=mult_neg_one)

    return abs_val.network # hand back the underlying network so it can be added to a parent network

def make_dlowpass(name, neurons, dimensions, radius=10, tau_inhib=0.005, inhib_scale=10):

    dlowpass = nef.Network(name)

    dlowpass.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    output = dlowpass.make('output', neurons=dimensions*neurons, dimensions=dimensions) # create output relay
    # now we track the derivative of sum, and only let output relay the input
    # if the derivative is below a given threshold
    dlowpass.make('derivative', neurons=radius*neurons, dimensions=2, radius=radius) # create population to calculate the derivative
    dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) # set up recurrent connection
    dlowpass.add(make_abs_val(name='abs_val', neurons=neurons, dimensions=1, intercept=(.2,1))) # create a subnetwork to calculate the absolute value  

    # connect it up!
    dlowpass.connect('input', 'output') # set up communication channel
    dlowpass.connect('input', 'derivative', index_post=0)
    def sub(x):
        return [x[0] - x[1]]
    dlowpass.connect('derivative', 'abs_val.input', func=sub)
    # set up inhibitory matrix
    inhib_matrix = [[-inhib_scale]] * neurons * dimensions
    output.addTermination('inhibition', inhib_matrix, tau_inhib, False)
    dlowpass.connect('abs_val.output', output.getTermination('inhibition'))

    return dlowpass.network # hand back the underlying network so it can be added to a parent network

# Create network
net = nef.Network('net')

# Create / add low pass derivative filter
net.add(make_dlowpass(name='dlowpass', neurons=N, dimensions=D))

# Make function input
net.make_input('input_function', values=[0]*D)

# Connect up function input to filter
net.connect('input_function', 'dlowpass.input')

# Add it all to Nengo
net.add_to_nengo()

And here’s a picture of it running. What you can see is that anytime the input changes quickly, the system output drops to zero, but when the input is holding constant or is changing slowly the output is allowed to pass through. Great! Just what we wanted.


Nengo scripting: absolute value

To just get the code you can copy / paste from below or get the code from my github:

This is a simple script for performing the absolute value function in Nengo. The most efficient way I’ve found to implement this is to use two separate populations for each dimension of the input signal, one to represent the signal when it’s greater than zero and simply relay it to the output node, and one to represent the signal when it’s less than zero, and project x * -1 to the output node. Here’s the code, and I’ll step through it below.

def make_abs_val(name, neurons, dimensions, intercept=[0]):
    def mult_neg_one(x):
        return x[0] * -1 

    abs_val = nef.Network(name)

    abs_val.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    abs_val.make('output', neurons=1, dimensions=dimensions, mode='direct') # create output relay
    for d in range(dimensions): # create a positive and negative population for each dimension in the input signal
        abs_val.make('abs_pos%d'%d, neurons=neurons, dimensions=1, encoders=[[1]], intercept=intercept)
        abs_val.make('abs_neg%d'%d, neurons=neurons, dimensions=1, encoders=[[-1]], intercept=intercept)

        abs_val.connect('input', 'abs_pos%d'%d, index_pre=d)
        abs_val.connect('input', 'abs_neg%d'%d, index_pre=d)
        abs_val.connect('abs_pos%d'%d, 'output', index_post=d)
        abs_val.connect('abs_neg%d'%d, 'output', index_post=d, func=mult_neg_one)

    return abs_val.network # hand back the underlying network so it can be added to a parent network

First off, the function takes in parameters specifying the number of dimensions, the number of neurons for each population generated, a name, and optionally an intercept value. I’ll come back to why the intercept value is an option in a bit.

Inside the make_abs_val function another function that multiplies the first dimension of its input by -1 is specified. This mult_neg_one function is going to be used by our population representing negative values of the input signal.

Next, we create the network and call it abs_val. Input and output relay nodes are then created, with one neuron, of the specified dimension number, and the populations are set to direct mode. These are the populations that will be connected to from populations outside of the abs_val network.

Now there is a loop for each dimension of the input signal. Inside, two populations are created, where the only difference is their encoder values. Their intercepts specify the start of the range of values they represent. The default is 0, so when it’s not specified these populations will represent values from 0 to 1 (1 is the default end value of the range). For abs_neg, the encoders=[[-1]] line changes the range of values represented from (0,1) to (-1,0). Now we have two populations for dimension d, one that represents only positive values (between 0 and 1), and one that represents only negative values (between -1 and 0). And we’re almost done!

The only thing left to do is to hook up the populations to the input and output appropriately and incorporate the mult_neg_one function into the connection between each of the abs_neg populations and the output relay node. We want each set of populations representing a single dimension to receive from and project back into the appropriate dimension of the input and output relay nodes, so we employ the index_pre and index_post parameters. Because we want each set to receive only dimension d from the input, on that connection specification we set index_pre=d. When setting up the projections to the output relay node, we similarly only want each population to project to the appropriate output dimension d, so we set index_post=d.

By default, the connect call sets up a communication channel, that is to say no computation is performed on the signal passed from the pre to the post population. This is what we want for the abs_pos population, but for the abs_neg population we want the mult_neg_one function to be applied, so that any negative values are multiplied by -1, giving us positive values. This is done with the func parameter, which we set as func=mult_neg_one. Now the signal sent from abs_neg to the output node will be transformed by the mult_neg_one function.
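
To see why splitting each dimension into a positive and a negative channel gives the absolute value, here is a minimal sketch in plain Python with no Nengo involved. It assumes idealized populations that represent their half of the range perfectly; the helper names rectify and abs_val_sketch are made up for this illustration and are not part of the nef API.

```python
def rectify(x):
    """Idealized one-sided population: passes x when positive, 0 otherwise."""
    return max(x, 0.0)


def abs_val_sketch(signal):
    """Per-dimension absolute value built from two one-sided channels."""
    out = []
    for x in signal:
        pos = rectify(x)     # stands in for abs_pos (encoder +1)
        neg = -rectify(-x)   # stands in for abs_neg (encoder -1), holds the negative part
        # abs_pos feeds the output unchanged; abs_neg goes through mult_neg_one
        out.append(pos + (-1 * neg))
    return out


print(abs_val_sketch([0.5, -0.3, 0.0]))
```

Only one of the two channels is ever non-zero for a given input, so summing them at the output relay is safe.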

And that’s it! Here is a script that gets it running (which can also be found on my GitHub):

import nef
import random

# constants / parameter setup etc
N = 50 # number of neurons
D = 3 # number of dimensions

def make_abs_val(name, neurons, dimensions, intercept=[0]):
    def mult_neg_one(x):
        return x[0] * -1 

    abs_val = nef.Network(name)

    abs_val.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    abs_val.make('output', neurons=1, dimensions=dimensions, mode='direct') # create output relay
    for d in range(dimensions): # create a positive and negative population for each dimension in the input signal
        abs_val.make('abs_pos%d'%d, neurons=neurons, dimensions=1, encoders=[[1]], intercept=intercept)
        abs_val.make('abs_neg%d'%d, neurons=neurons, dimensions=1, encoders=[[-1]], intercept=intercept)

        abs_val.connect('input', 'abs_pos%d'%d, index_pre=d)
        abs_val.connect('input', 'abs_neg%d'%d, index_pre=d)
        abs_val.connect('abs_pos%d'%d, 'output', index_post=d)
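        abs_val.connect('abs_neg%d'%d, 'output', index_post=d, func=mult_neg_one)

    return abs_val.network # hand back the underlying network node so it can be added to another network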

net = nef.Network('network')

# Create absolute value subnetwork and add it to net
net.add(make_abs_val(name='abs_val', dimensions=D, neurons=N))

# Create function input
net.make_input('input', values=[random.random() for d in range(D)])

# Connect things up
net.connect('input', 'abs_val.input')

# Add it all to the Nengo world
net.add_to_nengo()

And here’s a picture of it running.


The role of phasic dopamine in the basal ganglia

As I mentioned in my last post, I’m reading a series of papers that presents a model of the basal ganglia, written mainly by Peter Redgrave, Kevin Gurney, and John Reynolds. Of particular interest throughout these articles is a re-examination of the role of the short-term phasic dopamine (DA) signal from the substantia nigra pars compacta (SNc). A well-propagated view is that the phasic DA signal is a reward prediction-error signal, but Redgrave et al present a strong argument against this and suggest instead a role of an agency determination / novel movement identification mechanism. In this post I’m going to be presenting their argument for this, and how reinforcement and reward based learning in the basal ganglia could work at large. Again, throughout there will be comments and questions I put forth, and I will make an effort to make it clear when something is from me and when it’s from the papers.

Phasic DA signal as a reward-prediction error
The idea that the phasic DA signal serves as a reward-prediction error is born out of a series of experiments presented in [Schultz 1998]. The idea of reward-prediction error comes from instrumental (aka operant) conditioning, where rewards ‘reinforce’ behavior by strengthening associations between stimuli and behavioral responses. Formally, a reward-prediction error is defined as the difference between the reward predicted at a given point in time and the actual reward received. This goes way back to [Thorndike 1911] where Thorndike formally states the idea as the Law of Effect:

Any act which in a given situation produces satisfaction becomes associated with that situation so that when the situation recurs the act is more likely than before to recur also.

In Schultz’s experiments, the DA neurons of a monkey are recorded from as the monkey performs a number of various tasks, including “reaction time tasks, direct and delayed GO-NO GO tasks, spatial delayed response and alternation tasks, air puff and saline active avoidance tasks, operant and classically conditioned visual discrimination tasks, self-initiated movements, and unpredicted delivery of a reward in absence of any formal task.” The following image has been lifted from the results of [Schultz 1998]:

The explanation of these results is presented as follows. In the top figure, a reward (R) is unexpectedly delivered and the DA neurons activate. This is because there is a positive error in the predicted reward; no reward was expected, but there was one, BOOM, phasic DA signal. In the middle figure a conditioned stimulus (CS) has been associated with the reward. Now the CS occurs unexpectedly, which means that a reward is on the way, so once again there is a positive error in the predicted reward. This time, however, at the time of reward delivery the reward was predicted and the reward was received. There was no error in reward-prediction, so there is no phasic DA signal. And finally, in the bottom panel we see a CS cause activation of the phasic DA signal, but this time no reward is delivered. Now, exactly when the reward should have been delivered and is not, there is a negative reward-prediction error, and a corresponding decrease in tonic DA levels is observed.

Another interesting result from this experiment is that the phasic DA signal will push backwards along the chain of predictive events to the earliest predictive sensory stimulus signalling that a reward is coming. Taken all together, a pretty strong case for the reward-prediction error hypothesis is presented.
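
The three panels can be restated as a tiny calculation. This is only a sketch of the definition given above, with the sign convention (actual minus predicted) and the reward values of 0 and 1 chosen by me for illustration; it is not a model from [Schultz 1998].

```python
def prediction_error(actual, predicted):
    """Reward-prediction error: positive when more reward arrives than predicted."""
    return actual - predicted


# (actual reward, predicted reward) at the moment the reward is due
cases = [
    ("top: unpredicted reward", 1.0, 0.0),
    ("middle: predicted reward delivered", 1.0, 1.0),
    ("bottom: predicted reward omitted", 0.0, 1.0),
]

for label, actual, predicted in cases:
    error = prediction_error(actual, predicted)
    print("%-36s error = %+.1f" % (label, error))
```

Under the reward-prediction error hypothesis, the sign of this error is what the DA response at reward time is taken to reflect: a burst, no change, or a dip.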

Problems with phasic DA as a reward-prediction error
There are, however, a number of problems that have arisen under close examination of this hypothesis and through further experimental work, laid out by Redgrave et al. These are the main contentions:

  • DA neurons respond not only to rewarding stimuli, but also to non-rewarding sensory events salient only by virtue of their novelty or intensity [Schultz 1998], as well as conditioned stimuli not associated with a reward [Bromberg-Martin 2010].
  • The phasic DA response is remarkably stereotyped (occurring with ~100ms latency, and a duration of ~100ms), across species, sensory modalities, numerous experimental paradigms, and largely independent of the perceptual complexity of the eliciting event [Redgrave 2011]. This highly stereotyped DA response time is incongruent with the reward-prediction error hypothesis when considering that there can be a marked difference in the time taken to establish the reward value of different stimuli.
  • The latency of gaze-shifts is in the range of 150-250ms [Jay 1987], and the phasic DA response very reliably occurs at ~100ms [Schultz 1998]. This means that the reward-prediction error must be calculated before the animal has foveated on the stimulus. Additionally, the source of visual information driving the DA neurons is largely, if not exclusively, the superior colliculus. Neurons in the superior colliculus are highly sensitive to the location of luminance changes, but largely nonresponsive to color and geometric configurations, meaning that the superior colliculus is not in a position to provide object identity (and reward) information to DA neurons.

To contend with these last two points, a number of experiments have reported that DA neurons respond with differing magnitudes and probabilities to unpredicted complex visual stimuli. However, throughout all of the experiments conducted, the different visual stimuli were presented consistently at the same location, which is exactly the visual feature that the superior colliculus is capable of detecting [Redgrave 2006]. Rather than discriminating between complex visual stimuli features, the location of the stimuli is instead being used to determine the reward value of the stimulus. Outside of the experimental paradigm, however, temporally unpredictable events are also spatially unpredictable, which makes it unlikely that determining reward value by spatial location in natural environments would be a useful mechanism.

Taken all together, a pretty strong case against the reward-prediction error hypothesis is presented.

An alternative implementation of instrumental conditioning
In the paper series, Redgrave et al propose that instrumental (again, also known as operant) conditioning arises as a function of two mechanisms in the brain: 1) a mechanism to determine whether or not an unpredicted sensory stimulus was caused by the system (agency), establishing a cause-effect relationship if one exists, and 2) a mechanism for reward to modulate the afferent input to the striatum. As mentioned in the last post, the basal ganglia is a proposed central selection device, choosing actions based on the saliency of their input.

Phasic DA for agency determination / novel movement identification
Instead of the reward-prediction error being determined by the phasic DA signal, it is proposed to function for a much more basic purpose: Identifying the cause of unpredicted sensory stimuli. This is a prerequisite to instrumental conditioning / any adaptive behavior. This proposal is based on identifying another function that would generate behavior very similar to a reward-prediction error, while also considering the precise and highly stereotyped natures of the response (~100ms latency, ~100ms duration), and the other information that is likely to be in the striatum at the point when the phasic DA signal arrives. According to [Redgrave 2006] and [Redgrave 2008], there are at least three additional signals in the striatum at the time of phasic DA release:

  1. Sensory: from branching projections of the superior colliculus, providing information on the stimulus that elicited the phasic DA response
  2. Contextual information (i.e. general sensory, metabolic, cognitive state, and physical location): from any number of cortical, limbic, and subcortical projections into the striatum, and
  3. Motor-copy: signals sent from cortical and subcortical sensorimotor structures to the brainstem and spinal cord provide efference copies of the outgoing motor command through branching projections that are relayed both directly and indirectly (through the thalamus) to the dorsal striatum.

Note that these signals would also be in the basal ganglia and likely used in the same way in the proposal that phasic DA is for reward-prediction error as well. However, the list of problems presented above suggests strongly that stimulus and reward-value identification do not operate through the short-latency phasic DA signal. Agency detection is the alternative proposal for a learning based function that requires highly precise timing information, and does not rely on unavailable information such as object identity and actual stimulus reward-value.

One of the main problems with identifying the cause of unexpected stimuli is sorting through the irrelevant information to arrive at the specific trigger. The idea for the phasic DA signal to overcome this computational problem is that it ‘tags’ the signals in the dorsal striatum, including the motor-copy, when an unexpected stimulus occurs, making those actions more likely to be chosen again in a similar contextual situation. The authors also note that this process would be aided by the short-latency nature of the phasic DA signal, such that behavior evoked by the stimuli doesn’t get included in the signal tagging, confounding the event-outcome identification.
Through noisy exploration trying to make the event recur, signals that are consistently present become reinforced further, weeding out those that aren’t required to elicit the unexpected stimulus. Eventually the signal which accurately predicts the stimulus is identified. If it is a movement, then it gets added to the ‘library’ of motor actions, increasing the animal’s repertoire of predictable action / outcomes; in this way the phasic DA signal acts to determine agency and identify new movements. If the signal is not a movement, the association with a reward is noted, stored, and life carries on.
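
To make the tagging idea concrete, here is a toy sketch from me, not a model from the papers. It assumes a handful of named candidate signals, that each one is independently present on half of the trials (my stand-in for noisy exploration), and that the unexpected stimulus occurs exactly when one particular signal is present. Every signal that is active when the stimulus occurs gets tagged.

```python
import random


def tag_candidate_signals(signals, cause, trials=200, seed=0):
    """Count how often each signal is present when the unexpected stimulus occurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    tags = dict((name, 0) for name in signals)
    for _ in range(trials):
        # Noisy exploration: each signal is independently present or absent.
        active = [name for name in signals if rng.random() < 0.5]
        if cause in active:
            # The stimulus occurs, so the phasic DA burst tags everything active.
            for name in active:
                tags[name] += 1
    return tags


signals = ["lever press", "head turn", "room light on", "hungry"]
tags = tag_candidate_signals(signals, cause="lever press")
for name in signals:
    print("%-14s tagged %d times" % (name, tags[name]))
```

The causal signal is tagged on every occurrence of the stimulus while the bystanders are tagged only when they happen to be present, so repeated exploration separates them.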

Prediction of sensory stimuli
As mentioned above, DA neurons respond to novel sensory stimuli. Interestingly, the novelty response of DA neurons habituates rapidly when a sensory stimulus fails to associate with a reward. Although much is known about the variables that influence habituation in primitive or reduced preparations, relatively little is known about the mechanisms behind the habituation of un-reinforced sensory stimuli [Redgrave 2011]. It could be a default property of the early sensory networks when a stimulus is repeatedly applied in the absence of any reinforcement, or the result of an outside network modulating afferent projections to sensory systems.

However, when a stimulus is associated with reward, early sensory systems sensitize to its presentation. Additionally, the phasic DA response shifts back to occur at the time of the conditioned stimulus, rather than at the time of the reward. This response continues to push backwards to the first predictable event in a chain of events leading to a reward, seemingly in conflict with the sensitization of stimuli associated with a reward. Additionally, if the conditioned stimulus occurs and the reward does not follow, at the time of the expected reward there is a dip in the tonic DA level.
The mechanism responsible is a precisely timed inhibitory signal that acts to cancel out the phasic DA response evoked by predicted rewards [Schultz 1998]. As stimuli are recognized as predictors of future sensory events, this timed inhibitory signal prevents the activation of the DA neurons. In this way only the first, unpredicted, appearance of a CS in a chain of events evokes a phasic DA response. The goal of this response is to try to learn the cause of this stimulus; in the event that no predictor is learned, the predictor first in the chain will continue to evoke a phasic DA response.

The source of this precisely timed inhibitory signal has not been identified experimentally, but there are several candidates identified in [Redgrave 2011]: direct inhibitory inputs from within the basal ganglia (striatum or globus pallidus); indirect inhibitory inputs from the habenulo-rostro-medial tegmental system (hRMTg); or phasic afferent excitation of local inhibitory neurons with connections to nearby DA neurons.

A side note from me. The cerebellum is widely regarded as a supervised learning center for the brain (so widely I won’t even provide supporting references!). With its highly stereotyped repeated neural structure, and the insane amount of neurons it houses (accounting for 10% of the volume of the brain but holding over 50% of its neurons!), it is thought to provide this supervised learning functionality for a number of different neural systems. The prediction of sensory events given a conditioned stimulus or efference copy of a motor command is a very basic supervised learning problem. The hRMTg system has, in its wide list of afferent projections, connections with deep cerebellar neurons [Jhou 2011]. Although the cerebellum wouldn’t necessarily be required, it also has access to all the contextual, sensory, and motor copy information sent to the basal ganglia, and the connections to the hRMTg system suggest it to me as a favorite among the possibilities listed.

Response to noxious events
Another highly valuable feature for a system is to flag any actions which led to a noxious outcome, such as a painful response, and prevent those actions from being executed again. It would be expected, then, that DA activity is suppressed whenever noxious stimuli are encountered. Indeed this is the case [Redgrave 2006], where phasic suppression of DA activity lasts for the duration of the noxious event. The mechanism believed responsible for this effect is a set of specialized, high-threshold nociceptors, which are sensory receptors, with direct projections to the spinal cord and brain, that respond to potentially damaging stimuli. In the same way that phasic DA release potentiates connection strengths, phasic DA suppression depresses the weighting of these connections, making them less likely to be chosen again in the future when a similar situation arises.
To be clear, this response is only expected from stimuli that are directly perceived by the nociceptors to be noxious. Such a phasic DA suppression is not expected in the case that a stimulus is noxious but higher level processing is required to determine its reward-value.

Reward maximization
So we’ve established a likely function for the phasic DA signal, the identification of agency. There’s more to instrumental conditioning, however. There also needs to be a means of reward maximization. The details are light on this part of the model, but are based on the observation from computational models that afferent sensory structures projecting into the basal ganglia could also demonstrate reward-based modulation. This would give rise to the reward-based action selection bias that is the crux of formal reinforcement learning. This figure is lifted from [Redgrave 2011]:

In this figure the proposed system is shown operating in response to intrinsic (to the basal ganglia) reinforcement on the left, in (A). This case arises when unexpected stimuli bias the action selection process of the basal ganglia to attempt to discover the cause of this stimulus, causing a ‘repetition bias’. On the right of the figure, in (B), the system is shown responding to extrinsic reinforcement, where higher level cortical processing centers have determined that a stimulus was rewarding, and the strength of projections into the basal ganglia are weighted to make the responsible action more likely to be repeated.

As the authors admit, how the reward maximization on the afferent projections to the basal ganglia could occur is still very much unknown. Additionally, as previously mentioned, the mechanisms through which non-reward associated novel stimuli habituate and reward associated novel stimuli sensitize remain to be determined. But the reward maximization proposal is definitely of secondary concern in these papers, the main issue being the reconsideration of the function of the phasic DA signal, for which a case was very strongly presented.

Overview of proposed model
To put this all together, the system model works as follows. An event occurs that causes activity in an early sensory processing system, which activates the DA neurons. The DA neurons cause a biasing of action selection towards the actions in the dorsal striatum at that moment (which are the actions just taken), which potentially caused the novel, or unpredicted, sensory stimulus. Some other system now says ‘hey that was a rewarding stimulus, don’t habituate to it early sensory systems!’, preventing habituation in the early sensory system. The stimulus then continues to drive the DA neurons, tagging the signals that are in the striatum at that time. As this happens, the signals in the striatum will vary through noise during action selection, which helps exploration to try to pin down what causes this unexpected (and now defined to be) rewarding sensory stimulus. So far all the biasing of action choice is taking place inside the striatum. When the phasic DA release has pinned down the signals that elicit this sensory stimulus, there’s a transference of this signal to the cortex. In the cortex the reward-maximization system can now bias this action that was figured out in the basal ganglia such that it’s weighted more heavily outside the striatum. Once this is done, the inhibitory predictive system can now learn the association between this signal occurring and a reward following, and a precisely timed inhibitory spike can be generated and sent to the DA neurons to prevent a dopaminergic release.

The last part about the inhibitory predictive system kicking in after transference to the cortex wasn’t explicitly stated in any of the papers, but that’s my understanding of this model.

Questions / Comments
Here are some questions that have come up as I’ve been reading through these papers.

– As mentioned in the previous post, the basal ganglia is proposed to be the central selection device for the brain. This means that the different command systems vying for control are constantly projecting in saliency signals, which makes me wonder how the BG makes decisions for the upcoming moment if it is bombarded by efference copies of motor commands. I remember reading previously in articles with other models of the basal ganglia a functional actor/critic separation of the dorsal/ventral striatum. Would some separation of saliency and information signals help? Or could it have something to do with dual population coding, which the authors previously mentioned as a means of conveying saliency? Perhaps the information is transmitted and the saliency is chosen from the norm of the vector of firing rates inside the striatum? This second option seems likely to introduce some timing issues.

– There are several promising models which operate based on reward maximization happening first inside the basal ganglia, then being transferred out to the cortex [Ashby 2007]. Would the above separation into actor/critic dorsal/ventral striatum help realize this? With novelty detection in the dorsal side, receiving projections from the DA neurons, and reward maximization on the ventral side? Then upon consolidation of a “good” set of movements or action plans transference to the cortex? I am interested to investigate this.

Lots to think about!


[Ashby 2007] – A neurobiological theory of automaticity in perceptual categorization
[Bromberg-Martin 2010] – Dopamine in Motivational Control: Rewarding, Aversive, and Alerting
[Jay 1987] – Sensorimotor integration in the primate superior colliculus. I. Motor convergence
[Jhou 2011] – The mesopontine rostromedial tegmental nucleus: a structure targeted by the lateral habenula that projects to the ventral tegmental area of Tsai and substantia nigra compacta
[Matsumoto 2009] – Two types of dopamine neuron distinctly convey positive and negative motivational signals
[Redgrave 2006] – The short-latency dopamine signal: a role in discovering novel actions?
[Redgrave 2008] – What is reinforced by phasic dopamine signals?
[Redgrave 2011] – Functional properties of the basal ganglia’s re-entrant loop architecture: selection and reinforcement
[Schultz 1998] – Predictive Reward Signal of Dopamine Neurons
[Thorndike 1911] – Animal intelligence; experimental studies
Redgrave P, Vautrelle N, & Reynolds JN (2011). Functional properties of the basal ganglia’s re-entrant loop architecture: selection and reinforcement. Neuroscience, 198, 138-51 PMID: 21821101


The basal ganglia for action selection

Peter Redgrave, Kevin Gurney, and John Reynolds have a series of papers out where they detail a basal ganglia model, looking at its physiology and potential functional role in the brain. They address a number of different points in their papers, and I’m going to write up a couple of posts in hopes of making the model / material more accessible and furthering my own understanding. I’ll also be adding in my own thoughts and questions as I go along, but I’ll try to keep explicit when ideas are coming from papers and when they’re coming from me. In this post I’m going to look at the basal ganglia’s proposed role as an action selection center.

Basal ganglia as an action selection center
In complex systems like the brain, there are numerous processes and sub-systems operating in parallel. Things like feeding, predator avoidance, mating, etc are all going to be suggesting a specific course of action for the body to follow (hereafter these different sub-systems will be referred to as ‘command systems’, keeping with terminology from the paper series). The problem arises in that there is only one body, and letting all the command systems have at controlling the body all at once is a poor idea for generating effective / efficient behavior. What is needed is a method of delegating control of the motor system to a single command system, and preventing signals from other command systems from interfering. This can be done by having all command systems put forward an ‘urgency’ (or saliency) level, and then using a winner-take-all (WTA) function to choose one to be in control.

In [Redgrave 1999], a set of possible solutions from engineering are presented in three WTA system architecture types: subsumption, distributed, and central selection.

Subsumption: In the subsumption architecture, the command systems have a priority ordering. In the event of a conflict, systems higher up on the priority list can override those lower than them to interrupt and suppress or replace outgoing commands. Although this allows quick response to environmental contingencies (such as the appearance of a predator, with the ‘evade predators’ command system given top priority), the prioritization is built in to the system, and as more command systems are added it becomes difficult to determine a proper prioritization. Additionally, due to the ordering of systems being built-in, the subsumption architecture displays far less flexibility than biological nervous systems.

Distributed: The distributed architecture is a popular choice for winner-take-all implementations, where each option is connected to all the others with an inhibitory connection. As the saliency of a given option increases, it inhibits the other options, which in turn reduces the inhibition they project back, until only one option is uninhibited. Here, selection is considered an ‘emergent’ property of the network. This architecture also supports adaptation, as the weighting of the connections between options can be tuned, giving rise to complex dominance dynamics. However, the implementation is costly. First, every option must be connected to every other option (resulting in n(n – 1) connections), and the connection weights properly balanced to give the desired prioritization. Second, to integrate a new option into the system another 2n connections must be added, and they must be properly balanced with the already existing connection weights. Third, the more options that are added to this system, the longer it takes to choose between them, especially if several options present saliency values very close to one another (this last point was added by me, and is not stated in the papers).

Central selection: In the central selection architecture, all of the command systems send their saliency values to a central switching device, which chooses one of them as the winner. In this case, the complexity of system connectivity is significantly reduced, to only 2n connections total (one from and to each command system), and to add a new system only requires 2 connections be added. Additionally, the case of tuning the connection weights from each system becomes significantly easier, as the dynamics that determine the winner are now explicitly based on the weighting of the only connection from each command system in to the central switching device.
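
The connection counts for the distributed and central selection architectures are easy to check with a few lines of Python. This is just the arithmetic from the two descriptions above; the function names are mine.

```python
def distributed_connections(n):
    """Every option inhibits every other option: n(n - 1) connections."""
    return n * (n - 1)


def central_connections(n):
    """One connection to and one from the central switch per command system."""
    return 2 * n


for n in (5, 10, 100):
    # Cost of adding one more option/command system to a system of size n
    added_distributed = distributed_connections(n + 1) - distributed_connections(n)
    added_central = central_connections(n + 1) - central_connections(n)
    print("n=%3d  distributed=%5d (+%d to add one)  central=%3d (+%d to add one)"
          % (n, distributed_connections(n), added_distributed,
             central_connections(n), added_central))
```

The distributed count grows quadratically and each addition costs 2n new connections, while the central architecture grows linearly and each addition costs a constant 2.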

Unsurprisingly, the central selection architecture is proposed to best model the structure of the brain for selecting between command systems (although the authors suggest each command system may implement a distributed selection architecture internally), and the basal ganglia is proposed as the central switching device. Supporting this, a computational model of the basal ganglia based on its biological structure was presented in [Gurney 2001], and implemented in spiking neurons in [Stewart 2010], that very efficiently performs winner-take-all functionality. Interestingly, its architecture is such that it effectively chooses a winner quickly regardless of the number of competitors and its performance does not suffer from competitors presenting very similar saliency values [Stewart 2010].

Central selection constraints: The use of a central selection architecture also imposes several constraints: 1) the saliency of each competitor must be measured in some ‘common currency’, and 2) the output of the central switching device (the basal ganglia) must be set up such that it can activate the winning command system, and disable the losing ones.

For the common currency between command systems, the authors propose the use of dual population encoding [Koechlin 1996], which I’ll go into in more detail in another post, but which basically says two things can be extracted from the firing pattern of a population of neurons: the first is the information being represented, and the second is the saliency of this information, determined as the norm of firing rates of the neurons.
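
As a rough illustration of reading two things out of one firing pattern, here is a sketch that takes the saliency as the Euclidean norm of the firing rates and, as an assumption on my part, treats the normalized pattern as the information being represented. The function names and example rates are made up, and this is not the formulation from [Koechlin 1996].

```python
import math


def saliency(rates):
    """Saliency as the Euclidean norm of the population firing rates."""
    return math.sqrt(sum(r * r for r in rates))


def pattern(rates):
    """Assumed 'information' readout: the firing pattern with its norm divided out."""
    norm = saliency(rates)
    if norm == 0.0:
        return list(rates)
    return [r / norm for r in rates]


quiet = [3.0, 4.0, 0.0]
urgent = [30.0, 40.0, 0.0]  # same pattern, ten times the firing rate

for name, rates in (("quiet", quiet), ("urgent", urgent)):
    print("%-6s saliency=%5.1f pattern=%s" % (name, saliency(rates), pattern(rates)))
```

Both populations carry the same pattern but put forward very different saliency values, which is what lets saliency act as a common currency across command systems.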

To address the second constraint, we’ll first need to look briefly at the structure of the basal ganglia.

Basic structure of the basal ganglia
This is a very low-res diagram of the neurobiological structure of the basal ganglia, taken from [Gurney 2001]:
The principal input components of the basal ganglia are the striatum and the subthalamic nucleus (STN). These structures receive projections from pretty much the entire cerebral cortex, including the motor, sensory, association, and limbic areas. The main output components of the basal ganglia are the internal segment of the globus pallidus (GPi), and the substantia nigra pars reticulata and lateralis (SNr). The output of the basal ganglia projects then through the thalamus and back to the cortex. Notably, projections routed through the thalamus go to both the same sites that originated the basal ganglia input, as well as others, forming both closed and open loop systems [Joel 1994].

Parallel functional loops: There are two particular points of interest of the basal ganglia structure relevant to this discussion. The first is that there is an intrinsic separation of information from different brain regions as it travels through the basal ganglia, such that the basal ganglia can be viewed as having a number of different processing tracts that operate in parallel: limbic, associative, sensory, and motor. This is the set of closed loops mentioned above. Here is an illustration, taken from [Redgrave 2011]:
All of these loops have a highly similar structure, suggesting that each performs the same function on different information [Voorn 2004].

Tonically inhibitory output: The second point of interest addresses our second constraint mentioned above, of requiring some mechanism for enabling / disabling the output from a chosen command system to take control of the body: The output from the basal ganglia to the thalamus is tonically inhibitory. There have been several possible functional roles proposed for this tonic inhibition, both in the closed and open loop projections. I’ll discuss the closed loop case below. In the open loop projections, there seems to be a clear potential for a ‘gating’ mechanism, where the output from the winning system is disinhibited in the thalamus and allowed to pass forward. Extrapolating from this, I’ve made a very, very, very simplified diagram illustrating how open loop gating using tonic inhibition could work:
Here, the association area has a bunch of different command systems, labelled 1 through K, which all have their own ideas about what the motor control system should be doing. They each send out a branching projection, with the saliency values used by the basal ganglia, and the information carried into the thalamus. They all project to a part of the thalamus which routes the information to the motor system, but due to tonic inhibition from the basal ganglia, no information is passed through. Once the basal ganglia chooses a winner from the K command systems, however, that winner’s channel in the thalamus is disinhibited, and it can send its directions out to the motor system for execution. In this way, the basal ganglia has the ability to enable / disable output from a command system.

After discussion with a couple of the guys in my lab, a couple of benefits of using tonic inhibition over selective excitation as the output of the basal ganglia have come up.
The first is that the use of inhibition is a much simpler implementation of a gateway. When using inhibition, the connections from the basal ganglia fire when no information should pass through, and stop firing when it should. In the case of activation, however, there is necessarily some sort of multiplication operation being performed such that the output from the gateway is GATEWAY_VALUE * INPUT_VALUE. In addition to being more complicated than inhibition of undesired options, it’s inclined towards performance errors.
This is the second point, in that with tonic inhibition the basal ganglia stomps everything out, so nothing is accidentally passed through a gateway if an INPUT_VALUE becomes highly active. With selective activation, it’s foreseeable that a high INPUT_VALUE on an unselected channel could produce the same output level as GATEWAY_VALUE * INPUT_VALUE on a selected one. In these ways tonic inhibition makes a gateway functionality more efficient and effective.
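Here is one way to read that second point, as a small Python sketch. The numbers are made up, and the key assumption is mine: that a multiplicative gateway for an unselected channel is imperfect and sits at a small non-zero GATEWAY_VALUE (0.1 here) rather than exactly zero, while the tonic inhibition (10.0) is stronger than any input.

```python
def multiplicative_gate(input_value, gateway_value):
    """Selective activation: output is GATEWAY_VALUE * INPUT_VALUE."""
    return gateway_value * input_value


def inhibitory_gate(input_value, inhibition):
    """Tonic inhibition: output is the rectified difference."""
    return max(0.0, input_value - inhibition)


# Selective activation: a weak input on the selected channel ...
selected_weak = multiplicative_gate(0.8, gateway_value=1.0)
# ... is indistinguishable from a strong input leaking through an
# unselected channel whose gateway value is small but not zero.
unselected_strong = multiplicative_gate(8.0, gateway_value=0.1)
print(selected_weak, unselected_strong)

# Tonic inhibition: the same strong input on an inhibited channel is
# stomped out entirely, while the disinhibited channel passes unchanged.
print(inhibitory_gate(8.0, inhibition=10.0), inhibitory_gate(0.8, inhibition=0.0))
```

Of course, if an input could exceed the inhibition level it would leak through here too, so the argument relies on the tonic inhibition being strong relative to the inputs.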

Alternatively, the open loop gating could also function as my supervisor, Chris Eliasmith, comments below: The routing signal from the basal ganglia is projected through the thalamus out to modulate the corticocortical connections from the associative area to the motor cortex. Modifying the example diagram above to operate this way, we get:

In this case I drew out the different connections for clarity. The saliency values are projected to the basal ganglia, and a winner is chosen. The modulatory values projected through the thalamus then connect to the corticocortical connections from the associative area to the motor area, and set such that the winner is allowed to project into the motor area and the others are prevented. The benefit of performing gating this way is that the required bandwidth for information passing through the thalamus is significantly reduced.

Closed loop projections: The information in this subsection is not discussed in the paper series. The natural question following the discussion of the potential role of the open loops and tonic inhibition in the thalamus as a gating mechanism is what could the role of the closed loops be? The basal ganglia has been shown to play a strong role in motor learning and sequence learning. In [Stewart 2008], a spiking neuron model of the basal ganglia was developed that demonstrates how the recurrent connections with the cortex can be used to control the evolving dynamics of a population of neurons. In the paper a simple set of rules for counting is developed. In experiments involving sequence learning, monkeys perform a similar type of learning, figuring out how to appropriately move their arms to get the reward. If the basal ganglia is damaged, the monkeys are no longer able to learn new sequences, but can still perform previously learned sequences [Turner 2005].

[Ashby 2007] propose that information such as motor sequences can be learned in the basal ganglia, where very fine-grained mechanisms for identifying the timing and causal relationship between action and effect exist, and once learned, it can then be transferred to the cortex for more automatic execution. This is thought to be what has happened when monkeys are able to execute previously learned sequences, but not able to learn new sequences.

The use of tonically inhibitory output here is still unclear, but one possibility is that inside each command system there is a distributed network, containing each of the possible ‘next steps’ for that command system. Inside the basal ganglia, one of these next steps is chosen, and its selection amounts to disinhibiting recurrent connections back to itself, allowing its saliency to increase to a point that all the other options are fully inhibited and the dynamics evolve according to the chosen next step.
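As a toy illustration of that possibility (and nothing more than that), here is a rate-based Python sketch. It is not the spiking model from [Stewart 2008]; the dynamics, the function name, and every parameter value are assumptions I picked so the effect is easy to see. Each ‘next step’ has an activity between 0 and 1, all options inhibit each other, and only the chosen option has its recurrent self-connection disinhibited.

```python
def run_closed_loop(inputs, steps=50, dt=0.1, w_self=2.0, w_inh=1.5):
    """Euler-integrate a small competing population of 'next step' options.

    The option with the largest initial saliency is chosen (argmax stands in
    for the basal ganglia), and only its recurrent connection is active.
    """
    x = list(inputs)
    winner = max(range(len(x)), key=lambda k: x[k])
    for _ in range(steps):
        total = sum(x)
        new_x = []
        for k, (xk, ik) in enumerate(zip(x, inputs)):
            # Recurrent excitation is disinhibited for the winner only.
            recurrent = w_self * xk if k == winner else 0.0
            # Leak + constant drive + recurrence - inhibition from the others.
            dx = -xk + ik + recurrent - w_inh * (total - xk)
            # Keep activity in [0, 1], like a saturating firing rate.
            new_x.append(min(1.0, max(0.0, xk + dt * dx)))
        x = new_x
    return winner, x


winner, activity = run_closed_loop([0.5, 0.6, 0.4])
print(winner, activity)
```

With these made-up values the chosen option's activity climbs to saturation while the other two are driven to zero, which is the qualitative behaviour described above.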

Hierarchical selection of action
Now with this whole system in place, it is proposed that this structure serves to implement a hierarchy of action selection [Redgrave 1999]. In this hierarchy, the decision on how to next move would start out at a very abstract level, as a competition between some basic command systems arguing about how hungry, tired, horny, etc. you are. Once it’s decided that you are more hungry than anything else, the next level of the hierarchy is engaged to decide what your best option is: go to the store to get food, eat your canned beans, or order a pizza. This then continues on until you get to a level of deciding what muscles to move, all based on your goal of eating a can of beans. This of course is a gross simplification of any possible analogous process in the brain, but it hopefully gets the point across.

One of the major benefits of a hierarchical action selection setup is that decision making is simplified on the lower levels, because a large number of options are not in line with the decisions made at a higher level. For example, to the end of getting your can of beans, you probably don’t have to decide to not punch yourself in the face, because it doesn’t further you along your path to getting beans.
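The beans example can be sketched as a walk down a tree, where each level only compares the options that fall under the choice made one level up. This is just an illustration: the tree layout, the option names below the top two levels, and all of the salience numbers are hypothetical, and picking the maximum salience stands in for whatever the basal ganglia is actually doing.

```python
def select_path(tree):
    """Walk down the hierarchy, choosing the most salient option at each level.

    Returns the chosen path and the number of options that were compared.
    """
    path, evaluated = [], 0
    node = tree
    while node["options"]:
        options = node["options"]
        evaluated += len(options)
        name = max(options, key=lambda n: options[n]["salience"])
        path.append(name)
        node = options[name]
    return path, evaluated


def leaf(salience):
    """Helper for an option with nothing below it."""
    return {"salience": salience, "options": {}}


tree = {"options": {
    "hungry": {"salience": 0.9, "options": {
        "go to the store": leaf(0.3),
        "eat canned beans": {"salience": 0.7, "options": {
            "walk to the cupboard": leaf(0.8),
            "open the can": leaf(0.6),
        }},
        "order a pizza": leaf(0.5),
    }},
    "tired": {"salience": 0.4, "options": {
        "go to bed": leaf(0.6),
        "make coffee": leaf(0.5),
    }},
}}

print(select_path(tree))
```

Notice that the options under ‘tired’ are never even looked at once ‘hungry’ wins, and punching yourself in the face never shows up as a candidate at all.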

Things of course become even more complicated when you consider that it is possible to be working towards two goals at the same time, in that it is possible for us to successfully walk and chew gum at the same time. But looking at that falls outside of the scope of this post.

In this post I’ve put forth the case presented in the paper series from Redgrave et al for the basal ganglia as an action selection center. Without a doubt there is much more experimental work that needs to be examined, but here I’ve focused on providing a brief overview of how the basal ganglia could be implementing action selection. In future posts on the subject, I’ll be looking at other issues addressed by the Redgrave paper series, in particular the role of the short-latency phasic dopamine signal in the basal ganglia. My goal is to work through these papers and then present an incorporation of this work into a larger model of the motor control system.


[Ashby 2007] – A neurobiological theory of automaticity in perceptual categorization
[Gurney 2001] – A computational model of action selection in the basal ganglia. I. A new functional anatomy
[Joel 1994] – The organization of the basal ganglia-thalamocortical circuits: open interconnected rather than closed segregated
[Koechlin 1996] – Dual Population Coding in the Neocortex: A Model of Interaction between Representation and Attention in the Visual Cortex
[Redgrave 1999] – The Basal Ganglia: A Vertebrate Solution To The Selection Problem?
[Redgrave 2011] – Functional properties of the basal ganglia’s re-entrant loop architecture: selection and reinforcement
[Stewart 2008] – Building production systems with realistic spiking neurons
[Stewart 2010] – Dynamic Behaviour of a Spiking Model of Action Selection in the Basal Ganglia
[Turner 2005] – Sequential Motor Behavior and the Basal Ganglia: Evidence from a serial reaction time task in monkeys
[Voorn 2004] – Putting a spin on the dorsal–ventral divide of the striatum
Redgrave P, Prescott TJ, & Gurney K (1999). The basal ganglia: a vertebrate solution to the selection problem? Neuroscience, 89 (4), 1009-23 PMID: 10362291
