Tag Archives: neuroscience

Building a spiking neural model of adaptive arm control

About a year ago I published the work from my thesis in a paper called ‘A spiking neural model of adaptive arm control’. In this paper I presented the Recurrent Error-driven Adaptive Control Hierarchy (REACH) model. The goal of the model is to begin working towards reproducing behavioural level phenomena of human movement with biologically plausible spiking neural networks.

To do this, I start by using three methods from control literature (operational space control, dynamic movement primitives, and non-linear adaptive control) to create an algorithms level model of the motor control system that captures behavioural level phenomena of human movement. Then I explore how this functionality could be mapped to the primate brain and implemented in spiking neurons. Finally, I look at the data generated by this model on the behavioural level (e.g. kinematics of movement), the systems level (i.e. analysis of populations of neurons), and the single-cell level (e.g. correlating neural activity with movement parameters) and compare/contrast with experimental data.

By having a full model framework (from observable behaviour to neural spikes) is to have a more constrained computational model of the motor control system; adding lower-level biological constraints to behavioural models and higher-level behavioural constraints to neural models.

In general, once you have a model, the critical next step is to generating testable predictions that can be used to discriminate between other models with different implementations or underlying algorithms. Discriminative predictions allow us to design experiments that can gather evidence in favour or against different hypotheses of brain function, and provide clues to useful directions for further research. Which is the real contribution of computational modeling.

So that’s a quick overview of the paper; there are quite a few pages of supplementary information that describe the details of the model implementation, and I provided the code and data used to generate the data analysis figures. However, code to explicitly run the model on your own has been missing. As one of the major points of this blog is to provide code for furthering research, this is pretty embarrassing. So, to begin to remedy this, in this post I’m going to work through a REACH framework for building models to control a two-link arm through reaching in a line, tracing a circle, performing the centre-out reaching task, and adapting online to unexpected perturbations during reaching imposed by a joint-velocity based forcefield.

This post is directed towards those who have already read the paper (although not necessarily the supplementary material). To run the scripts you’ll need Nengo, Nengo GUI, and NengoLib all installed. There’s a description of the theory behind the Neural Engineering Framework, which I use extensively in my Nengo modeling, in the paper. I’m hoping that between that and code readability / my explanations below that most will be comfortable starting to play around with the code. But if you’re not, and would like more resources, you can check out the Getting Started page on the Nengo website, the tutorials from How To Build a Brain, and the examples in the Nengo GUI.

You can find all of the code up on my GitHub.

NOTE: I’m using the two-link arm (which is fully implemented in Python) instead of the three-link arm (which has compile issues for Macs) both so that everyone should be able to run the model arm code and to reduce the number of neurons that are required for control, so that hopefully you can run this on you laptop in the event that you don’t have a super power Ubuntu work station. Scaling this model up to the three-link arm is straight-forward though, and I will work on getting code up (for the three-link arm for non-Mac users) as a next project.

Implementing PMC – the trajectory generation system

I’ve talked at length about dynamic movement primitives (DMPs) in previous posts, so I won’t describe those again here. Instead I will focus on their implementation in neurons.

def generate(y_des, speed=1, alpha=1000.0):
    beta = alpha / 4.0

    # generate the forcing function
    forces, _, goals = forcing_functions.generate(
        y_des=y_des, rhythmic=False, alpha=alpha, beta=beta)

    # create alpha synapse, which has point attractor dynamics
    tau = np.sqrt(1.0 / (alpha*beta))
    alpha_synapse = nengolib.Alpha(tau)

    net = nengo.Network('PMC')
    with net:
        net.output = nengo.Node(size_in=2)

        # create a start / stop movement signal
        time_func = lambda t: min(max(
            (t * speed) % 4.5 - 2.5, -1), 1)

        def goal_func(t):
            t = time_func(t)
            if t <= -1:
                return goals[0]
            return goals[1]
        net.goal = nengo.Node(output=goal_func, label='goal')

        # -------------------- Ramp ---------------------------
        ramp_node = nengo.Node(output=time_func, label='ramp')
        net.ramp = nengo.Ensemble(
            n_neurons=500, dimensions=1, label='ramp ens')
        nengo.Connection(ramp_node, net.ramp)

        # ------------------- Forcing Functions ---------------
        def relay_func(t, x):
            t = time_func(t)
            if t <= -1:
                return [0, 0]
            return x
        # the relay prevents forces from being sent on reset
        relay = nengo.Node(output=relay_func, size_in=2)

        domain = np.linspace(-1, 1, len(forces[0]))
        x_func = interpolate.interp1d(domain, forces[0])
        y_func = interpolate.interp1d(domain, forces[1])
        nengo.Connection(net.ramp, relay[0], transform=transform,
                         function=x_func, synapse=alpha_synapse)
        nengo.Connection(net.ramp, relay[1], transform=transform,
                         function=y_func, synapse=alpha_synapse)
        nengo.Connection(relay, net.output)

        nengo.Connection(net.goal[0], net.output[0],
        nengo.Connection(net.goal[1], net.output[1],
    return net

The generate method for the PMC takes in a desired trajectory, y_des, as a parameter. The first thing we do (on lines 5-6) is calculate the forcing function that will push the DMP point attractor along the desired trajectory.

The next thing (on lines 9-10) is creating an Alpha (second-order low-pass filter) synapse. By writing out the dynamics of a point attractor in Laplace space, one of the lab members, Aaron Voelker, noticed that the dynamics could be fully implemented by creating an Alpha synapse with an appropriate choice of tau. I walk through all of the math behind this in this post. Here we’ll use that more compact method and project directly to the output node, which improves performance and reduces the number of neurons.

Inside the PMC network we create a time_func node, which is the pace-setter during simulation. It will output a linear ramp from -1 to 1 every few seconds, with the pace set by the speed parameter, and then pause before restarting.

We also have a goal node, which will provide a target starting and ending point for the trajectory. Both the time_func and goal nodes are created and used as a model simulation convenience, and proposed to be generated elsewhere in the brain (the basal ganglia, why not? #igotreasons #provemewrong).

The ramp ensemble is the most important component of the trajectory generation system. It takes the output from the time_func node as input, and generates the forcing function which will guide our little system through the trajectory that was passed in. The ensemble itself is nothing special, but rather the function that it approximates on its outgoing connection. We set up this function approximation with the following code:

        domain = np.linspace(-1, 1, len(forces[0]))
        x_func = interpolate.interp1d(domain, forces[0])
        y_func = interpolate.interp1d(domain, forces[1])
        nengo.Connection(net.ramp, relay[0], transform=transform,
                         function=x_func, synapse=alpha_synapse)
        nengo.Connection(net.ramp, relay[1], transform=transform,
                         function=y_func, synapse=alpha_synapse)
        nengo.Connection(relay, net.output)

We want the forcing function be generated as the signals represented in the ramp ensemble moves from -1 to 1. To achieve this, we create interpolation functions, x_func and y_func, which are set up to generate the forcing function values mapped to input values between -1 and 1. We pass these functions into the outgoing connections from the ramp population (one for x and one for y). So now when the ramp ensemble is representing -1, 0, and 1 the output along the two connections will be the starting, middle, and ending x and y points of the forcing function trajectory. The transform and synapse are set on each connection with the appropriate gain values and Alpha synapse, respectively, to implement point attractor dynamics.

NOTE: The above DMP implementation can generate a trajectory signal with as many dimensions as you would like, and all that’s required is adding another outgoing Connection projecting from the ramp ensemble.

The last thing in the code is hooking up the goal node to the output, which completes the point attractor implementation.

Implementing M1 – the kinematics of operational space control

In REACH, we’ve modelled the primary motor cortex (M1) as responsible for taking in a desired hand movement (i.e. target_position - current_hand_position) and calculating a set of joint torques to carry that out. Explicitly, M1 generates the kinematics component of an OSC signal:

\textbf{u}_\textrm{M1} = \textbf{J}^T \textbf{M}_\textbf{x} (k_p (\textbf{x}^* - \textbf{x}))

In the paper I did this using several populations of neurons, one to calculate the Jacobian, and then an EnsembleArray to perform the multiplication for the dot product of each dimension separately. Since then I’ve had the chance to rework things and it’s now done entirely in one ensemble.

Now, the set up for the M1 model that computes the above function is to have a single ensemble of neurons that takes in the joint angles, \textbf{q}, and control signal \textbf{u}_\textbf{x} = k_p (\textbf{x}^* - \textbf{x}), and outputs the function above. Sounds pretty simple, right? Simple is good.

Let’s look at the code (where I’ve stripped out the comments and some debugging code):

def generate(arm, kp=1, operational_space=True,
             inertia_compensation=True, means=None, scales=None):

    dim = arm.DOF + 2

    means = np.zeros(dim) if means is None else means
    scales = np.ones(dim) if scales is None else scales
    scale_down, scale_up = generate_scaling_functions(
        np.asarray(means), np.asarray(scales))

    net = nengo.Network('M1')
    with net:
        # create / connect up M1 ------------------------------
        net.M1 = nengo.Ensemble(
            n_neurons=1000, dimensions=dim,
                base=nengo.dists.Uniform(-1, .1)))

        # expecting input in form [q, x_des]
        net.input = nengo.Node(output=scale_down, size_in=dim)
        net.output = nengo.Node(size_in=arm.DOF)

        def M1_func(x, operational_space, inertia_compensation):
            """ calculate the kinematics of the OSC signal """
            x = scale_up(x)
            q = x[:arm.DOF]
            x_des = kp * x[arm.DOF:]

            # calculate hand (dx, dy)
            if operational_space:
                J = arm.J(q=q)

                if inertia_compensation:
                    # account for inertia
                    Mx = arm.Mx(q=q)
                    x_des = np.dot(Mx, x_des)
                # transform to joint torques
                u = np.dot(J.T, x_des)
                u = x_des

                if inertia_compensation:
                    # account for mass
                    M = arm.M(q=q)
                    u = np.dot(M, x_des)

            return u

        # send in system feedback and target information
        # don't account for synapses twice
        nengo.Connection(net.input, net.M1, synapse=None)
            net.M1, net.output,
            function=lambda x: M1_func(
                x, operational_space, inertia_compensation),

    return net

The ensemble of neurons itself is created with a few interesting parameters:

        net.M1 = nengo.Ensemble(
            n_neurons=1000, dimensions=dim,
                dimensions=dim, base=nengo.dists.Uniform(-1, .1)))

Specifically, the radius and intercepts parameters.

Setting the intercepts

First we’ll discuss setting the intercepts using the AreaIntercepts distribution. The intercepts of a neuron determine how much of state space a neuron is active in, which we’ll refer to as the ‘active proportion’. If you don’t know what kind of functions you want to approximate with your neurons, then you having the active proportions for your ensemble chosen from a uniform distribution is a good starting point. This means that you’ll have roughly the same number of neurons active across all of state space as you do neurons that are active over half of state space as you do neurons that are active over very small portions of state space.

By default, Nengo sets the intercepts such that the distribution of active proportions is uniform for lower dimensional spaces. But when you start moving into higher dimensional spaces (into a hypersphere) the default method breaks down and you get mostly neurons that are either active for all of state space or almost none of state space. The AreaIntercepts class calculates how the intercepts should be set to achieve the desire active proportion inside a hypersphere. There are a lot more details here that you can read up on in this IPython notebook by Dr. Terrence C. Stewart.

What you need to know right now is that we’re setting the intercepts of the neurons such that the percentage of state space for which any given neuron is active is chosen from a uniform distribution between 0% and 55%. In other words, a neuron will maximally be active in 55% of state space, no more. This will let us model more nonlinear functions (such as the kinematics of the OSC signal) with fewer neurons. If this description is clear as mud, I really recommend checking out the IPython notebook linked above to get an intuition for what I’m talking about.

Scaling the input signal

The other parameter we set on the M1 ensemble is the radius. The radius scales the range of input values that the ensemble can represent, which is by default anything inside the unit hypersphere (i.e. hypersphere with radius=1). If the radius is left at this default value, the neural activity will saturate for vectors with magnitude greater than 1, leading to inaccurate vector representation and function approximation for input vectors with magnitude > 1. For lower dimensions this isn’t a terrible problem, but as the dimensions of the state space you’re representing grow it becomes more common for input vectors to have a norm greater than 1. Typically, we’d like to be able to, at a minimum, represent vectors with any number of dimensions where any element can be anywhere between -1 and 1. To do this, all we have to do is calculate the norm of the unit vector size dim, which is np.sqrt(dim) (the magnitude of a vector size dim with all elements set to one).

Now that we’re able to represent vectors where the input values are all between -1 and 1, the last part of this sub-network is scaling the input to be between -1 and 1. We use two scaling functions, scale_down and scale_up. The scale_down function subtracts a mean value and scales the input signal to be between -1 and 1. The scale_up function reverts the vector back to it’s original values so that calculations can be carried out normally on the decoding. In choosing mean and scaling values, there are two ways we can set these functions up:

  1. Set them generally, based on the upper and lower bounds of the input signal. For M1, the input is [\textbf{q}, \textbf{u}_\textbf{x}] where \textbf{u}_\textbf{x} is the control signal in end-effector space, we know that the joint angles are always in the range 0 to \pi (because that’s how the arm simulation is programmed), so we can set the means and scales to be \frac{\pi}{2} for \textbf{q}. For \textbf{u} a mean of zero is reasonable, and we can choose (arbitrarily, empirically, or analytically) the largest task space control signal we want to represent accurately.
  2. Or, if we know the model will be performing a specific task, we can look at the range of input values encountered during that task and set the means and scales terms appropriately. For the task of reaching in a straight line, the arm moves in a very limited state space and we can set the mean and we can tune these parameter to be very specific:
                                 means=[0.6, 2.2, 0, 0],
                                 scales=[.25, .25, .25, .25]

The benefit of the second method, although one can argue it’s parameter tuning and makes things less biologically plausible, is that it lets us run simulations with fewer neurons. The first method works for all of state space, given enough neurons, but seeing as we don’t always want to be simulating 100k+ neurons we’re using the second method here. By tuning the scaling functions more specifically we’re able to run our model using 1k neurons (and could probably get away with fewer). It’s important to keep in mind though that if the arm moves outside the expected range the control will become unstable.

Implementing CB – the dynamics of operational space control

The cerebellum (CB) sub-network has two components to it: dynamics compensation and dynamics adaptation. First we’ll discuss the dynamics compensation. By which I mean the k_v \textbf{M} (\textbf{q}^* - \textbf{q}) term from the OSC signal.

Much like the calculating the kinematics term of the OSC signal in M1, we can calculate the entire dynamics compensation term using a single ensemble with an appropriate radius, scaled inputs, and well chosen intercepts.

def generate(arm, kv=1, learning_rate=None, learned_weights=None,
             means=None, scales=None):
    dim = arm.DOF * 2

    means = np.zeros(dim) if means is None else means
    scales = np.ones(dim) if scales is None else scales
    scale_down, scale_up = generate_scaling_functions(
        np.asarray(means), np.asarray(scales))

    net = nengo.Network('CB')
    with net:
        # create / connect up CB --------------------------------
        net.CB = nengo.Ensemble(
            n_neurons=1000, dimensions=dim,
                base=nengo.dists.Uniform(-1, .1)))
        # expecting input in form [q, dq, u]
        net.input = nengo.Node(output=scale_down,
        cb_input = nengo.Node(size_in=dim, label='CB input')
        nengo.Connection(net.input[:dim], cb_input)
        # output is [-Mdq, u_adapt]
        net.output = nengo.Node(size_in=arm.DOF*2)

        def CB_func(x):
            """ calculate the dynamic component of OSC signal """
            x = scale_up(x)
            q = x[:arm.DOF]
            dq = x[arm.DOF:arm.DOF*2]

            # calculate inertia matrix
            M = arm.M(q=q)
            return -np.dot(M, kv * dq)
        # connect up the input and output
        nengo.Connection(net.input[:dim], net.CB)
        nengo.Connection(net.CB, net.output[:arm.DOF],
                         function=CB_func, synapse=None)

I don’t think there’s anything noteworthy going on here, most of the relevant details have already been discussed…so we’ll move on to the adaptation!

Implementing CB – non-linear dynamics adaptation

The final part of the model is the non-linear dynamics adaptation, modelled as a separate ensemble in the cerebellar sub-network (a separate ensemble so that it’s more modular, the learning connection could also come off of the other CB population). I work through the details and proof of the learning rule in the paper, so I’m not going to discuss that here. But I will restate the learning rule here:

\dot{\textbf{d}} = \textbf{L}_d \textbf{A} \otimes \textbf{u},

where \textbf{d} are the neural decoders, \textbf{L}_d is the learning rate, \textbf{A} is the neural activity of the ensemble, and \textbf{u} is the joint space control signal sent to the arm. This is a basic delta learning rule, where the decoders of the active neurons are modified to push the decoded function in a direction that reduces the error.

The adaptive ensemble can be initialized either using saved weights (passed in with the learned_weights paramater) or as all zeros. It is important to note that setting decoders to all zeros means that does not mean having zero neural activity, so learning will not be affected by this initialization.

        # dynamics adaptation------------------------------------
        if learning_rate is not None:
            net.CB_adapt = nengo.Ensemble(
                n_neurons=1000, dimensions=arm.DOF*2,
                # enforce spiking neurons
                    base=nengo.dists.Uniform(-.5, .2)))

            net.learn_encoders = nengo.Connection(
                net.input[:arm.DOF*2], net.CB_adapt,)

            # if no saved weights were passed in start from zero
            weights = (
                learned_weights if learned_weights is not None
                else np.zeros((arm.DOF, net.CB_adapt.n_neurons)))
            net.learn_conn = nengo.Connection(
                # connect directly to arm so that adaptive signal
                # is not included in the training signal
                net.CB_adapt.neurons, net.output[arm.DOF:],

                             transform=-1, synapse=.01)
    return net

We’re able to implement that learning rule using Nengo’s prescribed error-sensitivity (PES) learning on our connection from the adaptive ensemble to the output. With this set up the system will be able to learn to adapt to perturbations that are functions of the input (set here to be [\textbf{q}, \dot{\textbf{q}}]).

The intercepts in this population are set to values I found worked well for adapting to a few different forces, but it’s definitely a parameter to play with in your own scripts if you’re finding that there’s too much or not enough generalization of the decoded function output across the state space.

One other thing to mention is that we need to have a relay node to amalgamate the control signals output from M1 and the dynamics compensation ensemble in the CB. This signal is used to train the adaptive ensemble, and it’s important that the adaptive ensemble’s output is not included in the training signal, or else the system quickly goes off to positive or negative infinity.

Implementing S1 – a placeholder

The last sub-network in the REACH model is a placeholder for a primary sensory cortex (S1) model. This is just a set of ensembles that represents the feedback from the arm and relay it on to the rest of the model.

def generate(arm, direct_mode=False, means=None, scales=None):
    dim = arm.DOF*2 + 2  # represents [q, dq, hand_xy]

    means = np.zeros(dim) if means is None else means
    scales = np.ones(dim) if scales is None else scales
    scale_down, scale_up = generate_scaling_functions(
        np.asarray(means), np.asarray(scales))

    net = nengo.Network('S1')
    with net:
        # create / connect up S1 --------------------------------
        net.S1 = nengo.networks.EnsembleArray(
            n_neurons=50, n_ensembles=dim)

        # expecting input in form [q, x_des]
        net.input = nengo.Node(output=scale_down, size_in=dim)
        net.output = nengo.Node(
            lambda t, x: scale_up(x), size_in=dim)

        # send in system feedback and target information
        # don't account for synapses twice
        nengo.Connection(net.input, net.S1.input, synapse=None)
        nengo.Connection(net.S1.output, net.output, synapse=None)

    return net

Since there’s no function that we’re decoding off of the represented variables we can use separate ensembles to represent each dimension with an EnsembleArray. If we were going to decode some function of, for example, q0 and dq0, then we would need an ensemble that represents both variables. But since we’re just decoding out f(x) = x, using an EnsembleArray is a convenient way to decrease the number of neurons needed to accurately represent the input.

Creating a model using the framework

The REACH model has been set up to be as much of a plug and play system as possible. To generate a model you first create the M1, PMC, CB, and S1 networks, and then they’re all hooked up to each other using the framework.py file. Here’s an example script that controls the arm to trace a circle:

def generate():
    kp = 200
    kv = np.sqrt(kp) * 1.5

    center = np.array([0, 1.25])
    arm_sim = arm.Arm2Link(dt=1e-3)
    # set the initial position of the arm
    arm_sim.init_q = arm_sim.inv_kinematics(center)

    net = nengo.Network(seed=0)
    with net:
        net.dim = arm_sim.DOF
        net.arm_node = arm_sim.create_nengo_node()
        net.error = nengo.Ensemble(1000, 2)
        net.xy = nengo.Node(size_in=2)

        # create an M1 model -------------------------------------
        net.M1 = M1.generate(arm_sim, kp=kp,
                             means=[0.6, 2.2, 0, 0],
                             scales=[.5, .5, .25, .25])

        # create an S1 model -------------------------------------
        net.S1 = S1.generate(arm_sim,
                             means=[.6, 2.2, -.5, 0, 0, 1.25],
                             scales=[.5, .5, 1.7, 1.5, .75, .75])

        # subtract current position to get task space direction
        nengo.Connection(net.S1.output[net.dim*2:], net.error,

        # create a trajectory for the hand to follow -------------
        x = np.linspace(0.0, 2.0*np.pi, 100)
        PMC_trajectory = np.vstack([np.cos(x) * .5,
                                    np.sin(x) * .5])
        PMC_trajectory += center[:, None]
        # create / connect up PMC --------------------------------
        net.PMC = PMC.generate(PMC_trajectory, speed=1)
        # send target for calculating control signal
        nengo.Connection(net.PMC.output, net.error)
        # send target (x,y) for plotting
        nengo.Connection(net.PMC.output, net.xy)

        net.CB = CB.generate(arm_sim, kv=kv,
                             means=[0.6, 2.2, -.5, 0],
                             scales=[.5, .5, 1.6, 1.5])

    model = framework.generate(net=net, probes_on=True)
    return model

In line 50 you can see the call to the framework code, which will hook up the most common connections that don’t vary between the different scripts.

The REACH model has assigned functionality to each area / sub-network, and you can see the expected input / output in the comments at the top of each sub-network file, but the implementations are open. You can create your own M1, PMC, CB, or S1 sub-networks and try them out in the context of a full model that generates high-level movement behaviour.

Running the model

To run the model you’ll need Nengo, Nengo GUI, and NengoLib all installed. You can then pull open Nengo GUI and load any of the a# scripts. In all of these scripts the S1 model is just an ensemble that represents the output from the arm_node. Here’s what each of the scripts does:

  1. a01 has a spiking M1 and CB, dynamics adaptation turned off. The model guides the arm in reaching in a straight line to a single target and back.
  2. a02 has a spiking M1, PMC, and CB, dynamics adaptation turned off. The PMC generates a path for the hand to follow that traces out a circle.
  3. a03 has a spiking M1, PMC, and CB, dynamics adaptation turned off. The PMC generates a path for the joints to follow, which moves the hand in a straight line to a target and back.
  4. a04 has a spiking M1 and CB, dynamics adaptation turned off. The model performs the centre-out reaching task, starting at a central point and reaching to 8 points around a circle.
  5. a05 has a spiking M1 and CB, and dynamics adaptation turned on. The model performs the centre-out reaching task, starting at a central point and reaching to 8 points around a circle. As the model reaches, a forcefield is applied based on the joint velocities that pushes the arm as it tries to reach the target. After 100-150 seconds of simulation the arm has adapted and learned to reach in a straight line again.

Here’s what it looks like when you pull open a02 in Nengo GUI:


I’m not going to win any awards for arm animation, but! It’s still a useful visualization, and if anyone is proficient in javascript and want’s to improve it, please do! You can see the network architecture in the top left, the spikes generated by M1 and CB to the right of that, the arm in the bottom left, and the path traced out by the hand just to the right of that. On the top right you can see the a02 script code, and below that the Nengo console.


One of the most immediate extensions (aside from any sort of model of S1) that comes to mind here is implementing a more detailed cerebellar model, as there are several anatomically detailed models which have the same supervised learner functionality (for example (Yamazaki and Nagao, 2012)).

Hopefully people find this post and the code useful for their own work, or at least interesting! In the ideal scenario this would be a community project, where researchers add models of different brain areas and we end up with a large library of implementations to build larger models with in a Mr. Potato Head kind of fashion.

You can find all of the code up on my GitHub. And again, this code all should have been publicly available along with the publication. Hopefully the code proves useful! If you have any questions about it please don’t hesitate to make a comment here or contact me through email, and I’ll get back to you as soon as I can.

Tagged , , , , , ,

Workshop talk – Methods for scaling neural computation

A couple of months ago I gave a talk at the Neuro-Inspired Computational Elements (NICE) workshop, about the use of cortical microcircuits for adaptation in the prediction and control problems. The talk was recorded, and here is a link: http://1.usa.gov/1tg2n2I Ignore the fire alarm that gets pulled in the first minute, the actual talk starts around 1 minute.

The were a lot of talks in general that were very interesting, which are also available online here: http://digitalops.sandia.gov/Mediasite/Play/86e1cbfa747a448d96909658df1352011d

I’ll be writing up some posts on the subject matter of my talk in the next couple months, explaining the methods in more detail and providing some solid examples with code. Hope you find the video interesting!

Tagged , , , ,

Nengo model – Low pass derivative filter

To just get the code you can copy / paste from below or get the code from my github: low_pass_derivative_filter.py.

In the course of building models in Nengo, I recently came in to need for a neural implementation of a low pass derivative filter. I scripted up a sub-network in Nengo (www.nengo.ca) that does this, and until we get the model database / repository up and running for Nengo scripts I’ll keep working through building these things here, because it goes over some basic methods that can be useful when you’re building up your models.

Basically there are three parts to this model: Derivative calculation, absolute value calculation, and an inhibitory projection with a threshold activation that projects to the output population. Here’s a picture:
Here’s the idea: The population input projects both directly to the output population and to a population that calculates the derivative of the sum across dimensions of the input signal. The derivative population passes on the derivative of the input signal to an absolute value calculating population, which passes the absolute value of the derivative on to a population that isn’t activated for values under a threshold level. This threshold population then projects very strong inhibition to the output population, so that when the absolute value of the derivative is above the threshold level, no output is projected, and otherwise the system just relays the input signal straight through. Here’s the code:

def make_dlowpass(name, neurons, dimensions, radius=10, tau_inhib=0.005, inhib_scale=10):

    dlowpass = nef.Network(name)

    dlowpass.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    output = dlowpass.make('output', neurons=dimensions*neurons, dimensions=dimensions) # create output relay
    # now we track the derivative of sum, and only let output relay the input
    # if the derivative is below a given threshold
    dlowpass.make('derivative', neurons=radius*neurons, dimensions=2, radius=radius) # create population to calculate the derivative
    dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) # set up recurrent connection
    dlowpass.add(make_abs_val(name='abs_val', neurons=neurons, dimensions=1, intercept=(.2,1))) # create a subnetwork to calculate the absolute value  

    # connect it up!
    dlowpass.connect('input', 'output') # set up communication channel
    dlowpass.connect('input', 'derivative', index_post=0)
    def sub(x):
        return [x[0] - x[1]]
    dlowpass.connect('derivative', 'abs_val.input', func=sub)
    # set up inhibitory matrix
    inhib_matrix = [[-inhib_scale]] * neurons * dimensions
    output.addTermination('inhibition', inhib_matrix, tau_inhib, False)
    dlowpass.connect('abs_val.output', output.getTermination('inhibition'))

    return dlowpass.network

First off, this code is taking advantage of the absolute value function that was written a couple blog posts ago. You can either go check out that post or I’ll also have that function included in the code at the end for completeness. Aside from that, there are a lot of things going on that you won’t come across if you’re just coding up simple examples, so let’s look at them.

At the top, we’re assigning our output network a handle, which I try to avoid in general for neatness, since most of the times you can reference it by simply referring to it’s name, 'output'. The reason that I assign it a handle here is because we’re going to be calling upon some Java API features that (to my knowledge) aren’t handled in the Python API yet, and although we could call up the output node as a Java object with dlowpass.get('output') using only its assigned name, it will just be cleaner in the end to have a handle for it. We’ll come back to this.

The next interesting thing that happens, is that when we’re creating our derivative population, we set the number of neurons to 10*neurons, and radius=10. This is to because in the derivative we’re representing the sum of all of the input dimensions. How are all the input dimensions being summed up in derivative, you ask? Just below, on line 17. When we connect up the input relay to derivative we set index_post=0, which means that all of the input dimensions are going to to project to the same dimension of the derivative population. The default weight for this connection is 1, so then dimension 0 of derivative is equal to 1 * value for value in input_dimensions . Super.

But why do we set radius=10? This is because the radius parameter specifies the range of values represented by this population. The default is (-1,1), but when we specify radius, the new range of represented values becomes (-radius, radius). We’re making a bit of an assumption that this value won’t go outside of the range (-10,10) here, but that should be OK for most of the situations we’re going to come across. In the specific model I’m using this for it’s definitely the case, so that’s why I’ve set the default value to 10. And because we don’t want the accuracy in representation to decrease, we also scale up the number of neurons in this population by radius*neurons.

And there’s still more happening in this derivative population! On line 11 I specify a recurrent connection that projects into a second dimension represented in derivative. So now what’s going to happen is that the sum of the input signals is projected into the first dimension of derivative, and through a recurrent connection the value of the sum of the input signals from time t-pstc will be represented in the second dimension of the derivative population.

To calculate the derivative then, it’s a simple matter of subtracting the previous signal from the current signal, which is what happens in the function sub that I define on line 19. To implement this function, when connecting up derivative to abs_val, just set the parameter func=sub. Easy.

Now, when the abs_val population is made, we set and intercept=(.2,1). This is the same trick we used in the previous absolute value function model, but it’s acting as a threshold here. Basically, this population won’t respond if the value being projected into it is between (-.2, .2).

So, up to this point, what we have is a summation of the input dimensions, the derivative being calculated and passed to an absolute value function, and this population only responds if the derivative is greater than .2.

The last part is hooking up this abs_val population to the output relay, to suppress output whenever it’s activated (i.e. when the derivative is greater than .2). This is where we need to pull into the Java API, and this is why we specified a handle for our output population. In lines 25-26, what’s going on is that instead of using the NEF neural compiler functionality to set up our connection weights to compute some function, we’re specifying them ourselves. And we’re specifying them to prevent the neurons in output from firing. Now, the activation of the neurons in abs_val reduces the voltage values being sent into the output population, inhibiting their activity.

And that’s it! Here’s the complete code to run an example of this network (which can also be found on my github low_pass_derivative_filter.py):

import nef

# constants / parameter setup etc
N = 50 # number of neurons
D = 3 # number of dimensions

def make_abs_val(name, neurons, dimensions, intercept=[0]):
    def mult_neg_one(x):
        return x[0] * -1 

    abs_val = nef.Network(name)

    abs_val.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    abs_val.make('output', neurons=1, dimensions=dimensions, mode='direct') # create output relay
    for d in range(dimensions): # create a positive and negative population for each dimension in the input signal
        abs_val.make('abs_pos%d'%d, neurons=neurons, dimensions=1, encoders=[[1]], intercept=intercept)
        abs_val.make('abs_neg%d'%d, neurons=neurons, dimensions=1, encoders=[[-1]], intercept=intercept)

        abs_val.connect('input', 'abs_pos%d'%d, index_pre=d)
        abs_val.connect('input', 'abs_neg%d'%d, index_pre=d)
        abs_val.connect('abs_pos%d'%d, 'output', index_post=d)
        abs_val.connect('abs_neg%d'%d, 'output', index_post=d, func=mult_neg_one)

    return abs_val.network

def make_dlowpass(name, neurons, dimensions, radius=10, tau_inhib=0.005, inhib_scale=10):

    dlowpass = nef.Network(name)

    dlowpass.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    output = dlowpass.make('output', neurons=dimensions*neurons, dimensions=dimensions) # create output relay
    # now we track the derivative of sum, and only let output relay the input
    # if the derivative is below a given threshold
    dlowpass.make('derivative', neurons=radius*neurons, dimensions=2, radius=radius) # create population to calculate the derivative
    dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) # set up recurrent connection
    dlowpass.add(make_abs_val(name='abs_val', neurons=neurons, dimensions=1, intercept=(.2,1))) # create a subnetwork to calculate the absolute value  

    # connect it up!
    dlowpass.connect('input', 'output') # set up communication channel
    dlowpass.connect('input', 'derivative', index_post=0)
    def sub(x):
        return [x[0] - x[1]]
    dlowpass.connect('derivative', 'abs_val.input', func=sub)
    # set up inhibitory matrix
    inhib_matrix = [[-inhib_scale]] * neurons * dimensions
    output.addTermination('inhibition', inhib_matrix, tau_inhib, False)
    dlowpass.connect('abs_val.output', output.getTermination('inhibition'))

    return dlowpass.network

# Create network
net = nef.Network('net')

# Create / add low pass derivative filter
net.add(make_dlowpass(name='dlowpass', neurons=N, dimensions=D))

# Make function input
net.make_input('input_function', values=[0]*D)

# Connect up function input to filter
net.connect('input_function', 'dlowpass.input')

# Add it all to Nengo

And here’s a picture of it running. What you can see is that anytime the input changes quickly, the system input drops to zero, but when the input is holding constant or is changing slowly the output is allowed to pass through. Great! Just what we wanted. net

Tagged , ,

The role of phasic dopamine in the basal ganglia

As I mentioned in my last post, I’m reading a series of papers that presents a model of the basal ganglia, written mainly by Peter Redgrave, Kevin Gurney, and John Reynolds. Of particular interest throughout these articles is a re-examination of the role of the short-term phasic dopamine (DA) signal from the substantia nigra pars compacta (SNc). A well-propagated view is that the phasic DA signal is a reward prediction-error signal, but Redgrave et al present a strong argument against this and suggest instead a role of an agency determination / novel movement identification mechanism. In this post I’m going to be presenting their argument for this, and how reinforcement and reward based learning in the basal ganglia could work at large. Again, throughout there will be comments and questions I put forth, I will make an effort for it to be clear when something is from me and when it’s from the papers.

Phasic DA signal as a reward-prediction error
The idea that the phasic DA signal serves as a reward-prediction error is born out of a series of experiments presented in [Schultz 1998]. The idea of reward-prediction error comes from instrumental (aka operant) conditioning, where rewards ‘reinforce’ behavior by strengthening associations between stimuli and behavioral responses. Formally, a reward-prediction error is defined as the difference between the reward predicted at a given point in time and the actual reward received. This goes way back to [Thorndike 1911] where Thorndike formally states the idea as the Law of Effect:

Any act which in a given situation produces satisfaction becomes associated with that situation so that when the situation recurs the act is more likely than before to recur also.

In Schultz’s experiments, the DA neurons of a monkey are recorded from as the monkey performs a number of various tasks, including “reaction time tasks, direct and delayed GO-NO GO tasks, spatial delayed response and alternation tasks, air puff and saline active avoidance tasks, operant and classically conditioned visual discrimination tasks, self-initiated movements, and unpredicted delivery of a reward in absence of any formal task.” This following image has been lifted from the results of [Schultz 1998]:

The explanation of these results is presented as follows. In the top figure, a reward (R) is unexpectedly delivered and the DA neurons activate. This is because there is a positive error in the predicted reward; no reward was expected, but there was one, BOOM, phasic DA signal. In the middle figure a conditioned stimulus (CS) has been associated with the reward, now the CS occurs unexpectedly, which means that a reward is on the way, thus once again there is a positive error in the predicted reward. Now, however, at the time of reward delivery the reward was predicted and the reward was received. There was no error in reward-prediction, so there is no phasic DA signal. And finally, in the bottom panel we see a CS cause activation of the phasic DA signal, but this time no reward is delivered. Now, exactly when the reward should be delivered and is not there is a negative reward-prediction error, and a corresponding decrease in tonic DA levels is observed.

Another interesting result from this experiment is that the phasic DA signal will push backwards along the chain of predictive events to the earliest predictive sensory stimulus signalling that a reward is coming. Taken all together, a pretty strong case for the reward-prediction error hypothesis is presented.

Problems with phasic DA as a reward-prediction error
There are, however, a number of problems have arisen under close examination of this hypothesis and through further experimental work, laid out by Redgrave et al. These are the main contentions:

  • DA neurons respond not only to rewarding stimuli, but also to non-rewarding sensory events salient only by virtue of their novelty or intensity [Schultz 1998], as well as conditioned stimuli not associated with a reward [Bromberg-Martin 2010].
  • The phasic DA response is remarkably stereotyped (occurring with ~100ms latency, and a duration of ~100ms), across species, sensory modalities, numerous experimental paradigms, and largely independent of perceptual complexity of eliciting event [Redgrave 2011]. This highly stereotyped DA response time is incongruent with the reward-prediction error hypothesis when considering that there can be a marked difference in the time taken to establish the reward value of different stimuli.
  • The latency of gaze-shifts is in the range of 150-250ms [Jay 1987], and the phasic DA response very reliably occurs around ~100ms [Schultz 1998], this means that the reward-prediction error must be calculated before the animal has foveated on the stimulus. Additionally, the source of visual information driving the DA neurons is largely, if not exclusively, the superior colliculus. Neurons in the superior colliculus are highly sensitive to the location of luminance changes, but largely nonresponsive to color and geometric configurations, meaning that the superior colliculus is not in a position to provide object identity (and reward) information to DA neurons.

To contend with these last two points, a number of experiments have shown that DA neurons have shown responses of differing magnitudes and probabilities to unpredicted complex visual stimuli. However, throughout all of the experiments conducted, the different visual stimuli were presented consistently at the same location, which is exactly the visual feature that superior colliculus is capable of detecting [Redgrave 2006]. Rather than discriminating between complex visual stimuli features, the location of the stimuli is instead being used to determine the reward value of the stimulus. Outside of the experimental paradigm, however, temporally unpredictable events are also spatially unpredictable, which makes it unlikely that determining reward value by spatial location in natural environments would be a useful mechanism.

Taken all together, a pretty strong case against the reward-prediction error hypothesis is presented.

An alternative implementation of instrumental conditioning
In the paper series, Redgrave et al propose that instrumental (again, also known as operant) conditioning arises as a function of two mechanisms in the brain: 1) A mechanism to determine whether or not an unpredicted sensory stimuli was caused by the system (agency), establishing a cause-effect relationship if one exists, and 2) a mechanism for reward to modulate the afferent input to the striatum. As mentioned in the last post, the basal ganglia is a proposed central selection device, choosing actions based on the saliency of their input.

Phasic DA for agency determination / novel movement identification
Instead of the reward-prediction error being determined by the phasic DA signal, it is proposed to function for a much more basic purpose: Identifying the cause of unpredicted sensory stimuli. This is a prerequisite to instrumental conditioning / any adaptive behavior. This proposal is based on identifying another function that would generate behavior very similar to a reward-prediction error, while also considering the precise and highly stereotyped natures of the response (~100ms latency, ~100ms duration), and the other information that is likely to be in the striatum at the point when the phasic DA signal arrives. According to [Redgrave 2006] and [Redgrave 2008], there are at least three additional signals in the striatum at the time of phasic DA release:

  1. Sensory: from branching projections of the superior colliculus, providing information on the stimulus that elicited the phasic DA response
  2. Contextual information (i.e. general sensory, metabolic, cognitive state, and physical location): from any number of cortical, limbic, and subcortical projections into the striatum, and
  3. Motor-copy: signals sent from cortical and subcortical sensorimotor structures to the brainstem and spinal cord provide efference copies of the outgoing motor command through branching projections that are relayed both directly and indirectly (through the thalamus) to the dorsal striatum.

Note that these signals would also be in the basal ganglia and likely used in the same way in the proposal that phasic DA is for reward-prediction error as well. However, the list of problems presented above suggests strongly that stimulus and reward-value identification do not operate through the short-latency phasic DA signal. Agency detection is the alternative proposal for a learning based function that requires highly precise timing information, and does not rely on unavailable information such as object identity and actual stimulus reward-value.

One of the main problems with identifying the cause of unexpected stimuli is sorting through the irrelevant information to arrive at the specific trigger. The idea for the phasic DA signal to overcome this computational problem is that it ‘tags’ the signals in the dorsal striatum, including the motor-copy, when an unexpected stimulus occurs, making those actions to be more likely to be chosen again in a similar contextual situation. The authors also note that this process would be aided by the short-latency nature of the phasic DA signal, such that behavior evoked by the stimuli doesn’t get included in the signal tagging, confounding the event-outcome identification.
Through noisy exploration trying to make the event recur, signals that are consistently present become reinforced further, weeding through those that aren’t required to elicit the unexpected stimulus. Eventually the signal which accurately predicts the stimulus is identified. If it is a movement, then it gets added to the ‘library’ of motor actions, increasing the animals repertoire of predictable action / outcomes; in this way the phasic DA signal acts to determine agency and identify new movements. If the signal is not a movement, the association with a reward is noted, stored, and life carries on.

Prediction of sensory stimuli
As mentioned above, DA neurons respond to novel sensory stimuli. Interestingly, the novelty response of DA neurons habituates rapidly when a sensory stimulus fails to associate with a reward. Although much is known about the variables that influence habituation in primitive or reduced preparations, relatively little is known about the mechanisms behind the habituation of un-reinforced sensory stimuli [Redgrave 2011]. It could be a default property of the early sensory networks when a stimulus is repeatedly applied in the absence of any reinforcement, or the result of an outside network modulating afferent projections to sensory systems.

However, when a stimulus is associated with reward, early sensory systems sensitize to its presentation. Additionally, the phasic DA response shifts back to occur at the time of the conditioned stimulus, rather than at the time of the reward. This response continues to push backwards to the first predictable event in a chain of events leading to a reward, seemingly in conflict with the sensitization of stimuli associated with a reward. Additionally, if the conditioned stimulus occurs and the reward does not follow, at the time of the expected reward there is a dip in the tonic DA level.
The mechanism responsible is a precisely timed inhibitory signal that acts to cancel out the phasic DA response evoked by predicted rewards [Schultz 1998]. As stimuli are recognized as predictors of future sensory events, this timed inhibitory signal prevents the activation of the DA neurons. In this way only the first, unpredicted, appearance of a CS in a chain of events evokes a phasic DA response. The goal of this response is to try to learn the cause of this stimulus, in the event that no predictor is learned, the predictor first in the chain will continue to evoke a phasic DA response.

The source of this precisely timed inhibitory signal has not been identified experimentally, but there are several candidates identified in [Redgrave 2011]: direct inhibitory inputs from within the basal ganglia (striatum or globus pallidus); indirect inhibitory inputs from the habenulu-rostro-medial tegmental system (hRMTg); or phasic afferent excitation of local inhibitory neurons with connections to nearby DA neurons.

A side note from me. The cerebellum is widely regarded as a supervised learning center for the brain (so widely I won’t even provide supporting references!). With its highly stereotyped repeated neural structure, and the insane amount of neurons it houses (accounting for 10% of the volume of the brain but holding over 50% of its neurons!), it is thought to provide this supervised learning functionality for a number of different neural systems. The prediction of sensory events given a conditioned stimulus or efference copy of a motor command is a very basic supervised learning problem. The hRMTg system has, in its wide list of afferent projections, connections with deep cerebellar neurons [Jhou 2011]. Although the cerebellum wouldn’t necessarily be required, it also has access to all the contextual, sensory, and motor copy information sent to the basal ganglia, and the connections to the hRMTg system suggest it to me as a favorite among the possibilities listed.

Response to noxious events
Another highly valuable feature for a system is to flag any actions which led to a noxious, such as a painful response, and prevent those actions from being executed again. It would be expected, then, that DA activity is suppressed whenever noxious stimiuli are encountered. Indeed this is the case [Redgrave 2006], where phasic suppression of DA activity lasts for the duration of the noxious event. The mechanism believed responsible for this effect are specialized, high-threshold nociceptors, which are sensory receptors that responds to potentially damaging stimuli with direct projections to the spinal cord and brain. In the same way that phasic DA release potentiates connection strengths, phasic DA suppression depresses the weighting of these connections, making them less likely to be chosen again in the future when a similar situation arises.
To be clear, this response is only expected from stimuli that are directly perceived by the nociceptors to be noxious, such a phasic DA suppression is not expected in the case that a stimuli is noxious but higher level processing is required to determine its reward-value.

Reward maximization;
So we’ve established a likely function for the phasic DA signal, the identification of agency. There’s more to instrumental conditioning, however. There also needs to be a means of reward maximization. The details are light on this part of the model, but are based on the observation of computational models that afferent sensory structures projecting into the basal ganglia could also demonstrate reward-based modulation. This is would give rise to the reward-based action selection bias that is the crux of formal reinforcement learning. This figure is lifted from [Redgrave 2011]:

In this figure the proposed system is shown operating in response to intrinsic (to the basal ganglia) reinforcement on the left, in (A). This case arises when unexpected stimuli bias the action selection process of the basal ganglia to attempt to discover the cause of this stimulus, causing a ‘repetition bias’. On the right of the figure, in (B), the system is shown responding to extrinsic reinforcement, where higher level cortical processing centers have determined that a stimulus was rewarding, and the strength of projections into the basal ganglia are weighted to make the responsible action more likely to be repeated.

As the authors admit, how the reward maximization on the afferent projections to the basal ganglia could occur is still very much unknown. Additionally, as previously mentioned, the mechanisms through which non-reward associated novel stimuli habituate and reward associated novel stimuli sensitize remain to be determined. But the reward maximization proposal is definitely of secondary concern in these papers, the main issue being the reconsideration of the function of the phasic DA signal, for which a case was very strongly presented.

Overview of proposed model;
To put this all together, the system model works as follows. An event occurs that causes activity in an early sensory processing system, which activates the DA neurons. The DA neurons cause a biasing of action selection towards the actions in the dorsal striatum at that moment (which are the actions just taken), which potentially caused the novel, or unpredicted, sensory stimulus. Some other system now says ‘hey that was a rewarding stimuli, don’t habituate to it early sensory systems!’, preventing habituation in the early sensory system. The stimulus then continues to drive the DA neurons, tagging the signals that are in the striatum at that time. As this happens, the signals in the striatum will vary through noise on during action selection, which helps exploration to try to pin down what causes this unexpected (and now defined to be) rewarding sensory stimulus. So far all the biasing of action choice is taking place inside the striatum. When the phasic DA release has pinned down the signals that elicit this sensory stimulus, there’s a transference of this signal to the cortex. In the cortex the reward-maximization system can now bias this action that was figured out in the basal ganglia such that it’s weighted more heavily outside the striatum. Once this is done, the inhibitory predictive system can now learn the association between this signal occurring and a reward following, and a precisely timed inhibitory spike can be generated and sent to the DA neurons to prevent a dopaminergic release.

The last part about the inhibitory predictive system kicking in after transference to the cortex wasn’t explicitly stated in any of the papers, but that’s my understanding of this model.

Questions / Comments
Here are some questions that have come up as I’ve been reading through these papers.

– As mentioned in the previous post, the basal ganglia is proposed to be the central selection device for the brain. This means that the different command systems vying for control are constantly projecting in saliency signals, which makes me wonder how does BG make decisions for the upcoming moment if bombarded by efferent copies of motor commands? I remember reading previously in articles with other models of the basal ganglia a functional actor/critic separation of the dorsal/ventral striatum. Would some separation of saliency and information signals help? Or could it have something to do with dual population coding, which the authors previously mentioned as a means of conveying saliency. Perhaps the information is transmitted and the saliency is chosen from the norm of the vector of firing rates inside the striatum? This second option seems likely to introduce some timing issues.

– There are several promising models which operate based on reward maximization in happening first inside the basal ganglia, then being transferred out to the cortex [Ashby 2007], would the above separation into actor/critic dorsal/ventral striatum help realize this? With novelty detection in the dorsal side, receiving projections from the DA neurons, and reward maximization on the ventral side? Then upon consolidation of a “good” set of movements or action plans transference to the cortex? I am interested to investigate this.

Lots to think about!


[Ashby 2007] – A neurobiological theory of automaticity in perceptual categorization
[Bromberg-Martin 2010] – Dopamine in Motivational Control: Rewarding, Aversive, and Alerting
[Jay 1987] – Sensorimotor integration in the primate superior colliculus. I. Motor convergence
[Jhou 2011] – The mesopontine rostromedial tegmental nucleus: a structure targeted by the lateral habenula that projects to the ventral tegmental area of Tsai and substantia nigra compacta
[Matsumoto 2009] – Two types of dopamine neuron distinctly convey positive and negative motivational signals
[Redgrave 2006] – The short-latency dopamine signal: a role in discovering novel actions?
[Redgrave 2008] – What is reinforced by phasic dopamine signals?
[Redgrave 2011] – Functional properties of the basal ganglia’s re-entrant loop architecture: selection and reinforcement
[Schultz 1998] – Predictive Reward Signal of Dopamine Neurons
[Thorndike 1911] – Animal intelligence; experimental studies

Redgrave P, Vautrelle N, & Reynolds JN (2011). Functional properties of the basal ganglia’s re-entrant loop architecture: selection and reinforcement. Neuroscience, 198, 138-51 PMID: 21821101

Tagged ,

The basal ganglia for action selection

Peter Redgrave, Kevin Gurney, and John Reynolds have a series of papers out where they detail a basal ganglia model, looking at its physiology and potential functional role in the brain. They address a number of different points in their papers, and I’m going to write up a couple of posts in hopes of making the model / material more accessible and furthering my own understanding. I’ll also be adding in my own thoughts and questions as I go along, but I’ll try to keep explicit when ideas are coming from papers and when they’re coming from me. In this post I’m going to look at the basal ganglia’s proposed role as an action selection center.

Basal ganglia as an action selection center
In complex systems like the brain, there are numerous processes and sub-systems operating in parallel. Things like feeding, predator avoidance, mating, etc are all going to be suggesting a specific course of action for the body to follow (hereafter these different sub-systems will be referred to as ‘command systems’, keeping with terminology from the paper series). The problem arises in that there is only one body, and letting all the command systems have at controlling the body all at once is a poor idea for generating effective / efficient behavior. What is needed is a method of relegating control of the motor system to a single command system, and preventing signals from other command systems from interfering. This can be done by having all command systems put forward an ‘urgency’ (or saliency) level, and then using a winner-take-all (WTA) function to choose one to be in control.

In [Redgrave 1999], a set of possible solutions from engineering are presented in three WTA system architecture types: subsumption, distributed, and central selection.

Subsumption: In the subsumption architecture, the command systems have a priority ordering. In the event of a conflict, systems higher up on the priority list can override those lower than them to interrupt and suppress or replace outgoing commands. Although this allows quick response to environmental contingencies (such as the appearance of a predator, with the ‘evade predators’ command system given top priority), the prioritization is built in to the system, and as more command systems are added it becomes difficult to determine a proper prioritization. Additionally, due to the ordering of systems being built-in, the subsumption architecture displays far less flexibility than biological nervous systems.

Distributed: The distributed architecture is a popular choice for winner-take-all implementations, where each option is connected to all the others with an inhibitory connection. As the saliency of a given option increases, it inhibits the other options, which in turn reduces the inhibition they project back, until only one option is uninhibited. Here, selection is considered an ’emergent’ property of the network. This architecture also supports adaptation, as the weighting of the connections between options can be tuned, giving rise to complex dominance dynamics. However, there is a costly implementation. First, every option must be connected to every other option (resulting in n(n – 1) connections), and the connection weights properly balanced to give the desired prioritization. Second, to integrate a new option into the system another 2n connections must be added, and they must be properly balanced with the already existing connection weights. Third, the more options that are added to this system, the longer it takes to choose between them, especially if several options present saliency values very close to one another (this last point was added by me, and is not stated in the papers).

Central selection: In the central selection architecture, all of the command systems send their saliency values to a central switching device, which chooses one of them as the winner. In this case, the complexity of system connectivity is significantly reduced, to only 2n connections total (one from and to each command system), and to add a new system only requires 2 connections be added. Additionally, the case of tuning the connection weights from each system becomes significantly easier, as the dynamics that determine the winner are now explicitly based on the weighting of the only connection from each command system in to the central switching device.

Unsurprisingly, the central selection architecture is proposed to best model the structure of the brain for selecting between command systems (although the authors suggest each command system may implement a distributed selection architecture internally), and the basal ganglia is proposed as the central switching device. Supporting this, a computational model of the basal ganglia was presented in [Gurney 2001], and implemented in spiking neurons in [Stewart 2010], based on biological structure that very efficiently perform winner-take-all functionality. Interestingly, its architecture is such that it effectively chooses a winner quickly regardless of the number of competitors and its performance does not suffer from competitors presenting very similar saliency values [Stewart 2010].

Central selection constraints: The use of a central selection architecture also imposes several constraints: 1) the saliency of each competitor must be measured in some ‘common currency’, and 2) the output of the central switching device (the basal ganglia) must be set up such that it can activate the winning command system, and disable the losing ones.

For the common currency between command systems, the authors propose the use of dual population encoding [Koechlin 1996], which I’ll go into in another post more in detail, but basically says two things can be extracted from the firing pattern of a population of neurons: the first is the information being represented, and the second is the saliency of this information, determined as the norm of firing rates of the neurons.

To address the second constraint, we’ll first need to look briefly at the structure of the basal ganglia.

Basic structure of the basal ganglia
This is a very low-res diagram of the neurobiological structure of the basal ganglia, taken from [Gurney 2001]:
The principle input components of the basal ganglia are the striatum and the subthalamic nuclean (STN). These structures receive projections from pretty much the entire cerebral cortex, including the motor, sensory, association, and limbic areas. The main output components of the basal ganglia are the internal segment of the globus pallidus (GPi), and the substantia nigra pars reticulata and lateralis (SNr). The output of the basal ganglia projects then through the thalamus and back to the cortex. Notably, projections routed through the thalamus go to both the same sites that originated the basal ganglia input, as well as others, forming both closed and open loop systems [Joel 1994].

Parallel functional loops: There are two particular points of interest of the basal ganglia structure relevant to this discussion. The first is that there is an intrinsic separation of information from different brain regions as it travels through the basal ganglia, such that the basal ganglia can be viewed as having a number of different processing tracts that operate in parallel: limbic, associative, sensory, and motor. This is the set of closed loops mentioned above. Here is an illustration, taken from [Redgrave 2011]:
All of these loops have a highly similar structure, suggesting that each performs the same function on different information [Voorn 2004].

Tonically inhibitory output: The second point of interest addresses our second constraint mentioned above, of requiring some mechanism for enabling / disabling the output from a chosen command system to take control of the body: The output from the basal ganglia to the thalamus is tonically inhibitory. There have been several possible functional roles proposed for this tonic inhibition, both in the closed and open loop projections. I’ll discuss the closed loop case below. In the open loop projections, there seems to be a clear potential for a ‘gating’ mechanism, where the output from the winning system is disinhibited in the thalamus and allowed to pass forward. Extrapolating from this, I’ve made a very, very, very simplified diagram illustrating how open loop gating using tonic inhibition could work:
Here, the association area has a bunch of different command systems, labelled 1 through K, which all have their own ideas about what the motor control system should be doing. They each send out a branching projection, with the saliency values used by the basal ganglia, and the information carried into the thalamus. They all project to a part of the thalamus which routes the information to the motor system, but due to tonic inhibition from the basal ganglia, no information is passed through. Once the basal ganglia chooses a winner from the K command systems, however, that winner’s channel in the thalamus is disinhibited, and it can send it’s directions out to the motor system for execution. In this way, the basal ganglia has the ability to enable / disable output from a command system.

After discussion with a couple of the guys in my lab, a couple of benefits of using tonic inhibition over selective excitation as the output of the basal ganglia have come up.
The first is that the use of inhibition is a much simpler implementation of a gateway. When using inhibition, the connections from the basal ganglia fire if no information should pass through, and stop firing when it should. In the case of activation, however, there is necessarily some sort of multiplication operation being performed such that the output from the gateway is GATEWAY_VALUE * INPUT_VALUE. In addition to being more complicated that inhibition of undesired options, it’s inclined towards performance errors.
This is the second point, in that with tonic inhibition the basal ganglia stomps everything out. So nothing is accidentally passed through a gateway if a INPUT_VALUE becomes highly active. With selective activation, it’s foreseeable that high levels of INPUT_VALUE could mimic the activation levels of GATEWAY_VALUE * INPUT_VALUE. In these ways tonic inhibition makes a gateway functionality more efficient and effective.

Alternatively, the open loop gating could also function as my supervisory, Chris Eliasmith, comments below: The routing signal from the basal ganglia is projected through the thalamus out to modulate the cortico-cortico connections from the associative area to the motor cortex. Modifying the example diagram above to operate this way, we get:

In this case I drew out the different connections for clarity. The saliency values are projected to the basal ganglia, and a winner is chosen. The modulatory values projected through the thalamus then connect to the corticocortical connections from the associative area to the motor area, and set such that the winner is allowed to project into the motor area and the others are prevented. The benefit of performing gating this way is that the required bandwidth for information passing through the thalamus is significantly reduced.

Closed loop projections: The information in this subsection is not discussed in the paper series. The natural question following the discussion of the potential role of the open loops and tonic inhibition in the thalamus as a gating mechanism is what could the role of the closed loops be? The basal ganglia has been shown to play a strong role in motor learning and sequence learning. In [Stewart 2008], a spiking neuron model of the basal ganglia was developed that demonstrates how the recurrent connections with the cortex can be used to control the evolving dynamics of a population of neurons. In the paper a simple set of rules for counting are developed. In experiments on monkeys involving sequence learning, monkeys perform a similar type of learning figuring out how to appropriately move their arms to get the reward. If the basal ganglia is damaged, the monkeys are no longer able to learn new sequences, but can still perform previously learned sequences [Turner 2005].

[Ashby 2007] propose that information such as motor sequences can be learned in the basal ganglia, where very fine-grained mechanisms for identifying the timing and causal relationship between action and effect exit, and once learned, it can then be transferred to the cortex for more automatic execution. This is thought to be what has happened when monkeys are able to execute previously learned sequences, but not able to learn new sequences.

The use of tonically inhibitory output here is still unclear, but one possibility is that inside each command system there is a distributed network, containing each of the possible ‘next step’s for that command system. Inside the basal ganglia, one of these next steps is chosen, and it’s selection amounts to disinhibiting recurrent connections back to itself, allowing its saliency to increase to a point that all the other options are fully inhibited and the dynamics evolve according to the chosen next step.

Hierarchical selection of action
Now with this whole system in place, it is proposed that this structure serves to implement a hierarchy of action selection [Redgrave 1999]. In this hierarchy, the decision on how to next move would start out at a very abstract level, as a competition between some basic command systems arguing about how hungry, tired, horny, etc you are. Once it’s decided that you are more hungry than the others, the next level of the hierarchy is engaged to decide what your best option is: go to the store to get food, eat your canned beans, or order a pizza. This then continues on until you get to a level of deciding what muscles to move, all based on your goal of eating a can of beans. This of course is a gross simplification of any possible analogous process in the brain, but it hopefully gets the point across.

One of the major benefits of a hierarchical action selection setup is that decision making is simplified on the lower levels, because a large number of options are not in line with the decisions made at a higher level. For example, to the end of getting your can of beans, you probably don’t have to decide to not punch yourself in the face, because it doesn’t further you along your path to getting beans.

Things of course become even more complicated when you consider that is possible to be working towards to goals at the same time, in that it is possible for us to successfully walk and chew gum at the same time. But looking at that falls outside of the scope of this post.

In this post I’ve put forth the case presented in the paper series from Redgrave et al for the basal ganglia as an action selection center. Without a doubt there is much more experimental work that needs to be examined, but here I’ve focused on providing a brief overview of how the basal ganglia could be implementing action selection. In future posts on the subject, I’ll be looking at other issues addressed by the Redgrave paper series, in particular the role of the short-latency phasic dopamine signal in the basal ganglia. My goal is to work through these papers and then present an incorporation of this work into a larger model of the motor control system.


[Ashby 2007] – A neurobiological theory of automaticity in perceptual categorization
[Gurney 2001] – A computational model of action selection in the basal ganglia. I. A new functional anatomy
[Joel 1994] – The organization of the basal ganglia-thalamocortical circuits: open interconnected rather than closed segregated
[Koechlin 1996] – Dual Population Coding in the Neocortex: A Model of Interaction between Representation and Attention in the Visual Cortex
[Redgrave 1999] – The Basal Ganglia: A Vertebrate Solution To The Selection Problem?
[Redgrave 2011] – Functional properties of the basal ganglia’s re-entrant loop architecture: selection and reinforcement
[Stewart 2008] – Building production systems with realistic spiking neurons
[Stewart 2010] – Dynamic Behaviour of a Spiking Model of Action Selection in the Basal Ganglia
[Turner 2005] – Sequential Motor Behavior and the Basal Ganglia: Evidence from a serial reaction time task in monkeys
[Voorn 2004] – Putting a spin on the dorsal–ventral divide of the striatum

Redgrave P, Prescott TJ, & Gurney K (1999). The basal ganglia: a vertebrate solution to the selection problem? Neuroscience, 89 (4), 1009-23 PMID: 10362291

Tagged ,