## Wrapping MapleSim C code for Python

Previously, I worked through a simple inverse kinematics example with at three link arm. The arm, however, was uninteresting, and in fact had no dynamics of its own. I’ve also previously (but not in a blog post) built up a number of arm models using MapleSim that were interesting, and ported them into C. From there I used them in Matlab / Simulink, but I’m on a Python kick now and tired of Matlab; so I’ve been looking at how to wrap this C code generated by MapleSim using Cython such that I can interact with it from Python.

So the code that simulates my models is exported to C, which is great in that it’s super fast, but it’s bad because it’s a pain in the ass to work with. I don’t want to have to write my controllers and learning algorithms all in C, it would be much nicer to do it all in Python. So I was looking at interfacing Python with the C library, and did a little (as seen in my last post), but then I already know how to make Cython play nice with C++ so I figured let’s shoot for that instead. What follows now is a hacky guide to getting your MapleSim model simulations to work with Python. First we’ll get the C code to just run, then we’ll port it to Python, then we’ll get it going graphically.

Get the MapleSim code to run in C++ with classes and input

Step 1: Rename your file from ‘whatever.c’ to ‘whatever.cpp’, in the code here I’m renaming ‘c2LinkArm.c’ to ’2LinkArm.cpp’. I know, I know, it’s painful but hopefully that’s the worst of it.

Step 2: Add #include "mplshlib.h" to the beginning of the file. The code is meant to be run in a MapleSim environment (or something), so it requires one of the Maple library files. We’ll just add this in here and everything should be happy.

Step 3: For whatever reason, there is a function in here that sets all of the system input to 0. Why, you ask? Good question. The function in question is static void inpfn(double T, double *U), and you can see that all it does is set every element of U = 0. Now, you can either comment out the body of this function, or, several lines down, the first line of the SolverUpdate function (after a long i variable declaration) is the call to this function, so you can comment that out and then everything’s fine. Hurray! We can give input to the system now!

And that’s all the inline code editing we have to do. We’re not done with this file though, we still need to append some things so that we can use it easily.

Step 4: The reason that I want to use C++ is because with C++ we can make a class, which we’ll call the Sim class, that can store all of our simulation details, and hold some wrapper functions to hide the full blown interface to the simulator functions. Let’s make that class now. Go down to the bottom of the file and paste in the following code:

/*  A class to contain all the information that needs to
be passed around between these functions, and can
encapsulate it and hide it from the Python interface.

Written by Travis DeWolf (May, 2013)
*/
class Sim {
/* Very simple class, just stores the variables we
need for simulation, and has 2 functions. Reset
resets the state of the simulation, and step steps it
forward. Tautology ftw!*/

double* params;
double dt, t0;
double u0[NINP], other_out[NOUT+1], y[NOUT];
double w[1 + 2 * NEQ + NPAR + NDFA + NEVT];

SolverStruct S;

public:
Sim(double dt_val, double* params_pointer);
void reset(double* out, double* ic);
void step(double* out, double* u);
};

Sim::Sim(double dt_val, double* params_pointer)
{
t0 = 0.0; // set up start time
dt = dt_val; // set time step
for (int i = 0; i < NINP; i++)
u0[i] = 0.0; // initial control signal

params = params_pointer; // set up parameters reference

/* Setup */
S.w = w;
S.err = 0;
}

void Sim::reset(double* out, double* ic)
{
SolverSetup(t0, ic, u0, params, y, dt, &S);

/* Output */
out[0] = t0;
for(int j = 0; j < NOUT; j++)
out[j + 1] = y[j];
}

void Sim::step(double* out, double* u)
/* u: control signal */
{
for (int k = 0; k < NOUT; k++)
out[k] = *dsn_undef; // clear values to nan

/* Integration loop */
/* Take a step with states */
EulerStep(u, &S);

if (S.err <= 0)
{
/* Output */
SolverOutputs(y, &S);

out[0] = S.w[0];
for(long j = 0; j < NOUT; j++)
out[j + 1] = y[j];
}
}

int main (void)
{
FILE *fd;

double *ic, *p, *out;
char* errbuf;
long i, j, outd;
long internal = 0;

double dt = 0.00001;

int time_steps = 1000000;
double u[NINP];
for (int k = 0; k < NINP; k++) u[k] = .1;

fd = fopen("output.dat", "w");

Sim sim = Sim(dt, NULL);
sim.reset(out, NULL); // ic = NULL, use default start state

for(i=0;i<time_steps;i++)
{
sim.step(out, u);
fprintf(fd,"%lf ",out[0]);
for(j=0;j<NOUT;j++)
{
fprintf(fd,"%lf ",out[j+1]);
}
fprintf(fd, "\n");
}

fclose(fd);

return 0;
}


So, this is based off of the ParamDriverC() function from the MapleSim generated code. I just separated out the initialization and the simulation into two functions (reset() and step(), respectively), so that it’s possible to explicitly control the simulation stepping through time, and change the input control signal at every time step.

NOTE: The dt here must be <= 1e-5, or the constraint solver in the code throws an error. Don’t fear though, this simulator is still crazy fast.

OK. This code then can be compiled (once you have the library files in the same folder) and run with:

g++ c2LinkArm.c -out sim
./sim


You may have noticed a bunch of warning messages scrolling by, I’m not about to touch those and they don’t affect the simulation, so as long as one saying ‘error’ doesn’t pop up let’s just leave well enough alone.

Once you’ve run the executable you should now have a file called ‘output.dat’ filled with simulation data. The first column is time, and the rest are the output variables from your MapleSim model. Alright, we’ve got our code up and running! Let’s go an and make a Cython wrapper for it.

Making a Cython wrapper for the MapleSim code
The first thing that we need to do is take the main function we just added in out of the simulation code above. We’re done with running the C code on its own so we don’t need it anymore. Get rid of it already.

Now, this is going to be very similar to the code from the previous Cython posts, but in this one there’s no getting around it: We need to pass back and forth arrays. It turns out this isn’t very complicated, but there’s a template you have to follow and it can be difficult if you don’t know what that is.

I’ll post the code and then we can go through the tricky parts. As always, we create a ‘pyx’ file for our Cython wrapper code, I called this one ‘py2LinkArm.pyx’.

import numpy as np
cimport numpy as np

cdef extern from "c2LinkArm.cpp":
cdef cppclass Sim:
Sim(double dt, double* params)
void reset(double* out, double* ic)
void step(double* out, double* u)

cdef class pySim:
cdef Sim* thisptr
def __cinit__(self, double dt = .00001,
np.ndarray[double, mode="c"] params=None):
"""
param float dt: simulation timestep, must be < 1e-5
param array params: MapleSim model internal parameters
"""
if params: self.thisptr = new Sim(dt, &params[0])
else: self.thisptr = new Sim(dt, NULL)

def __dealloc__(self):
del self.thisptr

def reset(self, np.ndarray[double, mode="c"] out,
np.ndarray[double, mode="c"] ic=None):
"""
Reset the state of the simulation.
param np.ndarray out: where to store the system output
NOTE: output is of form [time, output]
param np.ndarray ic: the initial conditions of the system
"""
if ic: self.thisptr.reset(&out[0], &ic[0])
else: self.thisptr.reset(&out[0], NULL)

def step(self, np.ndarray[double, mode="c"] out,
np.ndarray[double, mode="c"] u):
"""
Step the simulation forward one timestep.
param np.ndarray out: where to store the system output
NOTE: output is of form [time, output]
param np.ndarray u: the control signal
"""
self.thisptr.step(&out[0], &u[0])


This is pretty yamiliar: We make a reference to a class defined in C++, and then we wrap it in a python class and define functions that we can actually call from Python.

The main difference here is that we have to handle passing in numpy arrays, and having them be in the appropriate double* form that our C++ code is expecting to see. There are three parts involved in this. First, note at the top that we have an import numpy as np followed by a cimport numpy as np, this gives us access to the functions that we’ll need. Second, for each of the arrays accepted as parameter we have a special type declaration: np.ndarray[double, mode="c"] var. And finally, we pass the arrays on with a dereferencing &var[0]. If we were passing along 2D arrays we would use &var[0][0].

In the __init__() and reset() methods, I’m allowing the model parameters, params, and initial conditions, ic, arrays to be undefined, in which case NULL is passed to our C code and the default values we defined inside MapleSim are used. The reason for this is that these require some pretty intimate knowledge of the MapleSim model, and if a casual user wants to just use this business they shouldn’t have to know anything about the internal states.

With our wrapper code done now the only thing left is our setup file! This one is just straight from the previous posts, with the exception that only the ‘pyx’ file is included in the sources declaration because our ‘py2LinkArm.pyx’ file references the ‘c2LinkArm.cpp’ file directly, instead of referencing a header file.
setup.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
cmdclass = {'build_ext': build_ext},
language="c++"),],
)


With our setup file we should now be good to go. Let’s compile! To do this I pull open IPython and punch in

run setup.py build_ext -i


And it will complain and throw some warnings, but it should compile just fine. And now we have access to the pySim class!
To see it do some things, type in

import numpy as np

u = np.ones(2) # input array
out = np.zeros(3) # output array

sim.reset(out)
print out # initial values

for i in range(10):
sim.step(out, u)
print out


And we can see it moving forward! If you set u = np.ones(2) * .1, which is what it was when we generated the ‘output.dat’ file using the C code only and run it forward you’ll see that the we get the same results for both. Excellent. OK now let’s display it!

Running and plotting the MapleSim arm model
In a previous post I used Pyglet to plot out a 3 link arm (a very poor, lines only plot, but sufficient), and I’m going to use it again here (for a very poor, lines only plot). The file is called ‘run.py’, and is less than 80 lines. I’ll post it and then walk through it below:
run.py

import numpy as np
import pyglet
import time

def run():
"""Runs and plots a 2 link arm simulation generated by MapleSim."""

# set up input control signal to simulation
u = np.ones(2) * .1
# set up array to receive output from simulation
out = np.zeros(3)

# create MapleSim sim class
# set up initial conditions of the system
sim.reset(out)

###### Plotting things

# segment lengths = (.31, .27)m (defined in MapleSim)
L = np.array([31, 27]) * 3

# make our window for drawin'
window = pyglet.window.Window()

# set up some labels to display time
label_time = pyglet.text.Label('time', font_name='Times New Roman',
font_size=36, x=window.width//2, y=window.height - 50,
anchor_x='center', anchor_y='center')
# and joint angles
label_values = pyglet.text.Label('values', font_name='Times New Roman',
font_size=36, x=window.width//2, y=window.height - 100,
anchor_x='center', anchor_y='center')

def update_arm(dt):
""" Simulate the arm ahead a chunk of time, then find the
(x,y) coordinates of each joint, and update labels."""

# simulate arm forward 15ms
for i in range(1500):
sim.step(out, u)

# find new joint (x,y) coordinates, offset to center of screen-ish
x = np.array([ 0,
L[0]*np.cos(out[1]),
L[0]*np.cos(out[1]) + L[1]*np.cos(out[1]+out[2])]) + window.width/2
y = np.array([ 0,
L[0]*np.sin(out[1]),
L[0]*np.sin(out[1]) + L[1]*np.sin(out[1]+out[2])]) + 100

# update line endpoint positions for drawing
window.jps = np.array([x, y]).astype('int')
label_time.text = 'time: %.3f'%out[0]
label_values.text = '(01, 02) = (%.2f, %.2f)'%(out[1], out[2])

update_arm(0) # call once to set up

@window.event
def on_draw():
window.clear()
label_time.draw()
label_values.draw()
for i in range(2):
pyglet.graphics.draw(2, pyglet.gl.GL_LINES, ('v2i',
(window.jps[0][i], window.jps[1][i],
window.jps[0][i+1], window.jps[1][i+1])))

# set up update_arm function to be called super often
pyglet.clock.schedule_interval(update_arm, 0.00001)
pyglet.app.run()

run()


Alright. In the first part, we’re just doing the same as above when we imported our brand new module and set it up for simulation.
What’s different is that we’ve now pushed the actual simulation into a method called update_arm. The idea is that the drawing is significantly slower than the simulation, so we’ll simulate a bunch, then update the plots once, then repeat, and hopefully get something that simulates at a decent speed. Once we have the joint angles of the arm stored safely in our out variable, we calculate the (x,y) locations of the elbow and wrist and update the system. Then the last thing this method does is update the text of the labels in the window.

In the on_draw method we’ve overwritten, we clear the window, then draw our labels and the lines corresponding to upper and lower arm segments.

Then the last thing is to schedule our update_arm method to be called periodically, in this case in extremely often.

When you run this file, you should see something that looks like:

You’ll have to trust me it looks better when it’s moving, and when it’s moving it will look better when the input isn’t constant.

But there you go! Now you are able to take your very own MapleSim model’s optimized C code, append some C++ code to make a Sim class, wrap it in Cython, and access it as a Python module! Why would anyone ever want to go through this trouble? Well, finding good simulations with appropriate dynamics can be a huge pain in the ass, and sometimes you just want something stand-alone. Also, now that this is done, this exact same code should be good for any MapleSim model, so hopefully the process is relatively straightforward. At least that’s my dream.

The code that I’ve used in this post can be found at my github: 2LinkArm.

## Cython Journey Part 3: Wrapping C code and passing arrays

Basic C interfacing example

OK, so this code is taken straight from an example here, I’m just going to explicitly work through all the steps so that when I try to do this again later I don’t have to look everything up again.

Let’s say we have some c code that we want to work with from Python, in a file called ‘square.c’:

void square(double* array, int n) {
int i;
for (i=0; i<n; ++i) {
array[i] = array[i] * array[i];
}
}


The first thing we’re going to need to do is compile this to a shared object file, so we punch the following into our terminal:

gcc -c square.c # compile the code to an object file
gcc square.o -shared -o square.so # compile to shared object file


And so now we have our square.so file. Super.

Here is the code that we can then use to interact with this shared object file from Python:

import numpy as np
import ctypes

n = 5
a = np.arange(n, dtype=np.double)

aptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
square.square(aptr, n)

print a


OK, so there are a couple of lines with a lot of jargon but aside from that this looks pretty straightforward. The first noticeable thing is that we’re importing the ctypes library. So, make sure that you have that installed before you try to run this code. Right after importing ctypes then we create a handle into our shared object file, using the ctypes.cdll.LoadLibrary() command.

Then it’s some standard python code to create an array, and then there’s this funky command that lets us get a handle to the array that we just created as a c pointer that we can pass in to our square() function. You can see a list of all the different kinds of data types here.

Once we have that handle to our array, we simply pass it through to the square.square() function, and let it go and do whatever it’s going to do with it. Then we print out our array a, and to our great joy, all of the values have been squared.

So that’s that. Instead of returning anything like double* from the C library I would recommend just passing in the array as a parameter and modifying it instead. This is all nice and straightforward enough, and it seems like the returning business can lead to a lot of memory allocation and ownership headaches.

The code for this example can be found at my github, here: Square.

Tagged , , , ,

## Inverse kinematics of 3-link arm with constrained minimization in Python

Inverse kinematics is a common topic in robotics control; one that most anyone working with a robotic arm needs to address at some point. The use case I’ll be looking at here is the situation where we want to specify an $(x,y)$ coordinate for the end-effector, and solve for the appropriate joint angle configuration that best achieves this. It’s surprisingly straightforward to handle with the right approach, so it’s a nice little exercise in Python optimization (and we’ll even work in some graphical display!) And per usual, the code is all up on my github.

Most of the time the servos that anyone is working with are position controlled, meaning that you provide them with a set of desired angles for them to move to, rather than force controlled, where you provide them with a torque signal that directly moves them. In this case, inverse kinematics is all you need. You simply work out the set of joint angles you want the servos to move to and send those out directly. In biological systems this clearly isn’t the end of the work, because even if you know where you want to go you still have to provide a force (muscle activation) signal that will move your system there. But! as mentioned in the motor babbling paper review post I wrote a while back, having the target specified in the most basic space (i.e. joint angles instead of $(x,y)$ coordinates for a servo based arm, or muscle lengths instead of joint angles for a muscle based arm) can make calculating, or learning, the appropriate force signal much easier; so something that can do the inverse kinematics calculation is worth having.

Inverse kinematics

Now then, how do we go about finding the joint angle configuration that best achieves a desired $(x,y)$ coordinate? Constrained minimization! Our constraint is that the tip of the arm must be at the specified $(x,y)$ position, and we need to specify some cost function for the system to minimize to choose among all the possible configurations that accomplish this. The cost function that we’re going to use here is a very intuitive one, that I first encountered in the paper ‘Learning operational space control’ by Jan Peters. What we’re going to do is specify a default, or resting, joint state configuration, and minimize our distance from that. Which makes a lot of sense.

I’m going to post the code for calculating the inverse kinematics of a 3-link arm, and then we’ll work through it.

import math
import numpy as np
import scipy.optimize

def __init__(self, q=None, q0=None, L=None):
"""Set up the basic parameters of the arm.
All lists are in order [shoulder, elbow, wrist].

:param list q: the initial joint angles of the arm
:param list q0: the default (resting state) joint configuration
:param list L: the arm segment lengths
"""
# initial joint angles
if q is None: q = [.3, .3, 0]
self.q = q
# some default arm positions
if q0 is None: q0 = np.array([math.pi/4, math.pi/4, math.pi/4])
self.q0 = q0
# arm segment lengths
if L is None: L = np.array([1, 1, 1])
self.L = L

self.max_angles = [math.pi, math.pi, math.pi/4]
self.min_angles = [0, 0, -math.pi/4]

def get_xy(self, q=None):
if q is None: q = self.q

x = self.L[0]*np.cos(q[0]) + \
self.L[1]*np.cos(q[0]+q[1]) + \
self.L[2]*np.cos(np.sum(q))

y = self.L[0]*np.sin(q[0]) + \
self.L[1]*np.sin(q[0]+q[1]) + \
self.L[2]*np.sin(np.sum(q))

return [x, y]

def inv_kin(self, xy):

def distance_to_default(q, *args):
# weights found with trial and error, get some wrist bend, but not much
weight = [1, 1, 1.3]
return np.sqrt(np.sum([(qi - q0i)**2 * wi
for qi,q0i,wi in zip(q, self.q0, weight)]))

def x_constraint(q, xy):
x = ( self.L[0]*np.cos(q[0]) + self.L[1]*np.cos(q[0]+q[1]) +
self.L[2]*np.cos(np.sum(q)) ) - xy[0]
return x

def y_constraint(q, xy):
y = ( self.L[0]*np.sin(q[0]) + self.L[1]*np.sin(q[0]+q[1]) +
self.L[2]*np.sin(np.sum(q)) ) - xy[1]
return y

return scipy.optimize.fmin_slsqp( func=distance_to_default,
x0=self.q, eqcons=[x_constraint, y_constraint],
args=[xy], iprint=0) # iprint=0 suppresses output



I’ve taken out most of the comments for compactness, but there are plenty in the code on github, don’t you worry. Alright, let’s take a look!

First, we go through and set up the parameters of our arm. If nothing was passed in during construction, then we’re going to make an arm that is initially at it’s resting position, with shoulder and elbow joint angles = $\frac{\pi}{4}$, and wrist angle = 0. We also set up the arm segment lengths to all be 1, and the minimum and maximum joint angles.

Next, there’s the get_xy function, which just uses some simple trig to calculate the current $(x,y)$ position of the end-effector. This is going to be a handy thing to have when we want to see how well the joint configurations we calculate achieve our desired hand position.

And then after this, we have the actual inverse kinematics function. inv_kin takes in a desired $(x,y)$ position, and returns to us a list of the joint angles that will best achieve this. We define three methods inside inv_kin that define our constrained optimization problem, distance_to_default, which is the function we are minimizing, and x_constraint and y_constraint, which are the constraints that must be met for a solution to be valid. These are all pretty straight forward functions, the distance_to_default function calculates the Euclidean distance in joint space to the resting state, and the constraint functions calculate the difference between the actual and desired end-effector positions for a given set of joint angles.

There is one point of interest in the distance_to_default method, and that is the use of the weights therein. What is the point of applying a gain to these values, you ask? Why, simply that it lets us weight the relative importance of each of these joints remaining close to their default configuration! If you think about moving your arm on a horizontal plane, the wrist actually bends very little. To mimic this we can set the weight applied to the distance of the wrist from its resting state higher than those of the other joints, so that it becomes a higher priority to keep the wrist near its resting state. I suggest playing around with once the graphical display (below) is built up, it’s fun to see the effects different weightings can give on joint configurations.

With these three functions defined, we can now call up the scipy.optimize.fmin_slsqp function, which performs the sequential least squares method to arrive at a solution.

Really, that’s all there is to it!

To test this now, we have the following method:

def test():
############Test it!##################

# set of desired (x,y) hand positions
x = np.arange(-.75, .75, .05)
y = np.arange(0, .75, .05)

# threshold for printing out information, to find trouble spots
thresh = .025

count = 0
total_error = 0
# test it across the range of specified x and y values
for xi in range(len(x)):
for yi in range(len(y)):
# test the inv_kin function on a range of different targets
xy = [x[xi], y[yi]]
# run the inv_kin function, get the optimal joint angles
q = arm.inv_kin(xy=xy)
# find the (x,y) position of the hand given these angles
actual_xy = arm.get_xy(q)
# calculate the root squared error
error = np.sqrt((np.array(xy) - np.array(actual_xy))**2)
# total the error
total_error += error

# if the error was high, print out more information
if np.sum(error) > thresh:
print '-------------------------'
print 'Initial joint angles', arm.q
print 'Final joint angles: ', q
print 'Desired hand position: ', xy
print 'Actual hand position: ', actual_xy
print 'Error: ', error
print '-------------------------'

count += 1

print '\n---------Results---------'
print 'Total number of trials: ', count
print 'Total error: ', total_error
print '-------------------------'


Which is simply working through a set of target $(x,y)$ points, and calculating the total error across all of them. If any particular point gives an error above the threshold level, print out the information for that point so we can take a closer look if so desired. Fortunately, the threshold error isn’t even approached in this test, and for output we get:

---------Results---------
Total number of trials:  450
Total error:  [  3.33831421e-05   2.89667496e-05]
-------------------------


Visualization

So that’s all great and dandy, but it’s always nice to be able to really visualize these things, so I wrote another method that uses Pyglet to help visualize. This is a really easy to use graphics program that I tested out as a means here, and I was very happy with it. I ended up writing a method that pops up a window with a 3-link arm drawn on it, and the arm uses the above inverse kinematics function in the Arm class written above to calculate the appropriate joint angles for the arm to be at to follow the mouse. Once the joint angles are calculated, then their $(x,y)$ locations are also calculated, and the arm is drawn. Pleasingly, the inv_kin above is fast enough to work in real time, so it’s a kind of fun toy example. Again, I’ll show the code and then we’ll work through it below.

import numpy as np
import pyglet
import time

import Arm

def plot():
"""A function for plotting an arm, and having it calculate the
inverse kinematics such that given the mouse (x, y) position it
finds the appropriate joint angles to reach that point."""

# create an instance of the arm
arm = Arm.Arm3Link(L = np.array([300,200,100]))

# make our window for drawin'
window = pyglet.window.Window()

label = pyglet.text.Label('Mouse (x,y)', font_name='Times New Roman',
font_size=36, x=window.width//2, y=window.height//2,
anchor_x='center', anchor_y='center')

def get_joint_positions():
"""This method finds the (x,y) coordinates of each joint"""

x = np.array([ 0,
arm.L[0]*np.cos(arm.q[0]),
arm.L[0]*np.cos(arm.q[0]) + arm.L[1]*np.cos(arm.q[0]+arm.q[1]),
arm.L[0]*np.cos(arm.q[0]) + arm.L[1]*np.cos(arm.q[0]+arm.q[1]) +
arm.L[2]*np.cos(np.sum(arm.q)) ]) + window.width/2

y = np.array([ 0,
arm.L[0]*np.sin(arm.q[0]),
arm.L[0]*np.sin(arm.q[0]) + arm.L[1]*np.sin(arm.q[0]+arm.q[1]),
arm.L[0]*np.sin(arm.q[0]) + arm.L[1]*np.sin(arm.q[0]+arm.q[1]) +
arm.L[2]*np.sin(np.sum(arm.q)) ])

return np.array([x, y]).astype('int')

window.jps = get_joint_positions()

@window.event
def on_draw():
window.clear()
label.draw()
for i in range(3):
pyglet.graphics.draw(2, pyglet.gl.GL_LINES, ('v2i',
(window.jps[0][i], window.jps[1][i],
window.jps[0][i+1], window.jps[1][i+1])))

@window.event
def on_mouse_motion(x, y, dx, dy):
# call the inverse kinematics function of the arm
# to find the joint angles optimal for pointing at
# this position of the mouse
label.text = '(x,y) = (%.3f, %.3f)'%(x,y)
arm.q = arm.inv_kin([x - window.width/2, y]) # get new arm angles
window.jps = get_joint_positions() # get new joint (x,y) positions

pyglet.app.run()

plot()


There are really only a few things that we’re doing in this method: We create an arm with somewhat normal segment lengths and a window for plotting; we have a function for calculating the $(x,y)$ coordinates of each of the joints for plotting; we have a function that tracks the position of the mouse and updates our arm joint angles by calling the arm.inv_kin function we wrote above; and lastly we plot it! Easy. The functions aren’t quite in that order, and that’s because I chose alphabetical ordering over conceptual. Here’s a picture!

I think the only kind of weird thing going on in here is that my variables that I use in both this method and in the overwritten window methods are defined as belonging to window. Aside from that, it’s straightforward trig to calculate the joint angle Cartesian coordinates, and then there’s also a label that displays the mouse $(x,y)$ position for the hell of it. Honestly, the label was in a Pyglet tutorial example and I just left it in.

It’s not going to win any awards for graphics by any means, but it’s simple and it works! And, assuming you have Pyglet installed (which is very easy to do), you should straight up be able to run it and be amazed. Or a little impressed. Or not, whatever. But in any case now you have your very own inverse kinematics example on a 3-link arm! Moving to any other number of joints is exactly the same, you just have to add the corresponding number of default joint configurations and everything should work out as is.

From here, the plan is to look at bringing in an actual force driven arm model to the mix, and then building up a system that can track the mouse, but with a control signal that specifies forces to apply to the arm, rather than positions. Which I think will be pretty neat. And in case you missed it at the top, here’s a link to all the code on my github.

## Distributed control of uncertain systems using superpositions of linear operators – Likelihood calculus paper series review part 3

The third (and final, at the moment) paper in the likelihood calculus series from Dr. Terrence Sanger is Distributed control of uncertain systems using superpositions of linear operators. Carrying the torch for the series right along, here Dr. Sanger continues investigating the development of an effective, general method of controlling systems operating under uncertainty. This is the paper that delivers on all the promises of building a controller out of a system described by the stochastic differential operators we’ve been learning about in the previous papers. In addition to describing the theory, there are examples of system simulation with code provided! Which is a wonderful, and sadly uncommon, thing in academic papers, so I’m excited. We’ll go through a comparison of Bayes’ rule and Markov processes (described by our stochastic differential equations), go quickly over the stochastic differential operator description, and then dive into the control of systems. The examples and code run-through I’m going to have to save for another post, though, just to keep the size of this post reasonable.

The form our system model equation will take is the very general

$dx = f(x)dt + \sum g_i(x)u_i dt + \sum h_i(x, u_i)dB_i$,

where $f(x)$ represents the environment dynamics, previously also referred to as the unforced or passive dynamics of the system, $g_i(x)$ describes how the control signal $u_i$ affects the system state $x$, $h_i(x, u_i)$ describes the state and control dependent additive noise, and $dB_i$ is a set of independent noise processes such as Brownian motion. Our control goal is to find a set of functions of time, $u_i(t)$, such that the dynamics of our system behave according to a desired set of dynamics that we have specified.

In prevalent methods in control theory, uncertainty is a difficult problem, and often can only be effectively handled with a number of simplifications, such as linear systems of Gaussian added noise. In biological systems that we want to model, however, uncertainty is ubiquitous. There is noise on outgoing and incoming signals, there are unobserved controllers simultaneously exerting influence over the body, complicated and often unmodeled highly non-linear dynamics of the system and its interactions with the environment, etc. In the brain especially, the effect of unobserved controllers is a particular problem. Multiple areas of the brain will simultaneously be sending out control signals to the body, and the results of these signals tends to be only partially known or known only after a delay to the other areas. So for modeling, we need an effective means of controlling distributed systems operating under uncertainty. And that’s exactly what this paper presents: ‘a mathematical framework that allows modeling of the effect of actions deep within a system on the stochastic behaviour of the global system.’ Importantly, the stochastic differential operators that Dr. Sanger uses to do this are linear operators, which opens up a whole world of linear methods to us.

Bayes’ rule and Markov processes

Bayesian estimation is often used in sensory processing to model the effects of state uncertainty, combining prior knowledge of state with a measurement update. Because we’re dealing with modeling various types of system uncertainty, it seems like a good idea to consider Bayes’ rule. Here, Dr. Sanger shows that Bayes’ rule is in fact insufficient in this application, and Markov processes must be used. There are a number of neat insights that come from this comparison, so it’s worth going through.

Let’s start by writing Bayes’ rule and Markov processes using matrix equations. Bayes’ rule is the familiar equation

$p(x|y) = \frac{p(y|x)}{p(y)}p(x)$,

where $p(x)$ represents our probability density or distribution, so $p(x) \geq 0$ and $\sum_i p(x = i) = 1$. This equation maps a prior density $p(x)$ to the posterior density $p(x|y)$. Essentially, this tells us the effect a given measurement of $y$ has on the probability of being in state $x$. To write this in matrix notation, we assume that $x$ takes on a finite number of states, so $p(x)$ is a vector, which gives

$p_x' = Ap_x$,

where $p_x$ and $p_x'$ are the prior and posterior distributions, respectively, and $A$ in a diagonal matrix with elements $A_{ii} = \frac{p(y|x)}{p(y)}$.

Now, the matrix equation for a discrete-time, finite-state Markov process is written

$p_x(t+1) = Mp_x(t)$.

So where in Bayes’ rule the matrix (our linear operator) $A$ maps the prior distribution into the posterior distribution, in Markov processes the linear operator $M$ maps the probability density of the state at time $t$ into the density at time $t+1$. The differences come in through the form of the linear operators. The major difference being that $A$ is a diagonal matrix, while there is no such restriction for $M$. The implication of this is that in Bayes’ rule the update of a given state $x$ depends only on the prior likelihood of being in that state, whereas in Markov processes the likelihood of a given state at time $t+1$ can depend on the probability of being in other states at time $t$. The off diagonal elements of $M$ allow us to represent the probability of state transition, which is critical for capturing the behavior of dynamical systems. This is the reason why Bayes’ method is insufficient for our current application.

Stochastic differential operators

This derivation of stochastic differential operators is by now relatively familiar grounds, so I’ll be quick here. Starting with a stochastic differential equation

$dx = f(x)dt + g(x)dB$,

where $dB$ is our noise source, the differential of unit variance Brownian motion. The noise term introduces randomness into the state variable $x$, so we describe $x$ with a probability density $p(x)$ that evolves through time. This change of the probability density through time is captured by the Fokker-Planck partial differential equation

$\frac{\partial}{\partial t}p(x,t) = - \frac{\partial}{\partial x}(f(x)p(x,t)) + \frac{1}{2} \frac{\partial^2}{\partial x^2}(g(x)g^T(x)p(x,t))$,

which can be rewritten as

$\dot{p} = \mathcal{L}p$,

where $\mathcal{L}$ is the linear operator

$\mathcal{L} = - \frac{\partial}{\partial x}f(x) + \frac{1}{2} \frac{\partial^2}{\partial x^2}g(x)g^T(x)$.

$\mathcal{L}$ is referred to as our stochastic differential operator. Because the Fokker-Planck equation is proven to preserve probability densities (non-negativity and sum to 1), applying $\mathcal{L}$ to update our density will maintain its validity.

What is so exciting about these operators is that even though they describe nonlinear systems, they themselves are linear operators. What this means is that if we have two independent components of a system that affect it’s dynamics, described by $\mathcal{L}_1$ and $\mathcal{L}_2$, we can determine their combined effects on the overall system dynamics through a simple summation, i.e. $\dot{p} = (\mathcal{L}_1 + \mathcal{L}_2)p$.

Controlling with stochastic differential operators

Last time, we saw that control can be introduced by attaching a weighting term to the superposition of controller dynamics, giving

$\dot{p} = \sum_i u_i \mathcal{L}_i p$,

where $\mathcal{L}_i$ is the stochastic differential operator of controller $i$, and $u_i$ is the input control signal to that controller. In the context of a neural system, this equation describes a set of subsystems whose weighted dynamics give rise to the overall behavior of the system. By introducing our control signals $u_i$, we’ve made the dynamics of the overall system flexible. As mentioned in the previous review post, our control goal is to drive the system to behave according to a desired set of dynamics. Formally, we want to specify $u_i$ such that the actual system dynamics, $\hat{\mathcal{L}}$, match some desired set of dynamics, $\mathcal{L}^*$. In equation form, we want $u_i$ such that

$\mathcal{L}^* \approx \hat{\mathcal{L}} = \sum_i u_i \mathcal{L}_i$.

It’s worth noting also here that the resulting $\hat{\mathcal{L}}$ still preserves the validity of densities that it is applied to.

How well can the system approximate a set of desired dynamics?

In this next section of the paper, Dr. Sanger talks about reworking the stochastic operators of a system into an orthogonal set, which can then be used to easily approximate a desired set of dynamics. It’s my guess that the motivation behind doing this is to see how close the given system is able to come to reproducing the desired dynamics. This is my guess because this exercise doesn’t really generate control information that can be used to directly control the system, unless we translate the weights calculated by doing this back into term of the actual set of actions that we have available. But it can help you to understand what your system is capable of.

To do this, we’ll use Gram-Schmidt orthogonalization, which I describe in a recent post. To actually follow this orthogonalization process we’ll need to define an inner product and normalization operator appropriate for our task. A suitable inner product will be one that lets us compare the similarity between two of our operators, $L_1$ and $L_2$, in terms of their effects on an initial state probability density, $p_0$. So define

$\langle L_1, L_2 \rangle_{p_0} = \langle L_1 p_0, L_2 p_0 \rangle = \langle \dot{p}_{L_1} , \dot{p}_{L_2} \rangle$

for the discrete-space case, and similarly

$\langle \mathcal{L}_1, \mathcal{L}_2 \rangle_{p_0} = \int (\mathcal{L}_1 p_0)(\mathcal{L}_2 p_0)dx = \int \dot{p}_{L_1} \dot{p}_{L_2} dx$

in the continuous-state space. So this inner product calculates the change in probability density resulting from applying these two operators to this initial condition, and finds the amount which they move the system in the same direction as the measure of similarity.
The induced norm that we’ll use is the 2-norm,

$||L||_{p_0} = \frac{||L p_0||_2}{||p_0||_2}.$

With the above inner product and normalization operators, we can now take our initial state, $p_0$, and create a new orthogonal set of stochastic differential operators that span the same space as original set through the Gram-Schmidt orthogonalization method. Let’s denote the orthonormal basis set vectors as $\Lambda_i$. Now, to approximate a desired operator $L^*$, generate a set of weights, $\alpha$, over our orthonormal basis set using a standard inner product: $\alpha_i = \langle L^*, \Lambda_i \rangle$. Once we have the $\alpha_i$, the desired operator can be recreated (as best as possible given this basis set),

$L^* \approx \hat{L} = \sum \alpha_i \Lambda_i.$

This could then be used as a comparison measure as the best approximation to a desired set of dynamics that a given system can achieve with its set of operators.

Calculating a control signal using feedback

Up to now, there’s been a lot of dancing around the control signal, including it in equations and discussing the types of control a neural system could plausibly implement. Here, finally, we actually get into how to go about generating this control signal. Let’s start with a basic case where we have the system

$\dot{p} = (\mathcal{L}_1 + u\mathcal{L}_2)p,$

where $\mathcal{L}_1$ describes the unforced/passive dynamics of the system, $\mathcal{L}_2$ describes the control-dependent dynamics, and $u$ is our control signal.

Define a cost function $V(x)$ that we wish to minimize. The expected value of this cost function at a given point in time is

$E[V] = \int V(x)p(x)dx,$

which can be read as the cost of each state weighted by the current likelihood of being in that state.
To reduce the cost over time, we want the derivative of our expected value with respect to time to decrease. Written in an equation, we want

$\frac{d}{dt}E[V] = \int V(x)\dot{p}(x)dx < 0.$

Note that we can sub in for $\dot{p}$ to give

$\frac{d}{dt}E[V] = \int V(x)[\mathcal{L}_1p(x) + u\mathcal{L}_2p(x)]dx.$

Since our control is effected through $u$, at a given point in time where we have a fixed and known $p(x,t)$, we can calculate the effect of our control signal on the change in expected value over time, $\frac{d}{dt}E[V]$, by taking the partial differential with respect to $u$. This gives

$\frac{\partial}{\partial u}\left[\frac{d}{dt}E[V]\right] = \int V(x)\mathcal{L}_2p(x)dx,$

which is intuitively read: The effect that the control signal has on the instantaneous change in expected value over time is equal to the change in probability of each state $x$ weighted by the cost of that state. To reduce $\frac{d}{dt}E[V]$, all we need to know now is the sign of the right-hand side of this equation, which tells us if we should increase or decrease $u$. Neat!

Although we only need to know the sign, it’s nice to include slope information that gives some idea of how far away the minimum might be. At this point, we can simply calculate our control signal in a gradient descent fashion, by setting $u = - k \int V(x)\mathcal{L}_2p(x)dx$. The standard gradient descent interpretation of this is that we’re calculating the effect that $u$ has on our function $\frac{d}{dt}E[V]$, and assuming that for some small range, $k$, our function is approximately linear. So we can follow the negative of the function’s slope at that point to find a new point on that function that evaluates to a smaller value.

This similarly extends to multiple controllers, where if there is a system of a large number of controllers, described by

$\dot{p} = \sum_i u_i \mathcal{L}_i p,$

then we can set

$u_i = - k_i \int V(x)\mathcal{L}_ip(x)dx.$

Dr. Sanger notes that, as mentioned before, neural systems will often not permit negative control signals, so where $u_i < 0$ we set $u_i = 0$. The value of $u_i$ for a given controller is proportional to the ability of that controller to be reduce the expected cost at that point in time. If all of the $u_i = 0$, then it is not possible for the available controllers to reduce the expected cost of the system.

Comparing classical control and stochastic operator control

Now that we finally have a means of generating a control signal with our stochastic differential operators, let’s compare the structure of our stochastic operator control with classical control. Here’s a picture:

The most obvious differences are that a cost function $V(x)$ has replaced the desired trajectory $\dot{x}$ and the state $x$ has been replaced by a probability density over the possible states, $p(x)$. Additionally, the feedback in classical control is used to find the difference between the desired and actual state, which is then multiplied by a gain, to generate a corrective signal, i.e. $u = k * (x^* - x)$, whereas in stochastic operator control signal is calculated as specified above, by following the gradient of the expected value of the cost function, i.e. $u = -k \int V(x)p(x)dx$.

Right away there is a crazy difference between our two control systems already. In classical control case, we’re following a desired trajectory, several things are implied. First, we’ve somehow come up with a desired trajectory. Second, we’re necessarily assuming that regardless of what is actually going on in the system, this is the trajectory that we want to follow. This means that the system is not robust to changes in the dynamics, such as outside forces or perturbations applied during movement. In the stochastic operator control case, we’re not following a desired path, instead the system is looking to minimize the cost function at every point in time. This means that it doesn’t matter where we start from or where we’re going, the stochastic operator controller looks at the effect that each controller will have on the expected value of the cost function and generates a control signal accordingly. If the system is thrown off course to the target, it will recover by itself, making it far more robust than classical control theory. Additionally, we can easily change the cost function input to the system and see a change in the behaviour of the system, whereas in classical control a change in the cost function requires that we regenerate our desired trajectory $\dot{x}$ before our controller will act appropriately.

While these are impressive points, it should also be pointed out that stochastic operator controllers are not the first to attack these issues. The robustness and behaviour business is similarly handled very well, for specific system (either linear or affine, meaning linear in terms of the dynamics of the control signal applied) and cost function forms (usually quadratic), by optimal feedback controllers. Optimal feedback controllers regenerate the desired trajectory online, based on system feedback. This leads to a far more robust control system that classical control provides. However, as mentioned, this is only for specific system and cost function forms. In the stochastic operator control any type of cost function be applied, and the controller dynamics described by $L_i$ can be linear or nonlinear. This is a very important difference, making stochastic operator control far more powerful.

Additionally, stochastic operator controllers operate under uncertainty, by employing a probability density to generate a control signal. All in all, stochastic operator controllers provide an impressive and novel amount of flexibility in control.

Conclusions

Here, Dr. Sanger has taken stochastic differential operators, looked at their relation to Bayes’ rule, and developed a method for controlling uncertain systems robustly. This is done through the observation that these operators have linear properties that lets the effects of distributed controllers on a stochastic system be described through a simple superposition of terms. By introducing a cost function, the effect of each controller on the expected cost of the system at each point in time can be calculated, which can then be used to generate a control signal that robustly minimizes the cost of the system through time.

Stochastic differential operators can often be inferred from the problem description; their generation is something that I’ll examine more closely in the next post going through examples. Using these operators time-varying expected costs associated with state-dependent cost functions can be calculated. Stochastic operator controllers introduce a significant amount more freedom in choice of cost function than has previously been available in control. Dr. Sanger notes that an important area for future research will be in the development of methods for optimal control of systems described by stochastic differential operators.

The downside of the stochastic operator controllers is that they are very computationally intensive, due to the fact that they must propagate a joint-density function forward in time at each timestep, rather than a single system state. One of the areas Dr. Sanger notes of particular importance for future work is the development of parallel algorithms, both for increasing simulation speed and examining possible neural implementations of such an algorithms.

And finally, stochastic differential operators exact a paradigm shift from classical control on the way control is considered. Instead of driving the system to a certain target state, the goal is to have the system behave according to a desired set of dynamics. The effect of a controller is then the difference between the behavior of the system with and without the activity of the controller.

This paper was particularly exciting because it discussed the calculation of the control signals for systems which we’ve described through the stochastic differential operators that have been developed through the last several papers. I admit confusion regarding the aside about developing an orthogonal equivalent set of operators, it seemed a bit of a red herring in the middle of the paper. I left out the example and code discussion from this post because it’s already very long, but I’m looking forward to working through them. Also worth pointing out is that I’ve been playing fast and loose moving back and forth between continuous and discrete, just in the interest in simplifying for understanding, but Dr. Sanger explicitly handles each case.

I’m excited to explore the potential applications and implementations of this technique in neural systems, especially in models of areas of the brain that perform a ‘look-ahead’ type function. The example that comes to mind is that of the rat reaching an intersection in a T-maze, and the neural activity recorded from place cells in the hippocampus shows the rat simulating the result of going left of going right. This seems a particularly apt application of these stochastic differential operators, as a sequence of actions and the resulting state can then be simulated and evaluated, providing that you have an accurate representation of the system dynamics in your stochastic operators.

To that end, I’m also very interested by possible means of learning stochastic differential operators for an action set. Internal models are an integral parts of motor control system models, and this seems like a potentially plausible analogue. Additionally, for modeling biological systems, the complexity of dynamics is something that is often infeasible to determine analytically. All in all, I think this is a really exciting road for exploring the neural control of movement, and I’m looking forward to seeing where it leads.

Sanger, T. (2011). Distributed Control of Uncertain Systems Using Superpositions of Linear Operators Neural Computation, 23 (8), 1911-1934 DOI: 10.1162/NECO_a_00151

## Gram-Schmidt orthogonalization

The context here is that we have some desired vector $v^*$ that we want to build out of a set of basis vectors $v_i$ through weighted summation. The case where this is easiest is when all of our vectors $v_i$ are orthogonal with respect to each other. Recalling that a dot product of two vectors gives us a measure of their similarity, two vectors are orthogonal if their dot product is 0. A basic example of this is the set $[1,0],[0,1]$, or the rotation of these vectors 45 degrees, $[.7071, .7071],[-.7071, .7071]$.

If we have an orthogonal basis set of vectors, then to generate the weights for each of the basis vectors we simply take the dot product between each $v_i$ and our desired vector $v^*$. For example, with our basis sets from above, the weights to generate the vector $[.45 -.8]$ can be found as

$w_1 = \langle [.45, -.8] , [1, 0] \rangle = .45 \\ w_2 = [.45, -.8] \cdot [0, 1] = -.8,$

where $\langle \rangle$ denotes dot (or inner) product, and leads to

$w_1 = \langle [.45, -.8] , [.7071, .7071] \rangle = -0.2475 \\ w_2 = \langle [.45, -.8] , [-.7071, .7071] \rangle = -0.8839.$

And now we have weights $w_i$ such that for each basis set $\sum_i w_i v_i = v^*$. Written generally, to find the weights we have $w_i = \frac{v_i \cdot v^*}{||v_i||}$. The denominator here is the norm of $v_i$, introduced for generality. In the example set our basis sets were composed of unit vectors (vectors with magnitude = 1), but in general normalization is required.

Now, what if we don’t have an orthogonal basis set? Trouble, that’s what. With a non-orthogonal basis set, such as $[1, .4], [-.1, 1]$, when we try our dot product business to find our coefficients looks what happens

$w_1 = \frac{\langle [.45, -.8] , [1, .4] \rangle}{||[1,.4]||} = .3682 \\ w_2 = \frac{\langle [.45, -.8] , [-.1, 1] \rangle}{||[-.1, 1]||} = -.8408,$

and

$.1207 \cdot [1,.4] + -.8408 \cdot [-.1, 1] = [0.2048, -0.7925]$,

which is not a good reconstruction of our desired vector, $[.45, -.8]$. And the more the cross contribution to the same dimensions between different basis vectors, the worse this becomes. Of course, we could use a least squares method to find our basis set coefficients, but that involves matrix multiplications and inverses, and generally becomes more complex than we want.

So, let’s say we have a basis set of two different, but non-orthogonal vectors, $v_1$ and $v_2$. We instead want two vectors $u_1$ and $u_2$ which describe the same space, but are orthogonal. By describing the same space, I mean that their span is the same. And by span I mean the set of values that can be generated through weighted summation of the two vectors. So we set $u_1 = v_1$, and the task is now to find the appropriate $u_2$. As a conceptual description, we want $u_2$ to be equal to $v_2$, but only covering the area of space that $u_1$ isn’t covering already. To do this, we can calculate at the overlap between $u_1$ and $v_2$, then subtract out that area from $v_2$. The result should then give us the same area of state space covered by $v_1$ and $v_2$, but in a set of orthogonal vectors $u_1$ and $u_2$.

Mathematically, we calculate the overlap between $u_1$ and $v_2$ with a dot product, $\langle u_1, v_2 \rangle$, normalized by the magnitude of $u_1$, $||u_1||$, and then subtract from $v_2$. All together we have

$u_2 = v_2 - \frac{\langle u_1, v_2 \rangle}{||u_1||}$.

Using our non-orthogonal example above,

$u_1 = [1, .4]$

$u_2 = [-.1, 1] - \frac{\langle [-.1, 1] , [1, .4] \rangle}{||[-.1, 1]||} = [-0.3785, 0.7215].$

Due to roundoff during the calculation of $u_2$, the dot product between $u_1$ and $u_2$ isn’t exactly zero, but it’s close enough for our purposes.

OK, great. But what about if we have 3 vectors in our basis set and want 3 orthogonal vectors (assuming we’ve moved to a 3-dimensional space) that span the same space? In this case, how do we calculate $u_3$? Well, carrying on with our intuitive description, you might assume that the calculation would be the same as for $u_2$, but now you must subtract out from $v_3$ that is covered by $u_1$ and $u_2$. And you would be correct:

$u_3 = v_3 - \frac{\langle u_1, v_3 \rangle}{||u_1||} - \frac{\langle u_2, v_3 \rangle}{||u_2||}$.

In general, we have

$u_i = v_i - \sum_j \frac{\langle u_j, v_i \rangle}{||u_j||}$.

And now we know how to take a given set of basis vectors and generate a set of orthogonal vectors that span the same space, which we can then use to easily calculate the weights over our new basis set of $u_i$ vectors to generate the desired vector, $v^*$.

## Neuro-mechanical control using differential stochastic operators – Likelihood calculus paper series review part 2

The second paper put out by Dr. Terence Sanger in the likelihood calculus paper series is Neuro-mechanical control using differential stochastic operators. Building on the probabalistic representation of systems through differential stochastic operators presented in the last paper (Controlling variability, which I review here) Dr. Sanger starts exploring how one could effect control over a system whose dynamics are described in terms of these operators. Here, he specifically looks at driving a population of neurons described by differential stochastic operators to generate the desired system dynamics. Neural control of a system requires that several phenomena outside the realm of classical control theory be addressed, including the effects of variability in control due to stochastic firing, large partially unlabeled cooperative controllers, bandlimited control due to finite neural resources, and variation in the number of available neurons.

The function of a neuron in a control system can be completely described in terms of 1) the probability of it spiking due to the current system state, $p(s=1|x)$, and 2) the effect of its response on the change in the state. Due to the inherent uncertainty in these systems, each individual neuron’s effect on the change in state is captured by a distribution, $p(\dot{x}|s)$. And because the effect of each neuron is only a small part of a large dynamical system that includes the dynamics of the system being controlled and the effects of all the other neurons, these distributions tend to be very broad.

Rephrasing the above description, neurons are mapping a given state $x$ to a change in state $\dot{x}$. Instead of using two conditional densities to describe this mapping, $p(s|x)$ and $p(\dot{x}|s)$, we can rewrite this more compactly as

$p(\dot{x}|x) = p(\dot{x}|s=1)\;p(s=1|x) + p(\dot{x}|s=0)\;p(s=0|x)$,

which can be read as the probability of a change in state $\dot{x}$ given the current state $x$ is equal to the probability of that change in state occurring if the neuron spikes, $p(\dot{x}|s=1)$, multiplied by the probability of that neuron spiking given the current state, $p(s=1|x)$, plus the probability of that state occurring if the neuron doesn’t spike, $p(\dot{x}|s=0)$, multiplied by the probability of that neuron not spiking, $p(s=0|x)$.

Differential stochastic operators

We want to capture the mapping $p(\dot{x}|x)$ in such a way that if we have a description of a bunch of controllers (such as neurons) and the change in system state they effect individually, we can combine them in a straightforward way to get the overall change in state resulting from all the controllers operating in parallel. To do this we can use the linear operators developed in the previous paper, which allows us combine the effects of multiple components through simple summation to determine the overall change in system state. We’ll go over it again here, as I find reading through several different versions of an idea very helpful for solidifying understanding.

Let $\mathcal{L}$ denote a class of linear operators that act on time-varying probability densities, $p(x,t)$, such that $\frac{\partial}{\partial t}p(x,t) = \mathcal{L}p(x,t)$. Because these operators need to preserve the properties of valid probability density (specifically that $\int p(x)dx = 1$ and $p(x) \geq 0$), for a given operator $L$ where $\dot{p}(x) = \int L(x,y) p(y) dy$ we require that:

• 1) $\int L(x,y)dx = 0$ for all $y$,
• 2) $L(x,y) \geq 0$ whenever $x \neq y$,

which respectively enforce the aforementioned constraints.

So, first thing’s first. Let’s read out $\dot{p}(x) = \int L(x,y) p(y) dy$. This says that our change in the probability density, $\dot{p}(x)$, is found by taking our function that tells us what the change in density is for system state $x$ given our current state $y$, which is $L(x,y)$, and weighting that by the probability of currently being in state $y$, which is $p(y)$, then summing that all up, which is the integral.

Now the constraints. The first constraint reads out as the integral of the changes of the probability density at each point $x$ for a given state $y$ must be equal to 0. This means that the area of the probability density over the states after updating them is the same. So, assuming we start out with a valid density whose sum equals 1, we always have a density whose sum equals 1.

The second constraint reads out as the change in probability density for state $x$ given a current state $y$ must be greater than zero whenever $x \neq y$. This means the only time that the change in the probability density can be negative is if there is a probability of being in that state; it enforces that all $p(x) \geq 0$, because $\dot{p}$ can’t be negative when $p(x)$ is zero.

Dr. Sanger defines the linear operators that satisfy these two conditions to be “differential stochastic operators”. The discrete time versions are matrices, dubbed “difference stochastic operators”.

Superposition of differential stochastic operators

Differential stochastic operators can be derived in different ways, here we’ll go through the derivation from the ‘master’ equation defining $p(\dot{x}|x)$, and from a stochastic differential equation. They each have their insights, so it’s helpful to work through both.

Derivation from master equation

The equation for $p(\dot{x}|x)$ written out above,

$p(\dot{x}|x) = p(\dot{x}|s=1)\;p(s=1|x) + p(\dot{x}|s=0)\;p(s=0|x)$,

determines the probability flow on the state. By employing the Kramers-Moyal expansion we can capture this probability flow through a partial differential equation describing change in probability density. In other words, instead of capturing the time evolution of the system state with a probability density over possible changes in state, $\dot{x}$, we capture it through the changing in probability of each state, $\dot{p}(x)$. The Kramers-Moyal expansion looks like:

$\frac{\partial}{\partial t}p(x,t) = -\frac{\partial}{\partial x}(D_1(x)p(x,t)) + \frac{1}{2}\frac{\partial^2}{\partial x^2}(D_1(x)p(x,t)) + ...$,

where $D_k(x) = E[(\dot{x})^k] = \int \dot{x}^k p(\dot{x}|x) d\dot{x}$. Truncating this expansion at the first two terms we get the Fokker-Planck equation, where the first term describes the drift of the density, and the second term the diffusion. These two terms are sufficient for describing Gaussian conditional densities, which capture many physical phenomena. In the case where $p(\dot{x}|x)$ does not have a Gaussian distribution, higher-order terms from the Kramers-Moyal expansion will need to be included.

Now, imagine we have a simple system of two neurons, where the change in state is defined as $\dot{x} = \dot{x}_1 + \dot{x}_2$. If these neurons have conditionally independent variability, i.e. $p(\dot{x}_1 \dot{x}_2 | x) = p(\dot{x}_1|x)p(\dot{x}_2|x)$, then we can sum the Kramers-Moyal expansion of each of these terms to describe the evolution of the overall system state:

$\frac{\partial}{\partial t}p(x,t) = - \sum_i \frac{\partial}{\partial x}(D_{1i}(x)p(x,t)) + \frac{1}{2} \sum_i \frac{\partial^2}{\partial x^2}(D_{2i}(x)p(x,t)) + ...$,

as long as the neurons have conditionally independent variability. This means that they can’t be connected (directly or indirectly) such that a spike in one neuron causes a spike in the other. While this might seem a poor assumption for modeling networks of spiking neurons, in large populations with many input, the effects of any single input neuron tends to be small enough that the assumption holds approximately.

We can rewrite the previous equation now, taking advantage of linearity of $p(x,t)$ and the Kramers-Moyal coefficients, to get

$\dot{p}(x,t) = \mathcal{L}p = \sum_i \mathcal{L}_i p$,

which means that by describing neurons with the differential stochastic operators, $\mathcal{L}_i$, we can determine the cumulative effect on the dynamics of the system through simple summation. Which we all remember from the last paper, but hey, good to review.

Now, in the event that we want to model the effect of a group of strongly interconnected neurons, we can instead consider the effect of the group as a function of the $2^n$ possible firing patterns (spike or no spike from each of the neurons). So where before $p(\dot{x}| x)$ was written in terms of the two cases $s = 0$ and $s = 1$, it would now be written:

$p(\dot{x}|x) = \sum_{i=1}^{2^n} p(\dot{x}|x,i)p(i)$,

where each $i$ is a different spike pattern. This group on neurons and their effect on the system dynamics is then considered as a single unit, and the differential stochastic operator describing them can then be combined with the operator from another group of neurons, provided that the two groups exhibit conditionally independent variability.

If there are no independent groups in the network then it’s fully connected and this is not for you go home.

Derivation from a stochastic differential equation

For this derivation, imagine a set of controllers operating in parallel. Each controller has a stochastic differential equation that defines how a control signal input affects the system dynamics,

$dx = f_i(x)dt + g_i(x)dB_i$,

where $f_i$ and $g_i$ are arbitrary equations and $dB_i$ are random functions of time, or noise terms. Let $f_i(x)dt$ for $i > 0$ be the controller equations, and $f_0(x)dt$ be the unforced, or passive, dynamics of the system, which define how the system behaves without controller input. We can write the equation for the whole system as

$dx = f_0(x)dt + \sum_{i>0}u_i(f_i(x)dt + g_i(x)dB_i)$,

where $u_i$ are constant or slowly varying control inputs. The reason we would choose to denote the unforced (passive) dynamics of the system as $f_0$ is because we can now define $u_0 = 1$, and rewrite the above equation as

$dx = \sum_{i}u_i(f_i(x)dt + g_i(x)dB_i)$.

The corresponding Fokker-Planck equation for the evolution of the state probability density is

$\frac{\partial}{\partial t}p(x,t) = - \sum_i u_i \frac{\partial}{\partial x}(f_i(x)p(x,t)) + \frac{1}{2} \sum_i u_i \frac{\partial^2}{\partial x^2}(g_i(x)p(x,t))$.

Look familiar? We can rewrite this as a superposition of linear operators

$\dot{p}(x,t) = \mathcal{L}p = \sum_i u_i \mathcal{L}_i p$,

and there you go.

Population model

So, now we can apply this superposition of differential stochastic equations to describe the effect of a population of neurons on a given system. Dr. Sanger lists several ways that this model can go about being controlled; 1) modifying the tuning curves of the neurons, which specifies how they respond to stimulus; 2) modify the output functions that determines the effect that a neuron has on the dynamics of the system; and 3) modifying the firing threshold of the neurons.

I found the difference between 1 and 3 can be a little confusing, so let’s look at an example tuning curve in 2D space to try to make this clear. Imagine a neuron sensitive to -dimensional input signals, and that it’s tuning curve looks like this:

If we’re changing the tuning curve, then how this neuron responds to its input stimulus will change. For example, in this diagram we show changing the tuning curve of a neuron:

Here, the neuron no longer responds to the same type of input that it responded to previously. We have made a qualitative change to the type of stimulus this neuron responds to.

If we change the firing threshold, however, then what we’re changing is when the neuron starts responding to stimulus that it is sensitive to.

Here we show the neuron becoming more and more sensitive to a its stimulus, respond stronger sooner and sooner. So the type of signal that the neuron responds to isn’t changing, but rather when the neuron starts responding.

Alright, now that we’ve got that sorted out, let’s move on.
Tuning curves (1) and output functions (2) are both modifiable through learning, by changing the incoming and outgoing connection weights, respectively, but for controlling systems on the fly this is going to be too slow, i.e. slower than the speed at which the system moves. So what’s left is (3), modifying the firing threshold of the neurons. So the model then looks like:

where $p(x)$ is projected in to a population of neurons, each with a stochastic differential operator that sum together to generate $\dot{p}(x)$. In this diagram, $\lambda_i$ is the firing threshold of neuron $i$, and $\lambda_i(x)$ denotes the modulation of the firing rate of neuron $i$ as a function of the current system state. When the modulation is dependent on the system state we have a feedback, or closed-loop, control system. Dr. Sanger notes that in the case that $\lambda_i$ is heavily dependent on $x$, modulating the firing threshold is indistinguishable from modifying the tuning curve, meaning that we can get some pretty powerful control out of this.

Conclusions

‘The theory of differential stochastic operators links the dynamics of individual neurons to the dynamics of a full neuro-mechanical system. The control system is a set of reflex elements whose gain is modulated in order to produce a desired dynamics of the overall system.’

This paper presents a very different formulation of control than classical control theory. Here, the goal is to modulate the dynamics of the system to closely match a desired set of dynamics that achieve some task, rather than to minimize the deviation from some prespecified trajectory through state space. Dr. Sanger notes that this description of control matches well to behavioral descriptions of neural control system, where there are numerous subsystems and circuits that have reflexive responses to external stimuli which must be modulated to achieve desired behavior. The goal of control is to fully define the dynamics of the system’s reaction to the environment.

What comes to mind first for me, in terms of using the modulation of reflex elements to effect a desired set of dynamics, is modeling the spinocerebellum. With a ton of projections directly to the spinal cord, and strong implications in locomotor and balance system, it seems like it would be a very good candidate for being modeled with this type of control. The idea being that the cerebellum is projecting modulatory values to the different spinal circuits (reflex networks and central pattern generators, for example) that specify how to respond to changes in the environment to maintain our balance or the rhythm of our walk. How we go about specifying exactly what those modulatory terms need to be is something that Dr. Sanger tackles in the last paper of this series, which I’ll be reviewing in the next couple of months. I’m looking forward to it.

On another note, in my lab we all work with the Neural Engineering Framework, in which populations of neurons are taken to represent vectors, and to perform transformations on these vectors to relay information an perform various functions. To this end, something that interests me about likelihood calculus is its application to populations of neurons representing vectors. Instead of finding $p(\dot{x}|x)$ by summing the effects of all of the neurons in a population, or defining it in terms of the population spiking patterns, we’re looking at it in terms of the different vectors this population can represent, and the effect of that vector on the system. So we can still have spiking neuron based models, but we can do all the likelihood calculus business one level removed, simplifying the calculation and reducing the number of differential stochastic operators needed.

There are a ton of things going on in this paper, and lots to think about. At several points I deviated from the notation used in this paper because I found it unclear, but aside from that I’ve really enjoyed reading through it and writing it up. Very interesting work.

Sanger TD (2010). Neuro-mechanical control using differential stochastic operators. Conference proceedings : … Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, 2010, 4494-7 PMID: 21095779

## Online Goal Babbling – motor learning paper review

Recently I was linked to an article about learning how to control a highly complex arm from scratch: How infants tell us how to control the Bionic Handling Assistant. The work seemed very interesting so I went and pulled one of the papers linked in the article, Online Goal Babbling for rapid bootstrapping of inverse models in high dimensions.

Diving into that title, online means that we’re using information from every movement as it’s gathered to improve our control, as opposed to ‘batch’ where learning only occurs every so-many trials. Bootstrapping is the process of bringing a system up to a functionally useful level. High dimensions then refers to the complexity of the system being controlled, where every component that requires a control signal is another dimension. Humans, for example, require extremely high dimensional control signals. Inverse models refer to a type of internal model, which ‘describe relations between motor commands and their consequences’. Forward models predict the results of a movement, and inverse models allow suggest a motor command that can be used to achieve a desired consequence, such as ‘the ball falls on the floor’ or ‘my hand touches the red balloon’.

Alright, so that’s the title, let’s dive into the paper proper.

Online goal babbling

The idea behind this research is to let the system learn how to control itself by exploring the environment. The reason why you would want to do this is so that you don’t have to analytically specify how the system should move. Analytic solutions require accurate models of the system dynamics, and calculating these quickly become horrendously complex. To the point that the equations of motion for a relatively simple 3-link arm moving are pages upon pages upon pages. On top of this, because your solution is only is as good as your model and the model you just took all that time to calculate isn’t adaptive, if your system dynamics change at all through wear and tear or an environment change, you’re in big trouble. You can use feedback control to help compensate for the errors introduced, but you can only respond as fast as you can receive and process sensory signals, which is often too long in practical applications.

So this moves us to adaptive feedforward control, based on learning an inverse model of the system dynamics. Importantly, what this paper describes is a kinematic inverse controller, not a kinetic inverse controller; meaning that given a desired target position for the end effector (hand) of an arm it provides a sequence of joint angle sets that will lead to the end effector reaching the target, as opposed to providing a sequence of joint angle torques to drive the system to the target. At some point, a control system will have to generate force commands, rather than just specify what position it wants the system to be in. But, knowing what joint angles and trajectory the joint angles should follow is an important problem, because systems that are of interest to control tend to exhibit a fair amount of configuration redundancy between ‘task-space’ and the actual system state-space. Task-space being something like the 3-dimensional $(x,y,z)$ position of the end-effector, which we are interested in controlling, and the actual system state-space being something like the joint-angles or muscle-lengths. Configuration redundancy is the problem of more than one possible set of joint-angles putting the end-effector in the desired location. Often the number of potential solutions is incredibly large, think about the number of possible ways you can reach to an object. So how do you learn an appropriate trajectory for your joint angles during a reaching movement? That is the problem being addressed here.

To learn an inverse model from scratch, the system needs to explore. How should it explore? Moving randomly eventually leads to an exhaustive search, but this is a poor choice in complex systems because it takes a large amount of time, increasing exponentially with the degrees of freedom (i.e. number of joint angles) of the system. So let’s look to babies, they’re responsible for learning how to control a very complex system, how the heck do they learn to move?

‘Motor babbling’ is a term that was coined to describe the seemingly random way babies moved about. It was suggested that they just flail about without purpose until they gain some understanding of how their bodies work, at which point they can start moving with purpose. But! Baby movement was shown way back in the 80′s to in fact not be just random, but instead to be highly goal directed. And when they find a solution, they stick with it, they don’t worry about it being the best one as long as it gets the job done. Only later are movements tweaked to be more efficient. As mentioned above, in systems as complicated as the human body the task-space (i.e. position of the hand) is much smaller than the motor space (i.e. length of all muscles), and there are a bunch of different solutions to a given task. With all these different potential solutions to a given problem, an exhaustive search isn’t even close to being necessary. Also, if babies aren’t just randomly exploring space to figure things out, they don’t have to flip a switch somewhere in their brain that says “ok stop messing around and let’s move with purpose”.

This paper provides a technique for stable online inverse model learning that can be used from initial bootstrapping, and shows how online learning can even be highly beneficial for bootstrapping. So let’s dive into the Online Goal Babbling learning method.

The inverse model function

Let’s denote the inverse model function we’re learning as $g()$, joint angles as $q$, and end effector positions as $x$. Then to denote giving a desired end effector position and getting out a set of joint angles we write $g(x^*) = q$, where $x^*$ represents a target in end effector space.

We’re going to initialize the inverse model function by having every end effector position return some default resting state of the arm, or home position, that we’ve defined, $q_{home}$. Additionally, we’re going to throw on some exploratory noise to the answer provided by the inverse model, so that the joint angles to move to at time $t$ are defined as $q_t = g(x^*_t, \theta_t) + E_t(x_t^*)$, where $E_t(x^*_t)$ is the exploratory noise. When the system then moves to $q_t$ the end effector position, $x_t$, is observed and the parameters of the inverse model, $\theta$, are updated immediately.

To generate a smooth path from the starting position to the target, the system divides the distance up into a number of sub-targets (25 in the paper) for the system to hit along the way. This is an important point, as it’s how the inverse model function is used to create a path that represents the system moving; a series of targets are provided and the inverse model is queried “what should the joint configuration be for this position?”

As mentioned before, it is possible to have he end effector in the same position, but the joints in a different configuration. Learning across these examples is dangerous and can lead to instability in the system as neighbouring targets could be learned with very dissimilar joint configurations, preventing smooth movement through joint-space. To prevent this, observed information is weighted by a term $w_t$ as it is taken in, based on deviation from the resting position of the arm ($q_{home}$) and efficiency of end effector movement. What this leads to is a consistent solution to be converged upon throughout movement space, causing the inverse model to generate smooth, comfortable (i.e. not near joint limits, but near the resting state of the arm) movements.

Additions to movement commands

To recenter movement exploration every so often, and prevent the system from exploring heavily in some obscure joint-space, every time a new target to move to is selected there is some probability (.1 in the paper) that the system will return to $q_{home}$. To return home the system traces out a straight trajectory in joint-space, not worrying about how it is moving through end effector space. This return probability reinforces learning how to move well near $q_{home}$, and acts as a ‘developmentally plausible stabilizer that helps to stay on known areas of the sensorimotor space.’

Also, we mentioned adding exploratory noise. How is that noise generated? The noise is calculated through a small, randomly chosen linear function that varies slowly over time: $E_t(x^*_t) = A_t \cdot x^* + c_t$, where the entries to the matrix $A_0$ and vector $b_0$ are chosen independently from a normal distribution with zero mean and variance $\sigma^2$. To move, a set of small values is chosen from a normal distribution with variance significantly smaller than $\sigma^2$, and added to elements of $A_t$ and $c_t$. A normalization term is also used to keep the overall deviation stable around the standard deviation $\sigma$. And that’s how we get our slowly changing linear random noise function.

Online learning

To learn the inverse model function, we’re going to use the technique of creating a bunch of linear models that are accurate in a small area of state space, and weighting their contribution to the output based on how close we are to their ‘region of validity’. The specific method used in the paper is a local-linear map, from (H.Ritter, Learning with the self-organizing map 1991). We define our linear models as $g^{(k)}(x) = M^{(k)} \cdot x + b^{(k)}$, intuited as following the standard $mx + b$ definition of a line, but moving into multiple dimensions. $M^{(k)}$ is our linear transformation of the input for model $k$, and $b^{(k)}$ is the offset.

If an input is received that is outside of the region of validity for all of the local linear models, then another one is added to improve the approximation of the function at that point. New local models are initiated with the Jacobian matrix $J(x) = \frac{\partial g(x)}{\partial x}$ of the inverse estimate. In other words, we look at how the approximation of the function is changing as we move in this $x$ direction in state space, or estimate the derivative of the function, and use that to set the initial value of our new linear model.

To update our inverse model function, we fit it to the current example $(x_t, q_t)$ by reducing the weighted square error: $err = w_t \cdot ||q_t - g(x_t)||^2$. With this error, a standard gradient descent method is used to update the slopes and offsets of the dimensions of the linear models:

$M^{(k)}_{t+1} = M^{(k)}_t - \eta \frac{\partial err}{\partial M^{(k)}_t}, \;\;\;\;\; b^{(k)}_{t+1} = b^{(k)}_{t} - \eta \frac{\partial err}{/partial b^{(k)}_{t}}$,

where $\eta$ is the learning rate.

Results

So here are some figures of the results of applying the algorithm, and they’re pretty impressive. First we see learning the inverse model for a 5-link arm:

And one of the really neat things about this is the scalability of this algorithm. Here’s applying OGB to a 50-link arm:

And here are some results looking at varying the learning rate (x-axis) and the effects on (a) time until 10% error reached, (b) final performance error, and (c) final average distance from $q_{home}$, the resting state configuration.

I enjoyed reading this paper, although I thought the ‘Learning operational space control’ papers from Jan Peters and Stephen Schaal deserved a bit more of a shout-out, as they’ve been around since 2006/8 and it’s a very similar method. Also I pulled one of the older papers from the article, so I’ll need to review the other ones too soon, as there is clearly more going on for the successful control of the elephant trunk arm shown in the article.

Other than that, I’m a big fan of breaking up complicated, non-linear functions into a bunch of small linear models, and stitching them all back together to learn an accurate approximation. I think it’s a very likely candidate for the kind of way that the brain goes about learning these incredibly complicated models, especially in the cerebellum I hold these types of methods as a favorite for how biological systems accomplish this. Another important note is that the learning never ‘turns off’ here, which is another big plus to this method. It allows the system to converge upon a solution, but to keep exploring the space as it uses this solution, so more efficient solutions can be found. Additionally, this makes this system much more robust to changing dynamics, resultant of wear and tear or, as in the case of infants, growing.

Again, the model here is learning an kinematic control of the system, specifying what joint angles the system should move to, and not the joint torques that will be required to actually move the system there. But, I am partial to the idea that this kinematic inverse model does then make it a much more smooth function for the kinetic controller to learn from this point.

The way I picture it is specifying a target in end-effector space, learning an kinematic inverse model controller that can specify the appropriate joint trajectory to follow to reach this target. Then an kinetic inverse model controller that takes in the joint space path, looks at the current joint angles / velocities / and accelerations, and learns the appropriate torques or muscle activation commands to move the arm as desired. This is a much, much smoother function to learn than directly learning the appropriate joint torques for a desired target in end effector space, because there are no hidden configuration redundancies. It’s still complex, because it still has to compensate for the inertia and multi-link interaction dynamics, but should still hopefully be much more straight-forward to learn.

Rolf, Matthias. (2011). Online Goal Babbling for rapid bootstrapping of inverse models in high dimensions. Development and Learning (ICDL), 2011 IEEE DOI: 10.1109/DEVLRN.2011.6037368

## Controlling variability – Likelihood calculus paper series review part 1

Dr. Terry Sanger has a series of papers that have come out in the last few years describing what he has named ‘likelihood calculus’. The goal of these papers is to develop a ‘a theory of optimal control for variable, uncertain, and noisy systems that nevertheless accomplish real-world tasks reliably.’ The idea being that successful performance can be thought of as modulating variance of movement, allocating resources to tightly control motions when required and allowing variability in task-irrelevant dimensions. To perform variability modulation, we first need a means of capturing mathematically how the features of an uncertain controller operating affect variability in system movement. Defining terms quickly, the features of a controller are the different components that produce signals resulting in movement, variability is taken here to be the trial-to-trial variation in movements, and uncertainty means that the available sensory feedback does not uniquely determine the true state of the world, where uncertainty can arise from noise on sensory feedback signals, unmodeled dynamics, and/or quantitization of sensory feedback. To capture all this uncertainty and variability, probability theory will naturally be employed. In this post I will review the paper ‘Controlling variability’ (2010) by Dr. Sanger, which sets up the framework for describing the time course of uncertainty during movement.

Using probability in system representations

So, here’s a picture of the system our controller (the brain) is in:

There’s the input initial state $x$, and the output change in state, $\dot{x}$, which is generated as a combination of the unforced dynamics of the world and the control dynamics effected by the brain. But since we’re dealing with uncertainty and variability, we’re going to rewrite this such that given an initial state $x$, we get a probability distribution over potential changes in state, $p(\dot{x}|x)$, which specifies the likelihood of each change in state $\dot{x}$, given our initial state probability distribution $p(x)$. So in our system diagram, the word and the brain both define probability distributions over the possible changes in state, $p_1(\dot{x}|x)$ and $p_1(\dot{x}|x)$, respectively, which then combine to create the overall system dynamics $p(\dot{x}|x)$. Redrawing our picture to incorporate the probabilities, we get:

One may ask: how do these probabilities combine? Good question! What we’d like to be able to do is combine them through simple linear operators, because they afford us massive simplifications in our calculations, but the combination of $p_0(\dot{x}|x)$ and $p_1(\dot{x}|x)$ isn’t as simple as summing and normalizing. The reason why can be a little tricky to tease out if you’re unfamiliar with probability, but will becomes clear with some thought. Consider what it means to go about combining the probabilities in this way. Basically, if you sum and normalize, then the result is saying that there is a 50% chance of doing what the brain says to do, and a 50% chance of doing what the world says to do, and doesn’t capture what is actually going to happen, which is an interaction between the effects of the brain and the world. A good example for thinking about this is rolling dice. If you roll each die individually, you have an equal chance of the result being each number 1-6, but if you roll two dice, the overall system probability of rolling numbers changes in a highly nonlinear fashion:

Suddenly there is a 0% chance of the result being a 1, and the probability of rolling a number increases in likelihood until you get to 7, at which point the likelihood decreases with ascending numbers; there is a nonlinear interaction at play that can’t be captured by summing the probabilities of rolling a number on each die individually and normalizing.

So summing the probability distributions over the possible changes in state, $\dot{x}$, isn’t going to work, but is there another way to combine these through linear operators? The answer is yes, but it’s going to require us to undergo a bit of a paradigm shift and expand our minds. What if, instead of capturing the change in the system by looking at $p(\dot{x}|x)$, the probability distribution over possible changes in state given the current state, we instead capture the dynamics of the system by defining $\dot{p}(x)$, the change in the probability of states through time? Instead of describing how the system evolves through the likelihood of different state changes, the dynamics are captured by defining the change in likelihood of different states; we are capturing the effect of the brain and the world on the temporal evolution of state probability. Does that freak you out?

Reworking the problem

Hopefully you’re not too freaked out. Now that you’ve worked your head around this concept, let’s look at $\dot{p}(x)$ a little more closely. Specifically, let’s look at the Kramers-Moyal expansion (for a one-dimensional system):

$\frac{\partial p(x,t)}{\partial t} = \sum_{k=1}^{\inf} \left( - \frac{\partial}{\partial x} \right)^k \{a_k(x) p(x)\} / k!$,

$a_k(x) = \int \dot{x}^k p(\dot{x}|x) d\dot{x}$.

As Dr. Sanger notes, this is a daunting equation, but it can be understood relatively easily. The left side, $\frac{\partial p(x,t)}{\partial t} = \dot{p}_t(x)$, is the rate of change of probability at each point $x$ at time $t$. The right side is just the Taylor series expansion. If we take the first two terms of the Taylor series expansion, we get:

$\frac{\partial p(x,t)}{\partial t} = - a_1 \frac{\partial}{\partial x} p(x) + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} p(x)$,

where the first describes how the probability drifts (or shifts / translates), $a_1$ being the average value of $\dot{x}$ for each value of $x$. The second term relates the rate of diffusion, $a_2$ being the second moment of the speed $\dot{x}$, describing the amount of spread in different possible speeds, where greater variability in speed leads to an increased spread of the probability. This is the Fokker-Planck equation, which describes the evolution of a physical process with constant drift and diffusion. At this point we make the assumption that our probability distributions are all going to be in the form of Gaussians (for which the Fokker-Planck equation exactly describes evolution of the system through time, and can arguably act as a good approximation to neural control systems where movement is based on the average activity of populations of neurons).

As an example of this, think of a 1-dimensional system, and a Gaussian probability distribution describing what state the system is likely to be in. The first term $a_1$ is the average rate of change $\dot{x}$ across the states $x$ the system could be in. The probability shifts through state space as specified by $a_1$. Intuitively, if you have a distribution with mean around position 1 and the system velocity is 4, then the change in your probability distribution, $p(x)$, should shift the mean to position 5. The second term, $a_2$ is a measure of how wide the range of different possible speeds $\dot{x}$ is. The larger the range of possible values, the less certain we become about the location of the system as it moves forward in time; the greater the range of possible states the system might end up in in the next time step. This is reflected by the rate of diffusion of the probability distribution. If we know for sure the speed the system moved at (i.e. all possible states will move with a specific $\dot{x}$), then we simply translate the mean of probability distribution. If however there’s uncertainty in the speed at which the system is moving, then the correct location (reflecting the actual system position) for the mean of the probability distribution could be one of a number of values. This is captured by increasing the width of (diffusing) the Gaussian.

Linear operators

Importantly, the equations above are linear in terms of $p(x)$. This means we can rearrange the above equation:

$\frac{\partial p(x,t)}{\partial t} = \left( -a_1 \frac{\partial}{\partial x} + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} \right) p(x)$,

letting $\mathcal{L} = \left( -a_1 \frac{\partial}{\partial x} + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} \right)$, we have

$\frac{\partial p(x,t)}{\partial t} = \mathcal{L} p(x)$.

Now we can redraw our system above as

where

$\mathcal{L} = \mathcal{L}_0 + \mathcal{L}_1$,

which is the straightforward combination of the different contributions of each of the brain and the world to the overall system state probability. How cool is that?

Alright, calm down. Time to look at using these operators. Let’s assume that the overall system dynamics $\mathcal{L}$ hold constant for some period of time (taking particular care to note that ‘constant dynamics’ does not mean that a single constant output is produced irrespective of the input state, but rather that a given input state $x$ always produces the same result while the dynamics are held constant), and we have discretized our representation of $x$ (to be a range of some values, i.e. -100 to 100) then we can find the state probability distribution at time $T$ by calculating

$p(x, T) = A^T p(x,0)$,

where $A^T = e^{T\mathcal{L}}$.

When combining these $\mathcal{L}$ operators, if we sum them, i.e. overall system dynamics

$\mathcal{L} = \mathcal{L}_0 + \mathcal{L}_1$,

and then apply them to the probability

$\frac{\partial p(x,t)}{\partial t} = \mathcal{L}p(x)$

this is saying that the dynamics provided by $\mathcal{L}_0$ and $\mathcal{L}_1$ are applied at the same time. But if we multiply the component dynamics operators,

$\mathcal{L} = \mathcal{L}_1 \mathcal{L}_0$

then when we apply them we have

$\frac{\partial p(x,t)}{\partial t} = \mathcal{L}p(x) = \mathcal{L}_1 \mathcal{L}_0 p(x)$,

which is interpreted as applying $\mathcal{L}_0$ to the system, and then applying $\mathcal{L}_1$. Just basic algebra, but it allows us to apply simultaneously and sequentially the dynamics generated by our contributing system components (i.e. the brain and the world).

Capturing the effects of control

So now we have a representation of the system dynamics operating with variability and under uncertainty, we’re talking about building a tool to use for controlling these systems though, so where does the control signal $u$ fit in? The $\mathcal{L}$ operator is made to be a function of the control signal, describing the probablistic effect of the control signal given the possible initial states in the current state probability distribution $p(x)$. Thus the change in state probability is now written

$\frac{\partial p(x,t)}{\partial t} = \mathcal{L}(u)p(x)$.

Suppose that we drive a system with constant dynamics $\mathcal{L}(u_1)$ for a period of time $T_1$, at which point we change the control signal and drive the system with constant dynamics $\mathcal{L}(u_2)$ for another period of time $T_2$. The state of the system now be calculated

$p(x, T_1 + T_2) = e^{T_2\mathcal{L}(u_2)}e^{T_1\mathcal{L}(u_1)}p(x,0) = A_{u_2}^{T_2}A_{u_1}^{T_1}p(x,0)$

using the sequential application of dynamics operators discussed above.

Conclusion

And that is the essence of the paper ‘Controlling variability’. There is an additional discussion about the relationship to Bayes’ rule, which I will save for another post, and an example, but this is plenty for this post.

The main point from this paper is that we shouldn’t be focusing on the values of states as the object of control, but rather the probability densities of states. By doing this, we can capture the uncertainty in systems and work towards devising an effecting means of control. So, although the paper is called ‘Controlling variability’, the discussion of how to actually control variability is saved for later papers. All the same, I thought this was a very interesting paper, enjoyed working through it, and am looking forward to the rest of the series.

Sanger TD (2010). Controlling variability. Journal of motor behavior, 42 (6), 401-7 PMID: 21184358

## Dynamic primitives of motor behavior – paper review

‘Dynamic primitives of motor behavior’ is a recent paper (2012) out by Neville Hogan and Dagmar Sternad.

This paper starts out professing the need for a theory of motor control that extends beyond a single task and situation, something near and dear to my heart. As they state, one of the problems with developing an encompassing theory is that most proposed theories are all seen as competing by the authors, slowing assimilation of ideas and development of an overarching structure. Laid out here, two of the most important features for any encompassing theory is that it accounts for broad classes of actions and addresses the major limitations of the human neuromuscular system, the highlighted limitation being the slow speed of efferent and afferent signals.

Synergies and dynamic primitives

The authors propose that human motor control is encoded solely in terms of primitive dynamic actions. This, as they point out, is definitely not a novel proposal, but they suggest that it and its implications haven’t been fully investigated. When people think of combining primitive actions, the idea of a synergy most often comes to mind, which refers to ‘steretyped patterns of simulataneous motion of multiple joints or simultaneous activation of multiple muscles that may simplify control’. The driving idea being that synergies could provide a means of dimensionality reduction for the controller, instead of having to fret over issuing every muscle activation signal, the controller modulates a set of larger actions, stitching them together simultaneously or sequentially. This making performing complex actions significantly simpler.

Taking this definition of a synergy, the authors say, is however insufficient for generating the complexity of behavior seen in humans: ‘this account of synergies constitutes an algebraic constraint, not a dynamic object. Even time-varying synergies are not dynamic objects, but constitute a kinematic constraint with time included as one of the variables related by the constraint’. That is to say, I believe, that this definition of a synergy is a strictly feedforward (open-loop), simplistic thing. It’s just a set of muscle activations that execute in a given order, without accounting for starting point, perturbations, or environment. In this sense they’re static, non-adapting to dynamic environments.

So, something more is needed. ‘Reducing the dimension of commands alone is not sufficient to account for how humans control complex dynamic objects. For that, the primitives of control should themselves be dynamic objects.’ At this point they bring in the discussion of dynamic primitives presented by the Schaal lab way back in the early 2000s, also known as the pre-Katy Perry era. The basic idea behind dynamic primitives is that there is a target point in state space that the system is drawn to (according to spring dynamics), and there is a time driven function that activates a forcing function which can move the system in interesting ways along its path to the target. Dynamic primitives can generate both discrete and rhythmic movements, ie they can act as point attractors or limit cycles. In the discrete case, the time driven function goes to zero, reducing the effect of the forcing function to zero, letting the default spring dynamics take over and pull the system to the target, guaranteeing convergence. In the rhythmic case, the time driven function goes to infinity and the cosine is taken to activate the forcing function in a rhythmic manner, where its movements are centered around the target in state space. There is a ton of really interesting and awesome things you can do with dynamic primitives, but that should be sufficient information for the discussion of this paper.

Taking dynamic primitives are used as our definition of synergies, we can generate significantly more robust behavior than with the alternate definition, because dynamic primitives are more than a sequence of muscle activations, they are attractors with dynamics that can guide the system in the face of perturbations and other noise. The authors term this property ‘”temporary permanence” (permanence due to robustness to perturbation; temporary because dynamic primitives, like phonemes of verbal communication, may have limited duration)’. Discrete and rhythmic dynamic primitives are then rewritten and termed ‘submovements’ and ‘oscillators’, respectively, that have an explicit mathematical summation and a speed profile with a single peak.

Virtual trajectories and mechanical impedance

The idea is then for the system to create a virtual trajectory to follow by combining these submovements and oscillators, creating a ‘trajectory attractor’. Although combinations of submovements and oscillators can account for a vast repertoire of movements in unconstrained environments, they’re not sufficient for describing any involving interactions with the environment. To do this, another aspect, mechanical impedance, is introduced as a feature to be accounted for in the construction of virtual trajectories. Mechanical impedance determines the force evoked by a displacement of a part of the system throughout the movement. In humans, mechanical impedance can be controlled by modulating the co-contraction of antagonist groups of muscles, holding a limb rigidly in place or letting it sway freely in response to applied outside force. The mechanical impedance for a given task then is a function that describes how to respond to outside forces throughout the time-course of a movement. Just like submovements and oscillations, mechanical impedances for different tasks can be combined through linear superposition to generate a novel function.

The incorporation of mechanical impedance into the specification of a movement has some really neat effects. The authors present several cases. One is the case of locomotion, instead of having two different primitive movements for different types of locomotion (such as walking normally and walking on balls of your feet, for example), these can both arise from the same composition of submovements and oscillators by varying the joint stiffness (mechanical impedance) profile throughout the movement. Another case is in reaching out to manipulate a door handle, one way to go about this is to have a precise model detailing the path to follow for the hand to appropriately close around the round object, and to adapt this over time to appropriately apply torque and open the door, but a simpler means is to have a crude model of the location of the door handle and its shape, and perform the movement with a low mechanical impedance, letting the hand form around the object appropriately as contact is made (this effectiveness of the latter method is also shown in simulation in the paper). Another point is that controlling mechanical impedance allows for feedforward specification, allowing appropriate reactions in situations where object contact is too fast for feedback based control to respond appropriately.

So, assuming we have a sets of our three classes of primitives, submovements, oscillators, and mechanical impedances, a virtual trajectory can be generated using the available primitives as basis functions which cann be combined through weighted summation. And the authors propose ‘that what is learned, encoded, and retrieved are the parameters of dynamic primitives, rather than any details of behavior’.

There were a number of interesting ideas in this paper, being already familiar with dynamical primitives and compositionality of movement (and already using both in my own work), the part I found most interesting was the incorporation of mechanical impedance as a movement feature for modulation. The door handle example being a particularly compelling example of the robustness and power this incorporation adds to movements. And a great bonus to using dynamical primitives as a basis for a motor control system is that some great work has been done incorporating learning into dynamical primitives (albeit not designed with any intent of neural plausibility), specifically the path integral policy improvement work done by Evangelous Theordorou during his time in the Schaal lab.

The terms ‘virtual trajectories’ and ‘trajectory attractor’ occur to me as a bit dangerous in their easy misintepretability, where they could seem more like a kinematic path specification method than the result of combining a number of dynamical attractor systems.

I assume that the reduced bandwidth required for the modulation of a set of primitives rather than the specification of a full control signal and the ability of these modulatory terms to specify appropriate responses to any encountered external force is the means of addressing to the transmission speed problem mentioned at the beginning of the paper. Learning the modulatory signals of a given set of primitives is another feature that appeals to me, as opposed to specifying the explicit muscle activations, because the former has the potential to take advantage of whatever built in circuitry is going on in the spinal cord, that very often ignored crazy complex structure that motor signals are routed through.

Finally, a recurring theme of the paper is also making the case for a overarching falsifiable theory that can be built up and revised incrementally, and isn’t thrown away when some experiment provides contradicting data. This is another call in the field lately, and the plan of attack I’ve been following myself. Maybe I could try directing them towards the Neural Optimal Control Hierarchy framework I’ve been building…

Hogan N, & Sternad D (2012). Dynamic primitives of motor behavior. Biological cybernetics, 106 (11-12), 727-39 PMID: 23124919

## Nengo model – Low pass derivative filter

To just get the code you can copy / paste from below or get the code from my github: low_pass_derivative_filter.py.

In the course of building models in Nengo, I recently came in to need for a neural implementation of a low pass derivative filter. I scripted up a sub-network in Nengo (www.nengo.ca) that does this, and until we get the model database / repository up and running for Nengo scripts I’ll keep working through building these things here, because it goes over some basic methods that can be useful when you’re building up your models.

Basically there are three parts to this model: Derivative calculation, absolute value calculation, and an inhibitory projection with a threshold activation that projects to the output population. Here’s a picture:
Here’s the idea: The population input projects both directly to the output population and to a population that calculates the derivative of the sum across dimensions of the input signal. The derivative population passes on the derivative of the input signal to an absolute value calculating population, which passes the absolute value of the derivative on to a population that isn’t activated for values under a threshold level. This threshold population then projects very strong inhibition to the output population, so that when the absolute value of the derivative is above the threshold level, no output is projected, and otherwise the system just relays the input signal straight through. Here’s the code:

def make_dlowpass(name, neurons, dimensions, radius=10, tau_inhib=0.005, inhib_scale=10):

dlowpass = nef.Network(name)

dlowpass.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
output = dlowpass.make('output', neurons=dimensions*neurons, dimensions=dimensions) # create output relay

# now we track the derivative of sum, and only let output relay the input
# if the derivative is below a given threshold
dlowpass.make('derivative', neurons=radius*neurons, dimensions=2, radius=radius) # create population to calculate the derivative
dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) # set up recurrent connection
dlowpass.add(make_abs_val(name='abs_val', neurons=neurons, dimensions=1, intercept=(.2,1))) # create a subnetwork to calculate the absolute value

# connect it up!
dlowpass.connect('input', 'output') # set up communication channel
dlowpass.connect('input', 'derivative', index_post=0)

def sub(x):
return [x[0] - x[1]]
dlowpass.connect('derivative', 'abs_val.input', func=sub)

# set up inhibitory matrix
inhib_matrix = [[-inhib_scale]] * neurons * dimensions
output.addTermination('inhibition', inhib_matrix, tau_inhib, False)
dlowpass.connect('abs_val.output', output.getTermination('inhibition'))

return dlowpass.network


First off, this code is taking advantage of the absolute value function that was written a couple blog posts ago. You can either go check out that post or I’ll also have that function included in the code at the end for completeness. Aside from that, there are a lot of things going on that you won’t come across if you’re just coding up simple examples, so let’s look at them.

At the top, we’re assigning our output network a handle, which I try to avoid in general for neatness, since most of the times you can reference it by simply referring to it’s name, 'output'. The reason that I assign it a handle here is because we’re going to be calling upon some Java API features that (to my knowledge) aren’t handled in the Python API yet, and although we could call up the output node as a Java object with dlowpass.get('output') using only its assigned name, it will just be cleaner in the end to have a handle for it. We’ll come back to this.

The next interesting thing that happens, is that when we’re creating our derivative population, we set the number of neurons to 10*neurons, and radius=10. This is to because in the derivative we’re representing the sum of all of the input dimensions. How are all the input dimensions being summed up in derivative, you ask? Just below, on line 17. When we connect up the input relay to derivative we set index_post=0, which means that all of the input dimensions are going to to project to the same dimension of the derivative population. The default weight for this connection is 1, so then dimension 0 of derivative is equal to  1 * value for value in input_dimensions . Super.

But why do we set radius=10? This is because the radius parameter specifies the range of values represented by this population. The default is (-1,1), but when we specify radius, the new range of represented values becomes (-radius, radius). We’re making a bit of an assumption that this value won’t go outside of the range (-10,10) here, but that should be OK for most of the situations we’re going to come across. In the specific model I’m using this for it’s definitely the case, so that’s why I’ve set the default value to 10. And because we don’t want the accuracy in representation to decrease, we also scale up the number of neurons in this population by radius*neurons.

And there’s still more happening in this derivative population! On line 11 I specify a recurrent connection that projects into a second dimension represented in derivative. So now what’s going to happen is that the sum of the input signals is projected into the first dimension of derivative, and through a recurrent connection the value of the sum of the input signals from time t-pstc will be represented in the second dimension of the derivative population.

To calculate the derivative then, it’s a simple matter of subtracting the previous signal from the current signal, which is what happens in the function sub that I define on line 19. To implement this function, when connecting up derivative to abs_val, just set the parameter func=sub. Easy.

Now, when the abs_val population is made, we set and intercept=(.2,1). This is the same trick we used in the previous absolute value function model, but it’s acting as a threshold here. Basically, this population won’t respond if the value being projected into it is between (-.2, .2).

So, up to this point, what we have is a summation of the input dimensions, the derivative being calculated and passed to an absolute value function, and this population only responds if the derivative is greater than .2.

The last part is hooking up this abs_val population to the output relay, to suppress output whenever it’s activated (i.e. when the derivative is greater than .2). This is where we need to pull into the Java API, and this is why we specified a handle for our output population. In lines 25-26, what’s going on is that instead of using the NEF neural compiler functionality to set up our connection weights to compute some function, we’re specifying them ourselves. And we’re specifying them to prevent the neurons in output from firing. Now, the activation of the neurons in abs_val reduces the voltage values being sent into the output population, inhibiting their activity.

And that’s it! Here’s the complete code to run an example of this network (which can also be found on my github low_pass_derivative_filter.py):

import nef

# constants / parameter setup etc
N = 50 # number of neurons
D = 3 # number of dimensions

def make_abs_val(name, neurons, dimensions, intercept=[0]):
def mult_neg_one(x):
return x[0] * -1

abs_val = nef.Network(name)

abs_val.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
abs_val.make('output', neurons=1, dimensions=dimensions, mode='direct') # create output relay

for d in range(dimensions): # create a positive and negative population for each dimension in the input signal
abs_val.make('abs_pos%d'%d, neurons=neurons, dimensions=1, encoders=[[1]], intercept=intercept)
abs_val.make('abs_neg%d'%d, neurons=neurons, dimensions=1, encoders=[[-1]], intercept=intercept)

abs_val.connect('input', 'abs_pos%d'%d, index_pre=d)
abs_val.connect('input', 'abs_neg%d'%d, index_pre=d)

abs_val.connect('abs_pos%d'%d, 'output', index_post=d)
abs_val.connect('abs_neg%d'%d, 'output', index_post=d, func=mult_neg_one)

return abs_val.network

def make_dlowpass(name, neurons, dimensions, radius=10, tau_inhib=0.005, inhib_scale=10):

dlowpass = nef.Network(name)

dlowpass.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
output = dlowpass.make('output', neurons=dimensions*neurons, dimensions=dimensions) # create output relay

# now we track the derivative of sum, and only let output relay the input
# if the derivative is below a given threshold
dlowpass.make('derivative', neurons=radius*neurons, dimensions=2, radius=radius) # create population to calculate the derivative
dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) # set up recurrent connection
dlowpass.add(make_abs_val(name='abs_val', neurons=neurons, dimensions=1, intercept=(.2,1))) # create a subnetwork to calculate the absolute value

# connect it up!
dlowpass.connect('input', 'output') # set up communication channel
dlowpass.connect('input', 'derivative', index_post=0)

def sub(x):
return [x[0] - x[1]]
dlowpass.connect('derivative', 'abs_val.input', func=sub)

# set up inhibitory matrix
inhib_matrix = [[-inhib_scale]] * neurons * dimensions
output.addTermination('inhibition', inhib_matrix, tau_inhib, False)
dlowpass.connect('abs_val.output', output.getTermination('inhibition'))

return dlowpass.network

# Create network
net = nef.Network('net')

# Create / add low pass derivative filter

# Make function input
net.make_input('input_function', values=[0]*D)

# Connect up function input to filter
net.connect('input_function', 'dlowpass.input')

# Add it all to Nengo