## Robot control part 1: Forward transformation matrices

I’m doing a tour of learning down at the Brains in Silicon lab run by Dr. Kwabena Boahen for the next month or so working on learning a bunch about building and controlling robots and some other things, and one of the super interesting things that I’m reading about is effective methods for force control of robots. I’ve mentioned operational space (or task space) control of robotic systems before, in the context of learning the inverse kinematic transformation, but down here the approach is to analytically derive the dynamics of the system (as opposed to learning them) and use these to explicitly calculate control signals to move in task space that take advantage of the passive dynamics of the system.

In case you don’t remember what those words mean, operational (or task) space refers to a different configuration space than the basic / default robot configuration space. FOR EXAMPLE: If we have a robot arm with three degrees of freedom (DOF), that looks something like this:

where two joints rotate, and are referred to as $q_1$, and $q_2$, respectively, then the most obvious representation for the system state, $\textbf{q}$, is the joint-angle space $\textbf{q} = [q_1, q_2]$. So that’s great, but often when we’re using this robot we’re going to be more interested in controlling the position of the end-effector rather than the angles of the joints. We would like to operate to complete our task in terms of $\textbf{q} = [x, y]$, where $x, y$ are the Cartesian coordinates of our hand in 2D space. So then $\textbf{q} = [x, y]$ is our operational (or task) space.

I also mentioned the phrase ‘passive dynamics’. It’s true, go back and check if you don’t believe me. Passive dynamics refer to how the system moves from a given initial condition when no control signal is applied. For example, passive dynamics incorporate the effects of gravity on the system. If we put our arm up in the air and remove any control signal, it falls down by our side. The reason that we’re interested in passive dynamics is because they’re movement for free. So if my goal is to move my arm to be down by my side, I want to take advantage of the fact that the system naturally moves there on it’s own simply by removing my control signal, rather than using a bunch of energy to force my arm to move down.

There are a bunch of steps leading up to building controllers that can plan trajectories in a task space. As a first step, it’s important that we characterize the relationship of each of reference coordinate frames of the robot’s links to the origin, or base, of the robot. The characterization of these relationships are done using what are called forward transformation matrices, and they will be the focus of the remainder of this post.

Forward transformation matrices in 2D

As I mentioned mere sentences ago, forward transformation matrices capture the relationship between the reference frames of different links of the robot. A good question might be ‘what is a reference frame?’ A reference frame is basically the point of view of each of the robotic links, where if you were an arm joint yourself what you would consider ‘looking forward’. To get a feel for these and why it’s necessary to be able to move between them, let’s look at the reference frames of each of the links from the above drawn robot:

We know that from $q_2$ our end-effector point $\textbf{p}$ is length $d_2$ away along its x-axis. Similarly, we know that $q_2$ is length $d_1$ away from $q_1$ along its x-axis, that $q_1$ is length $d_0$ away from the origin along its y-axis. The question is, then, in terms of the origin’s coordinate frame, where is our point $\textbf{p}?$

In this configuration pictured above it’s pretty straightforward to figure out, it’s simply $(x = d_2, y = d_0 + d_1)$. So you’re feeling pretty cocky, this stuff is easy. OK hotshot, what about NOW:

It’s not as straightforward once rotations start being introduced. So what we’re looking for is a method of automatically accounting for the rotations and translations of points between different coordinate frames, such that if we know the current angles of the robot joints and the relative positions of the coordinate frames we can quickly calculate the position of the point of interest in terms of the origin coordinate frame.

Accounting for rotation

So let’s just look quick at rotating axes. Here’s a picture:

The above image displays two frames of reference with the same origin rotated from each other by $q$ degrees. Imagine a point $\textbf{p} = (p_{x_1}, p_{y_1})$ specified in reference frame 1, to find its coordinates in terms of of the origin reference frame, or $(x_0, y_0)$ coordinates, it is necessary to find out the contributions of the $x_1$ and $y_1$ axes to the $x_0$ and $y_0$ axes. The contributions to the $x_0$ axis from $p_{x_1}$ are calculated

$cos(q) p_{x_1}.$

To include $p_{y_1}$‘s effect on position, we add the term

$cos(90 + q) p_{y_1},$

which is equivalent to $-cos(90 - q)$, as shown above. This term can be rewritten as $-sin(q)$ because $sin$ and $cos$ are phase shifted 90 degrees from one another.

The total contributions of a point defined in the $(x_1, y_1)$ axes to the $x_0$ axis are

$p_{0_x} = cos(q) p_{x_1} - sin(q) p_{y_1}.$

Similarly for the $y_0$ axis contributions we have

$p_{0_y} = sin(q) p_{x_1} + sin(90 - q) p_{y_1},$

$p_{0_y} = sin(q) p_{x_1} + cos(q) p_{y_1}.$

Rewriting the above equations in matrix form gives:

$^1_0\textbf{R} \; \textbf{p}_1 = \left[ \begin{array}{cc} cos(q_0) & -sin(q_0) \\ sin(q_0) & cos(q_0) \end{array} \right] \left[ \begin{array}{c} p_{x_1} \\ p_{y_1} \end{array} \right],$

where $^1_0\textbf{R}$ is called a rotation matrix.
The notation used here for these matrices is that the reference frame number being rotated from is denoted in the superscript before, and the reference frame being rotated into is in the subscript. $^1_0\textbf{R}$ denotes a rotation from reference frame 1 into reference frame 0 (using the same notation as described here.

To find the location of a point defined in reference frame 1 in reference frame 0 coordinates, we then multiply by the rotation matrix $^1_0\textbf{R}$.

Accounting for translation

Alrighty, rotation is great, but as you may have noticed our robots joints are not all right on top of each other. The second part of transformation is translation, and so it is also necessary to account for distances between reference frame origins.

Let’s look at the the reference frames 1 and 0 shown in the above figure, where point $\textbf{p} = (2,2)$ in reference frame 1. Reference frame 1 is rotated 45 degrees from and located at $(3, 2)$ in reference frame 0. To account for this translation and rotation a new matrix will be created that includes both rotation and translation. It is generated by appending distances, denoted $\textbf{D}$, to the rotation matrix $^1_0\textbf{R}$ along with a row of zeros ending in a 1 to get a transformation matrix:

$^1_0\textbf{T} = \left[ \begin{array}{cc} ^1_0\textbf{R} & ^1_0\textbf{D} \\ \textbf{0} & \textbf{1} \end{array} \right],$
$^1_0\textbf{T} = \left[ \begin{array}{ccc} cos(q_0) & -sin(q_0) & d_{x_0} \\ sin(q_0) & cos(q_0) & d_{y_0} \\ 0 & 0 & 1 \end{array} \right].$

To make the matrix-vector multiplications work out, a homogeneous representation must be used, which adds an extra row with a 1 to the end of the vector $\textbf{p}$ to give

$\textbf{p} = \left[ \begin{array}{c} p_x \\ p_y \\ 1 \end{array} \right].$

When position vector $\textbf{p}$ is multiplied by the transformation matrix $^1_0\textbf{T}$ the answer should be somewhere around $(3, 5)$ from visual inspection, and indeed:

$^1_0\textbf{T} \; \textbf{p} = \left[ \begin{array}{ccc} cos(45) & -sin(45) & 3 \\ sin(45) & cos(45) & 2 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{c} 2 \\ 2 \\ 1 \end{array} \right] = \left[ \begin{array}{c} 3 \\ 4.8285 \\ 1 \end{array} \right].$

To get the coordinates of $\textbf{p}$ in reference frame 0 now simply take the first two elements of the resulting vector $\textbf{p} = (3, 4.8285)$.

Applying multiple transformations

We can also string these things together! What if we have a 3 link, planar (i.e. rotating on the $(x,y)$ plane) robot arm? A setup like this:

We know that our end-effector is at point $(1,2)$ in reference frame 2, which is at an 80 degree angle from reference frame 1 and located at $(x_1 = 2.5, y_1 = 4)$. That gives us a transformation matrix

$^2_1\textbf{T} = \left[ \begin{array}{ccc} cos(80) & -sin(80) & 2.5 \\ sin(80) & cos(80) & 4 \\ 0 & 0 & 1 \end{array} \right]$.

To get our point in terms of reference frame 0 we account for the transform from reference frame 1 into reference frame 2 with $^2_1\textbf{T}$ and then account for the transform from reference frame 0 into reference frame 1 with our previously defined transformation matrix

$^1_0\textbf{T} = \left[ \begin{array}{ccc} cos(45) & -sin(45) & 3 \\ sin(45) & cos(45) & 2 \\ 0 & 0 & 1 \end{array} \right]$.

So let’s give it a shot! By eyeballing it we should expect our answer to be somewhere around $(7,0)$ or so, I would say.

$\textbf{p}_0 = ^1_0\textbf{T} \; ^2_1\textbf{T} \; \textbf{p}_2 = \\ \\ \left[ \begin{array}{ccc} cos(45) & -sin(45) & 3 \\ sin(45) & cos(45) & 2 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} cos(80) & -sin(80) & 2.5 \\ sin(80) & cos(80) & 4 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 2 \\ 1 \end{array} \right] = \left[ \begin{array}{c} -0.273 \\ 6.268 \\ 1 \end{array} \right]$.

And it’s a good thing we didn’t just eyeball it! Accurate drawings might have helped but the math gives us an exact answer. Super!

And one more note, if we’re often performing this computation, then instead of performing 2 matrix multiplications every time we can work out

$^2_0\textbf{T} = ^1_0\textbf{T} \; ^2_1\textbf{T}$

and simply multiply our point in reference frame 2 by this new transformation matrix $^2_0\textbf{T}$ to calculate the coordinates in reference frame 0.

Forward transform matrices in 3D

The example here is taken from Samir Menon’s RPP control tutorial.

It turns out it’s trivial to add in the $z$ dimension and start accounting for 3D transformations. Let’s say we have a standard revolute-prismatic-prismatic robot, which looks exactly like this, or roughly like this:

where the base rotates around the $z$ axis, and the distance from reference frame 0 to reference frame 1 is 1 unit, also along the $z$ axis. The rotation matrix from reference frame 0 to reference frame 1 is:

$^1_0\textbf{R} = \left[ \begin{array}{ccc} cos(q_0) & -sin(q_0) & 0 \\ sin(q_0) & cos(q_0) & 0 \\ 0 & 0 & 1 \end{array} \right]$

and the translation vector is

$^1_0\textbf{D} = [0, 0, 1]^T.$

The transformation matrix from reference frame 0 to reference frame 1 is then:

$^1_0\textbf{T} = \left[ \begin{array}{cccc} cos(q_0) & -sin(q_0) & 0 & 0 \\ sin(q_0) & cos(q_0) & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{array} \right],$

where the third column indicates that there was no rotation around the $z$ axis in moving between reference frames, and the forth (translation) column shows that we move 1 unit along the $z$ axis. The fourth row is again then only present to make the multiplications work out and provides no information.

For transformation from the reference frame 1 to reference frame 2, there is no rotation (because it is a prismatic joint), and there is translation along the $y$ axis of reference frame 1 equal to $.5 + q_1$. This gives a transformation matrix:

$^2_1\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0.5 + q_1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right].$

The final transformation, from the origin of reference frame 2 to the end-effector position is similarly another transformation with no rotation (because this joint is also prismatic), that translates along the $z$ axis:

$^{ee}_2\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -0.2 - q_2 \\ 0 & 0 & 0 & 1 \end{array} \right].$

The full transformation from reference frame 0 to the end-effector is found by combining all of the above transformation matrices:

$^{ee}_0\textbf{T} = ^1_0\textbf{T} \; ^2_1\textbf{T} \; ^{ee}_2\textbf{T} = \left[ \begin{array}{cccc} cos(q_0) & -sin(q_0) & 0 & -sin(q_0)(0.5 + q_1) \\ sin(q_0) & cos(q_0) & 0 & cos(q_0) (0.5 + q_1) \\ 0 & 0 & 1 & 0.8 - q_2 \\ 0 & 0 & 0 & 1 \end{array} \right].$

To transform a point from the end-effector reference frame into terms of the origin reference frame, simply multiply the transformation matrix by the point of interest relative to the end-effector. If it is the end-effector position that is of interest to us, $p = [0, 0, 0, 1]^T$. For example, let $q_0 = \frac{\pi}{3}$, $q_1 = .3$, and $q_2 = .4$, then the end-effector location is:

$^{ee}_0\textbf{T} \; \textbf{p} = \left[ \begin{array}{cccc} cos(q_0) & -sin(q_0) & 0 & -sin(q_0)(0.5 + q_1) \\ sin(q_0) & cos(q_0) & 0 & cos(q_0) (0.5 + q_1) \\ 0 & 0 & 1 & .8 + q_2 \\ 0 & 0 & 0 & 1 \end{array} \right] \; \left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array} \right] = \left[ \begin{array}{c} -0.693 \\ 0.4 \\ 0.8 \\ 1 \end{array} \right].$

Inverting our transformation matrices

What if we know where a point is defined in reference frame 0, but we want to know where it is relative to our end-effector’s reference frame? Fortunately this is straightforward thanks to the way that we’ve defined our transform matrices. Continuing the same robot example and configuration as above, and denoting the rotation part of the transform matrix $\textbf{R}$ and the translation part $\textbf{D}$, the inverse transform is defined:

$(^{ee}_0\textbf{T})^{-1} = \left[ \begin{array}{cc} (^{ee}_0\textbf{R})^T & -(^{ee}_0\textbf{R})^T \; ^{ee}_0\textbf{D} \\ 0 & 1 \end{array} \right].$

If we have a point that’s at $\textbf{p}_0 = [1, 1, .5, 1]^T$ in reference frame 0, then we can calculate that relative to the end-effector it is at:

$\textbf{p} = (^{ee}_0\textbf{T})^{-1} \; \textbf{p}_0 = [1.37, -1.17, -0.3, 1]^T.$

Conclusions

These are, of course, just the basics with forward transformation matrices. There are numerous ways to go about this, but this method is fairly straightforward. If you’re interested in more, there are a bunch of youtube videos and detailed tutorials all over the web. There’s a bunch of neat stuff about why the matrices are set up like they are (search: homogeneous transformations) and more complex examples.

The robot example for the 3D case here didn’t have any spherical joints, each joint only moved in 2 dimensions, but it is also possible to derive the forward transformation matrix in this case, it’s just more complex and not necessary to move onward here since they’re not used in the robots I’ll be looking at. This introduction is enough to get started and move on to some more exciting things, so let’s do that!

## Reinforcement Learning part 2: SARSA vs Q-learning

In my previous post about reinforcement learning I talked about Q-learning, and how that works in the context of a cat vs mouse game. I mentioned in this post that there are a number of other methods of reinforcement learning aside from Q-learning, and today I’ll talk about another one of them: SARSA. All the code used is from Terry Stewart’s RL code repository, and can be found both there and in a minimalist version on my own github: SARSA vs Qlearn cliff. To run the code, simply execute the cliff_Q or the cliff_S files.

SARSA stands for State-Action-Reward-State-Action. In SARSA, the agent starts in state 1, performs action 1, and gets a reward (reward 1). Now, it’s in state 2 and performs another action (action 2) and gets the reward from this state (reward 2) before it goes back and updates the value of action 1 performed in state 1. In contrast, in Q-learning the agent starts in state 1, performs action 1 and gets a reward (reward 1), and then looks and sees what the maximum possible reward for an action is in state 2, and uses that to update the action value of performing action 1 in state 1. So the difference is in the way the future reward is found. In Q-learning it’s simply the highest possible action that can be taken from state 2, and in SARSA it’s the value of the actual action that was taken.

This means that SARSA takes into account the control policy by which the agent is moving, and incorporates that into its update of action values, where Q-learning simply assumes that an optimal policy is being followed. This difference can be a little difficult conceptually to tease out at first but with an example will hopefully become clear.

Mouse vs cliff

Let’s look at a simple scenario, a mouse is trying to get to a piece of cheese. Additionally, there is a cliff in the map that must be avoided, or the mouse falls, gets a negative reward, and has to start back at the beginning. The simulation looks something like exactly like this:

where the black is the edge of the map (walls), the red is the cliff area, the blue is the mouse and the green is the cheese. As mentioned and linked to above, the code for all of these examples can be found on my github (as a side note: when using the github code remember that you can press the page-up and page-down buttons to speed up and slow down the rate of simulation!)

Now, as we all remember, in the basic Q-learning control policy the action to take is chosen by having the highest action value. However, there is also a chance that some random action will be chosen; this is the built-in exploration mechanism of the agent. This means that even if we see this scenario:

There is a chance that that mouse is going to say ‘yes I see the best move, but…the hell with it’ and jump over the edge! All in the name of exploration. This becomes a problem, because if the mouse was following an optimal control strategy, it would simply run right along the edge of the cliff all the way over to the cheese and grab it. Q-learning assumes that the mouse is following the optimal control strategy, so the action values will converge such that the best path is along the cliff. Here’s an animation of the result of running the Q-learning code for a long time:

The solution that the mouse ends up with is running along the edge of the cliff and occasionally jumping off and plummeting to its death.

However, if the agent’s actual control strategy is taken into account when learning, something very different happens. Here is the result of the mouse learning to find its way to the cheese using SARSA:

Why, that’s much better! The mouse has learned that from time to time it does really foolish things, so the best path is not to run along the edge of the cliff straight to the cheese but to get far away from the cliff and then work its way over safely. As you can see, even if a random action is chosen there is little chance of it resulting in death.

Learning action values with SARSA

So now we know how SARSA determines it’s updates to the action values. It’s a very minor difference between the SARSA and Q-learning implementations, but it causes a profound effect.

Here is the Q-learning learn method:

def learn(self, state1, action1, reward, state2):
maxqnew = max([self.getQ(state2, a) for a in self.actions])
self.learnQ(state1, action1,
reward, reward + self.gamma*maxqnew)


And here is the SARSA learn method

def learn(self, state1, action1, reward, state2, action2):
qnext = self.getQ(state2, action2)
self.learnQ(state1, action1,
reward, reward + self.gamma * qnext)


As we can see, the SARSA method takes another parameter, action2, which is the action that was taken by the agent from the second state. This allows the agent to explicitly find the future reward value, qnext, that followed, rather than assuming that the optimal action will be taken and that the largest reward, maxqnew, resulted.

Written out, the Q-learning update policy is Q(s, a) = reward(s) + alpha * max(Q(s')), and the SARSA update policy is Q(s, a) = reward(s) + alpha * Q(s', a'). This is how SARSA is able to take into account the control policy of the agent during learning. It means that information needs to be stored longer before the action values can be updated, but also means that our mouse is going to jump off a cliff much less frequently, which we can probably all agree is a good thing.

## Wrapping MapleSim C code for Python

Previously, I worked through a simple inverse kinematics example with at three link arm. The arm, however, was uninteresting, and in fact had no dynamics of its own. I’ve also previously (but not in a blog post) built up a number of arm models using MapleSim that were interesting, and ported them into C. From there I used them in Matlab / Simulink, but I’m on a Python kick now and tired of Matlab; so I’ve been looking at how to wrap this C code generated by MapleSim using Cython such that I can interact with it from Python.

So the code that simulates my models is exported to C, which is great in that it’s super fast, but it’s bad because it’s a pain in the ass to work with. I don’t want to have to write my controllers and learning algorithms all in C, it would be much nicer to do it all in Python. So I was looking at interfacing Python with the C library, and did a little (as seen in my last post), but then I already know how to make Cython play nice with C++ so I figured let’s shoot for that instead. What follows now is a hacky guide to getting your MapleSim model simulations to work with Python. First we’ll get the C code to just run, then we’ll port it to Python, then we’ll get it going graphically.

Get the MapleSim code to run in C++ with classes and input

Step 1: Rename your file from ‘whatever.c’ to ‘whatever.cpp’, in the code here I’m renaming ‘c2LinkArm.c’ to ‘2LinkArm.cpp’. I know, I know, it’s painful but hopefully that’s the worst of it.

Step 2: Add #include "mplshlib.h" to the beginning of the file. The code is meant to be run in a MapleSim environment (or something), so it requires one of the Maple library files. We’ll just add this in here and everything should be happy.

Step 3: For whatever reason, there is a function in here that sets all of the system input to 0. Why, you ask? Good question. The function in question is static void inpfn(double T, double *U), and you can see that all it does is set every element of U = 0. Now, you can either comment out the body of this function, or, several lines down, the first line of the SolverUpdate function (after a long i variable declaration) is the call to this function, so you can comment that out and then everything’s fine. Hurray! We can give input to the system now!

And that’s all the inline code editing we have to do. We’re not done with this file though, we still need to append some things so that we can use it easily.

Step 4: The reason that I want to use C++ is because with C++ we can make a class, which we’ll call the Sim class, that can store all of our simulation details, and hold some wrapper functions to hide the full blown interface to the simulator functions. Let’s make that class now. Go down to the bottom of the file and paste in the following code:

/*  A class to contain all the information that needs to
be passed around between these functions, and can
encapsulate it and hide it from the Python interface.

Written by Travis DeWolf (May, 2013)
*/
class Sim {
/* Very simple class, just stores the variables we
need for simulation, and has 2 functions. Reset
resets the state of the simulation, and step steps it
forward. Tautology ftw!*/

double* params;
double dt, t0;
double u0[NINP], other_out[NOUT+1], y[NOUT];
double w[1 + 2 * NEQ + NPAR + NDFA + NEVT];

SolverStruct S;

public:
Sim(double dt_val, double* params_pointer);
void reset(double* out, double* ic);
void step(double* out, double* u);
};

Sim::Sim(double dt_val, double* params_pointer)
{
t0 = 0.0; // set up start time
dt = dt_val; // set time step
for (int i = 0; i < NINP; i++)
u0[i] = 0.0; // initial control signal

params = params_pointer; // set up parameters reference

/* Setup */
S.w = w;
S.err = 0;
}

void Sim::reset(double* out, double* ic)
{
SolverSetup(t0, ic, u0, params, y, dt, &S);

/* Output */
out[0] = t0;
for(int j = 0; j < NOUT; j++)
out[j + 1] = y[j];
}

void Sim::step(double* out, double* u)
/* u: control signal */
{
for (int k = 0; k < NOUT; k++)
out[k] = *dsn_undef; // clear values to nan

/* Integration loop */
/* Take a step with states */
EulerStep(u, &S);

if (S.err <= 0)
{
/* Output */
SolverOutputs(y, &S);

out[0] = S.w[0];
for(long j = 0; j < NOUT; j++)
out[j + 1] = y[j];
}
}

int main (void)
{
FILE *fd;

double *ic, *p, *out;
char* errbuf;
long i, j, outd;
long internal = 0;

double dt = 0.00001;

int time_steps = 1000000;
double u[NINP];
for (int k = 0; k < NINP; k++) u[k] = .1;

fd = fopen("output.dat", "w");

Sim sim = Sim(dt, NULL);
sim.reset(out, NULL); // ic = NULL, use default start state

for(i=0;i<time_steps;i++)
{
sim.step(out, u);
fprintf(fd,"%lf ",out[0]);
for(j=0;j<NOUT;j++)
{
fprintf(fd,"%lf ",out[j+1]);
}
fprintf(fd, "\n");
}

fclose(fd);

return 0;
}


So, this is based off of the ParamDriverC() function from the MapleSim generated code. I just separated out the initialization and the simulation into two functions (reset() and step(), respectively), so that it’s possible to explicitly control the simulation stepping through time, and change the input control signal at every time step.

NOTE: The dt here must be <= 1e-5, or the constraint solver in the code throws an error. Don’t fear though, this simulator is still crazy fast.

OK. This code then can be compiled (once you have the library files in the same folder) and run with:

g++ c2LinkArm.cpp -out sim
./sim


You may have noticed a bunch of warning messages scrolling by, I’m not about to touch those and they don’t affect the simulation, so as long as one saying ‘error’ doesn’t pop up let’s just leave well enough alone.

Once you’ve run the executable you should now have a file called ‘output.dat’ filled with simulation data. The first column is time, and the rest are the output variables from your MapleSim model. Alright, we’ve got our code up and running! Let’s go an and make a Cython wrapper for it.

Making a Cython wrapper for the MapleSim code
The first thing that we need to do is take the main function we just added in out of the simulation code above. We’re done with running the C code on its own so we don’t need it anymore. Get rid of it already.

Now, this is going to be very similar to the code from the previous Cython posts, but in this one there’s no getting around it: We need to pass back and forth arrays. It turns out this isn’t very complicated, but there’s a template you have to follow and it can be difficult if you don’t know what that is.

I’ll post the code and then we can go through the tricky parts. As always, we create a ‘pyx’ file for our Cython wrapper code, I called this one ‘py2LinkArm.pyx’.

import numpy as np
cimport numpy as np

cdef cppclass Sim:
Sim(double dt, double* params)
void reset(double* out, double* ic)
void step(double* out, double* u)

cdef class pySim:
cdef Sim* thisptr
def __cinit__(self, double dt = .00001,
np.ndarray[double, mode="c"] params=None):
"""
param float dt: simulation timestep, must be < 1e-5
param array params: MapleSim model internal parameters
"""
if params: self.thisptr = new Sim(dt, &params[0])
else: self.thisptr = new Sim(dt, NULL)

def __dealloc__(self):
del self.thisptr

def reset(self, np.ndarray[double, mode="c"] out,
np.ndarray[double, mode="c"] ic=None):
"""
Reset the state of the simulation.
param np.ndarray out: where to store the system output
NOTE: output is of form [time, output]
param np.ndarray ic: the initial conditions of the system
"""
if ic: self.thisptr.reset(&out[0], &ic[0])
else: self.thisptr.reset(&out[0], NULL)

def step(self, np.ndarray[double, mode="c"] out,
np.ndarray[double, mode="c"] u):
"""
Step the simulation forward one timestep.
param np.ndarray out: where to store the system output
NOTE: output is of form [time, output]
param np.ndarray u: the control signal
"""
self.thisptr.step(&out[0], &u[0])


This is pretty familiar: We make a reference to a class defined in C++, and then we wrap it in a python class and define functions that we can actually call from Python.

The main difference here is that we have to handle passing in numpy arrays, and having them be in the appropriate double* form that our C++ code is expecting to see. There are three parts involved in this. First, note at the top that we have an import numpy as np followed by a cimport numpy as np, this gives us access to the functions that we’ll need. Second, for each of the arrays accepted as parameter we have a special type declaration: np.ndarray[double, mode="c"] var. And finally, we pass the arrays on with a dereferencing &var[0]. If we were passing along 2D arrays we would use &var[0][0].

In the __init__() and reset() methods, I’m allowing the model parameters, params, and initial conditions, ic, arrays to be undefined, in which case NULL is passed to our C code and the default values we defined inside MapleSim are used. The reason for this is that these require some pretty intimate knowledge of the MapleSim model, and if a casual user wants to just use this business they shouldn’t have to know anything about the internal states.

With our wrapper code done now the only thing left is our setup file! This one is just straight from the previous posts, with the exception that only the ‘pyx’ file is included in the sources declaration because our ‘py2LinkArm.pyx’ file references the ‘c2LinkArm.cpp’ file directly, instead of referencing a header file.
setup.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
cmdclass = {'build_ext': build_ext},
language="c++"),],
)


With our setup file we should now be good to go. Let’s compile! To do this I pull open IPython and punch in

run setup.py build_ext -i


And it will complain and throw some warnings, but it should compile just fine. And now we have access to the pySim class!
Note: that should read ‘build_ext’ with an underscore in there, I’m not sure why it’s not displaying but that underscore needs to be there or it won’t run.
To see it do some things, type in

import numpy as np

u = np.ones(2) # input array
out = np.zeros(3) # output array

sim.reset(out)
print out # initial values

for i in range(10):
sim.step(out, u)
print out


And we can see it moving forward! If you set u = np.ones(2) * .1, which is what it was when we generated the ‘output.dat’ file using the C code only and run it forward you’ll see that the we get the same results for both. Excellent. OK now let’s display it!

Running and plotting the MapleSim arm model
In a previous post I used Pyglet to plot out a 3 link arm (a very poor, lines only plot, but sufficient), and I’m going to use it again here (for a very poor, lines only plot). The file is called ‘run.py’, and is less than 80 lines. I’ll post it and then walk through it below:
run.py

import numpy as np
import pyglet
import time

def run():

# set up input control signal to simulation
u = np.ones(2) * .1
# set up array to receive output from simulation
out = np.zeros(3)

# set up initial conditions of the system
sim.reset(out)

###### Plotting things

# segment lengths = (.31, .27)m (defined in MapleSim)
L = np.array([31, 27]) * 3

# make our window for drawin'
window = pyglet.window.Window()

# set up some labels to display time
label_time = pyglet.text.Label('time', font_name='Times New Roman',
font_size=36, x=window.width//2, y=window.height - 50,
anchor_x='center', anchor_y='center')
# and joint angles
label_values = pyglet.text.Label('values', font_name='Times New Roman',
font_size=36, x=window.width//2, y=window.height - 100,
anchor_x='center', anchor_y='center')

def update_arm(dt):
""" Simulate the arm ahead a chunk of time, then find the
(x,y) coordinates of each joint, and update labels."""

# simulate arm forward 15ms
for i in range(1500):
sim.step(out, u)

# find new joint (x,y) coordinates, offset to center of screen-ish
x = np.array([ 0,
L[0]*np.cos(out[1]),
L[0]*np.cos(out[1]) + L[1]*np.cos(out[1]+out[2])]) + window.width/2
y = np.array([ 0,
L[0]*np.sin(out[1]),
L[0]*np.sin(out[1]) + L[1]*np.sin(out[1]+out[2])]) + 100

# update line endpoint positions for drawing
window.jps = np.array([x, y]).astype('int')
label_time.text = 'time: %.3f'%out[0]
label_values.text = '(01, 02) = (%.2f, %.2f)'%(out[1], out[2])

update_arm(0) # call once to set up

@window.event
def on_draw():
window.clear()
label_time.draw()
label_values.draw()
for i in range(2):
pyglet.graphics.draw(2, pyglet.gl.GL_LINES, ('v2i',
(window.jps[0][i], window.jps[1][i],
window.jps[0][i+1], window.jps[1][i+1])))

# set up update_arm function to be called super often
pyglet.clock.schedule_interval(update_arm, 0.00001)
pyglet.app.run()

run()


Alright. In the first part, we’re just doing the same as above when we imported our brand new module and set it up for simulation.
What’s different is that we’ve now pushed the actual simulation into a method called update_arm. The idea is that the drawing is significantly slower than the simulation, so we’ll simulate a bunch, then update the plots once, then repeat, and hopefully get something that simulates at a decent speed. Once we have the joint angles of the arm stored safely in our out variable, we calculate the (x,y) locations of the elbow and wrist and update the system. Then the last thing this method does is update the text of the labels in the window.

In the on_draw method we’ve overwritten, we clear the window, then draw our labels and the lines corresponding to upper and lower arm segments.

Then the last thing is to schedule our update_arm method to be called periodically, in this case in extremely often.

When you run this file, you should see something that looks like:

You’ll have to trust me it looks better when it’s moving, and when it’s moving it will look better when the input isn’t constant.

But there you go! Now you are able to take your very own MapleSim model’s optimized C code, append some C++ code to make a Sim class, wrap it in Cython, and access it as a Python module! Why would anyone ever want to go through this trouble? Well, finding good simulations with appropriate dynamics can be a huge pain in the ass, and sometimes you just want something stand-alone. Also, now that this is done, this exact same code should be good for any MapleSim model, so hopefully the process is relatively straightforward. At least that’s my dream.

The code that I’ve used in this post can be found at my github: 2LinkArm.

This was all written for compiling on Ubuntu, my labmate and pal Xuan Choo also went through getting this all to work on Windows too, here are his notes!

Step 1: Get and install cython

Step 2: Follow the instructions here. Essentially:

• 2b: Start the windows SDK command shell (start menu->programs->windows sdk->CMD shell)
• 2c: Type these things into the cmd shell:
set DISTUTILS_USE_SDK=1
setenv /x64 /release

• 2d: in the same cmd shell, navigate to the folder with the pyx file you want to compile, and run:
python setup.py build_ext -i

• 2d*: If it complains about undefined numpy references, add “include_dirs=[numpy.get_include()]” to the Extension class call (also import numpy in setup.py).
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

setup(
cmdclass = {'build_ext': build_ext},
language="c++",
include_dirs=[numpy.get_include()]),],
)


Done!

## Cython Journey Part 3: Wrapping C code and passing arrays

Basic C interfacing example

OK, so this code is taken straight from an example here, I’m just going to explicitly work through all the steps so that when I try to do this again later I don’t have to look everything up again.

Let’s say we have some c code that we want to work with from Python, in a file called ‘square.c’:

void square(double* array, int n) {
int i;
for (i=0; i<n; ++i) {
array[i] = array[i] * array[i];
}
}


The first thing we’re going to need to do is compile this to a shared object file, so we punch the following into our terminal:

gcc -c square.c # compile the code to an object file
gcc square.o -shared -o square.so # compile to shared object file


And so now we have our square.so file. Super.

Here is the code that we can then use to interact with this shared object file from Python:

import numpy as np
import ctypes

n = 5
a = np.arange(n, dtype=np.double)

aptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
square.square(aptr, n)

print a


OK, so there are a couple of lines with a lot of jargon but aside from that this looks pretty straightforward. The first noticeable thing is that we’re importing the ctypes library. So, make sure that you have that installed before you try to run this code. Right after importing ctypes then we create a handle into our shared object file, using the ctypes.cdll.LoadLibrary() command.

Then it’s some standard python code to create an array, and then there’s this funky command that lets us get a handle to the array that we just created as a c pointer that we can pass in to our square() function. You can see a list of all the different kinds of data types here.

Once we have that handle to our array, we simply pass it through to the square.square() function, and let it go and do whatever it’s going to do with it. Then we print out our array a, and to our great joy, all of the values have been squared.

So that’s that. Instead of returning anything like double* from the C library I would recommend just passing in the array as a parameter and modifying it instead. This is all nice and straightforward enough, and it seems like the returning business can lead to a lot of memory allocation and ownership headaches.

The code for this example can be found at my github, here: Square.

Tagged , , , ,

## Inverse kinematics of 3-link arm with constrained minimization in Python

Inverse kinematics is a common topic in robotics control; one that most anyone working with a robotic arm needs to address at some point. The use case I’ll be looking at here is the situation where we want to specify an $(x,y)$ coordinate for the end-effector, and solve for the appropriate joint angle configuration that best achieves this. It’s surprisingly straightforward to handle with the right approach, so it’s a nice little exercise in Python optimization (and we’ll even work in some graphical display!) And per usual, the code is all up on my github.

Most of the time the servos that anyone is working with are position controlled, meaning that you provide them with a set of desired angles for them to move to, rather than force controlled, where you provide them with a torque signal that directly moves them. In this case, inverse kinematics is all you need. You simply work out the set of joint angles you want the servos to move to and send those out directly. In biological systems this clearly isn’t the end of the work, because even if you know where you want to go you still have to provide a force (muscle activation) signal that will move your system there. But! as mentioned in the motor babbling paper review post I wrote a while back, having the target specified in the most basic space (i.e. joint angles instead of $(x,y)$ coordinates for a servo based arm, or muscle lengths instead of joint angles for a muscle based arm) can make calculating, or learning, the appropriate force signal much easier; so something that can do the inverse kinematics calculation is worth having.

Inverse kinematics

Now then, how do we go about finding the joint angle configuration that best achieves a desired $(x,y)$ coordinate? Constrained minimization! Our constraint is that the tip of the arm must be at the specified $(x,y)$ position, and we need to specify some cost function for the system to minimize to choose among all the possible configurations that accomplish this. The cost function that we’re going to use here is a very intuitive one, that I first encountered in the paper ‘Learning operational space control’ by Jan Peters. What we’re going to do is specify a default, or resting, joint state configuration, and minimize our distance from that. Which makes a lot of sense.

I’m going to post the code for calculating the inverse kinematics of a 3-link arm, and then we’ll work through it.

import math
import numpy as np
import scipy.optimize

def __init__(self, q=None, q0=None, L=None):
"""Set up the basic parameters of the arm.
All lists are in order [shoulder, elbow, wrist].

:param list q: the initial joint angles of the arm
:param list q0: the default (resting state) joint configuration
:param list L: the arm segment lengths
"""
# initial joint angles
if q is None: q = [.3, .3, 0]
self.q = q
# some default arm positions
if q0 is None: q0 = np.array([math.pi/4, math.pi/4, math.pi/4])
self.q0 = q0
# arm segment lengths
if L is None: L = np.array([1, 1, 1])
self.L = L

self.max_angles = [math.pi, math.pi, math.pi/4]
self.min_angles = [0, 0, -math.pi/4]

def get_xy(self, q=None):
if q is None: q = self.q

x = self.L[0]*np.cos(q[0]) + \
self.L[1]*np.cos(q[0]+q[1]) + \
self.L[2]*np.cos(np.sum(q))

y = self.L[0]*np.sin(q[0]) + \
self.L[1]*np.sin(q[0]+q[1]) + \
self.L[2]*np.sin(np.sum(q))

return [x, y]

def inv_kin(self, xy):

def distance_to_default(q, *args):
# weights found with trial and error, get some wrist bend, but not much
weight = [1, 1, 1.3]
return np.sqrt(np.sum([(qi - q0i)**2 * wi
for qi,q0i,wi in zip(q, self.q0, weight)]))

def x_constraint(q, xy):
x = ( self.L[0]*np.cos(q[0]) + self.L[1]*np.cos(q[0]+q[1]) +
self.L[2]*np.cos(np.sum(q)) ) - xy[0]
return x

def y_constraint(q, xy):
y = ( self.L[0]*np.sin(q[0]) + self.L[1]*np.sin(q[0]+q[1]) +
self.L[2]*np.sin(np.sum(q)) ) - xy[1]
return y

return scipy.optimize.fmin_slsqp( func=distance_to_default,
x0=self.q, eqcons=[x_constraint, y_constraint],
args=[xy], iprint=0) # iprint=0 suppresses output



I’ve taken out most of the comments for compactness, but there are plenty in the code on github, don’t you worry. Alright, let’s take a look!

First, we go through and set up the parameters of our arm. If nothing was passed in during construction, then we’re going to make an arm that is initially at it’s resting position, with shoulder and elbow joint angles = $\frac{\pi}{4}$, and wrist angle = 0. We also set up the arm segment lengths to all be 1, and the minimum and maximum joint angles.

Next, there’s the get_xy function, which just uses some simple trig to calculate the current $(x,y)$ position of the end-effector. This is going to be a handy thing to have when we want to see how well the joint configurations we calculate achieve our desired hand position.

And then after this, we have the actual inverse kinematics function. inv_kin takes in a desired $(x,y)$ position, and returns to us a list of the joint angles that will best achieve this. We define three methods inside inv_kin that define our constrained optimization problem, distance_to_default, which is the function we are minimizing, and x_constraint and y_constraint, which are the constraints that must be met for a solution to be valid. These are all pretty straight forward functions, the distance_to_default function calculates the Euclidean distance in joint space to the resting state, and the constraint functions calculate the difference between the actual and desired end-effector positions for a given set of joint angles.

There is one point of interest in the distance_to_default method, and that is the use of the weights therein. What is the point of applying a gain to these values, you ask? Why, simply that it lets us weight the relative importance of each of these joints remaining close to their default configuration! If you think about moving your arm on a horizontal plane, the wrist actually bends very little. To mimic this we can set the weight applied to the distance of the wrist from its resting state higher than those of the other joints, so that it becomes a higher priority to keep the wrist near its resting state. I suggest playing around with once the graphical display (below) is built up, it’s fun to see the effects different weightings can give on joint configurations.

With these three functions defined, we can now call up the scipy.optimize.fmin_slsqp function, which performs the sequential least squares method to arrive at a solution.

Really, that’s all there is to it!

To test this now, we have the following method:

def test():
############Test it!##################

# set of desired (x,y) hand positions
x = np.arange(-.75, .75, .05)
y = np.arange(0, .75, .05)

# threshold for printing out information, to find trouble spots
thresh = .025

count = 0
total_error = 0
# test it across the range of specified x and y values
for xi in range(len(x)):
for yi in range(len(y)):
# test the inv_kin function on a range of different targets
xy = [x[xi], y[yi]]
# run the inv_kin function, get the optimal joint angles
q = arm.inv_kin(xy=xy)
# find the (x,y) position of the hand given these angles
actual_xy = arm.get_xy(q)
# calculate the root squared error
error = np.sqrt((np.array(xy) - np.array(actual_xy))**2)
# total the error
total_error += error

if np.sum(error) > thresh:
print '-------------------------'
print 'Initial joint angles', arm.q
print 'Final joint angles: ', q
print 'Desired hand position: ', xy
print 'Actual hand position: ', actual_xy
print 'Error: ', error
print '-------------------------'

count += 1

print '\n---------Results---------'
print 'Total number of trials: ', count
print 'Total error: ', total_error
print '-------------------------'


Which is simply working through a set of target $(x,y)$ points, and calculating the total error across all of them. If any particular point gives an error above the threshold level, print out the information for that point so we can take a closer look if so desired. Fortunately, the threshold error isn’t even approached in this test, and for output we get:

---------Results---------
Total number of trials:  450
Total error:  [  3.33831421e-05   2.89667496e-05]
-------------------------


Visualization

So that’s all great and dandy, but it’s always nice to be able to really visualize these things, so I wrote another method that uses Pyglet to help visualize. This is a really easy to use graphics program that I tested out as a means here, and I was very happy with it. I ended up writing a method that pops up a window with a 3-link arm drawn on it, and the arm uses the above inverse kinematics function in the Arm class written above to calculate the appropriate joint angles for the arm to be at to follow the mouse. Once the joint angles are calculated, then their $(x,y)$ locations are also calculated, and the arm is drawn. Pleasingly, the inv_kin above is fast enough to work in real time, so it’s a kind of fun toy example. Again, I’ll show the code and then we’ll work through it below.

import numpy as np
import pyglet
import time

import Arm

def plot():
"""A function for plotting an arm, and having it calculate the
inverse kinematics such that given the mouse (x, y) position it
finds the appropriate joint angles to reach that point."""

# create an instance of the arm

# make our window for drawin'
window = pyglet.window.Window()

label = pyglet.text.Label('Mouse (x,y)', font_name='Times New Roman',
font_size=36, x=window.width//2, y=window.height//2,
anchor_x='center', anchor_y='center')

def get_joint_positions():
"""This method finds the (x,y) coordinates of each joint"""

x = np.array([ 0,
arm.L[0]*np.cos(arm.q[0]),
arm.L[0]*np.cos(arm.q[0]) + arm.L[1]*np.cos(arm.q[0]+arm.q[1]),
arm.L[0]*np.cos(arm.q[0]) + arm.L[1]*np.cos(arm.q[0]+arm.q[1]) +
arm.L[2]*np.cos(np.sum(arm.q)) ]) + window.width/2

y = np.array([ 0,
arm.L[0]*np.sin(arm.q[0]),
arm.L[0]*np.sin(arm.q[0]) + arm.L[1]*np.sin(arm.q[0]+arm.q[1]),
arm.L[0]*np.sin(arm.q[0]) + arm.L[1]*np.sin(arm.q[0]+arm.q[1]) +
arm.L[2]*np.sin(np.sum(arm.q)) ])

return np.array([x, y]).astype('int')

window.jps = get_joint_positions()

@window.event
def on_draw():
window.clear()
label.draw()
for i in range(3):
pyglet.graphics.draw(2, pyglet.gl.GL_LINES, ('v2i',
(window.jps[0][i], window.jps[1][i],
window.jps[0][i+1], window.jps[1][i+1])))

@window.event
def on_mouse_motion(x, y, dx, dy):
# call the inverse kinematics function of the arm
# to find the joint angles optimal for pointing at
# this position of the mouse
label.text = '(x,y) = (%.3f, %.3f)'%(x,y)
arm.q = arm.inv_kin([x - window.width/2, y]) # get new arm angles
window.jps = get_joint_positions() # get new joint (x,y) positions

pyglet.app.run()

plot()


There are really only a few things that we’re doing in this method: We create an arm with somewhat normal segment lengths and a window for plotting; we have a function for calculating the $(x,y)$ coordinates of each of the joints for plotting; we have a function that tracks the position of the mouse and updates our arm joint angles by calling the arm.inv_kin function we wrote above; and lastly we plot it! Easy. The functions aren’t quite in that order, and that’s because I chose alphabetical ordering over conceptual. Here’s a picture!

I think the only kind of weird thing going on in here is that my variables that I use in both this method and in the overwritten window methods are defined as belonging to window. Aside from that, it’s straightforward trig to calculate the joint angle Cartesian coordinates, and then there’s also a label that displays the mouse $(x,y)$ position for the hell of it. Honestly, the label was in a Pyglet tutorial example and I just left it in.

It’s not going to win any awards for graphics by any means, but it’s simple and it works! And, assuming you have Pyglet installed (which is very easy to do), you should straight up be able to run it and be amazed. Or a little impressed. Or not, whatever. But in any case now you have your very own inverse kinematics example on a 3-link arm! Moving to any other number of joints is exactly the same, you just have to add the corresponding number of default joint configurations and everything should work out as is.

From here, the plan is to look at bringing in an actual force driven arm model to the mix, and then building up a system that can track the mouse, but with a control signal that specifies forces to apply to the arm, rather than positions. Which I think will be pretty neat. And in case you missed it at the top, here’s a link to all the code on my github.

## Likelihood calculus paper series review part 3 – Distributed control of uncertain systems using superpositions of linear operators

The third (and final, at the moment) paper in the likelihood calculus series from Dr. Terrence Sanger is Distributed control of uncertain systems using superpositions of linear operators. Carrying the torch for the series right along, here Dr. Sanger continues investigating the development of an effective, general method of controlling systems operating under uncertainty. This is the paper that delivers on all the promises of building a controller out of a system described by the stochastic differential operators we’ve been learning about in the previous papers. In addition to describing the theory, there are examples of system simulation with code provided! Which is a wonderful, and sadly uncommon, thing in academic papers, so I’m excited. We’ll go through a comparison of Bayes’ rule and Markov processes (described by our stochastic differential equations), go quickly over the stochastic differential operator description, and then dive into the control of systems. The examples and code run-through I’m going to have to save for another post, though, just to keep the size of this post reasonable.

The form our system model equation will take is the very general

$dx = f(x)dt + \sum g_i(x)u_i dt + \sum h_i(x, u_i)dB_i$,

where $f(x)$ represents the environment dynamics, previously also referred to as the unforced or passive dynamics of the system, $g_i(x)$ describes how the control signal $u_i$ affects the system state $x$, $h_i(x, u_i)$ describes the state and control dependent additive noise, and $dB_i$ is a set of independent noise processes such as Brownian motion. Our control goal is to find a set of functions of time, $u_i(t)$, such that the dynamics of our system behave according to a desired set of dynamics that we have specified.

In prevalent methods in control theory, uncertainty is a difficult problem, and often can only be effectively handled with a number of simplifications, such as linear systems of Gaussian added noise. In biological systems that we want to model, however, uncertainty is ubiquitous. There is noise on outgoing and incoming signals, there are unobserved controllers simultaneously exerting influence over the body, complicated and often unmodeled highly non-linear dynamics of the system and its interactions with the environment, etc. In the brain especially, the effect of unobserved controllers is a particular problem. Multiple areas of the brain will simultaneously be sending out control signals to the body, and the results of these signals tends to be only partially known or known only after a delay to the other areas. So for modeling, we need an effective means of controlling distributed systems operating under uncertainty. And that’s exactly what this paper presents: ‘a mathematical framework that allows modeling of the effect of actions deep within a system on the stochastic behaviour of the global system.’ Importantly, the stochastic differential operators that Dr. Sanger uses to do this are linear operators, which opens up a whole world of linear methods to us.

Bayes’ rule and Markov processes

Bayesian estimation is often used in sensory processing to model the effects of state uncertainty, combining prior knowledge of state with a measurement update. Because we’re dealing with modeling various types of system uncertainty, it seems like a good idea to consider Bayes’ rule. Here, Dr. Sanger shows that Bayes’ rule is in fact insufficient in this application, and Markov processes must be used. There are a number of neat insights that come from this comparison, so it’s worth going through.

Let’s start by writing Bayes’ rule and Markov processes using matrix equations. Bayes’ rule is the familiar equation

$p(x|y) = \frac{p(y|x)}{p(y)}p(x)$,

where $p(x)$ represents our probability density or distribution, so $p(x) \geq 0$ and $\sum_i p(x = i) = 1$. This equation maps a prior density $p(x)$ to the posterior density $p(x|y)$. Essentially, this tells us the effect a given measurement of $y$ has on the probability of being in state $x$. To write this in matrix notation, we assume that $x$ takes on a finite number of states, so $p(x)$ is a vector, which gives

$p_x' = Ap_x$,

where $p_x$ and $p_x'$ are the prior and posterior distributions, respectively, and $A$ in a diagonal matrix with elements $A_{ii} = \frac{p(y|x)}{p(y)}$.

Now, the matrix equation for a discrete-time, finite-state Markov process is written

$p_x(t+1) = Mp_x(t)$.

So where in Bayes’ rule the matrix (our linear operator) $A$ maps the prior distribution into the posterior distribution, in Markov processes the linear operator $M$ maps the probability density of the state at time $t$ into the density at time $t+1$. The differences come in through the form of the linear operators. The major difference being that $A$ is a diagonal matrix, while there is no such restriction for $M$. The implication of this is that in Bayes’ rule the update of a given state $x$ depends only on the prior likelihood of being in that state, whereas in Markov processes the likelihood of a given state at time $t+1$ can depend on the probability of being in other states at time $t$. The off diagonal elements of $M$ allow us to represent the probability of state transition, which is critical for capturing the behavior of dynamical systems. This is the reason why Bayes’ method is insufficient for our current application.

Stochastic differential operators

This derivation of stochastic differential operators is by now relatively familiar grounds, so I’ll be quick here. Starting with a stochastic differential equation

$dx = f(x)dt + g(x)dB$,

where $dB$ is our noise source, the differential of unit variance Brownian motion. The noise term introduces randomness into the state variable $x$, so we describe $x$ with a probability density $p(x)$ that evolves through time. This change of the probability density through time is captured by the Fokker-Planck partial differential equation

$\frac{\partial}{\partial t}p(x,t) = - \frac{\partial}{\partial x}(f(x)p(x,t)) + \frac{1}{2} \frac{\partial^2}{\partial x^2}(g(x)g^T(x)p(x,t))$,

which can be rewritten as

$\dot{p} = \mathcal{L}p$,

where $\mathcal{L}$ is the linear operator

$\mathcal{L} = - \frac{\partial}{\partial x}f(x) + \frac{1}{2} \frac{\partial^2}{\partial x^2}g(x)g^T(x)$.

$\mathcal{L}$ is referred to as our stochastic differential operator. Because the Fokker-Planck equation is proven to preserve probability densities (non-negativity and sum to 1), applying $\mathcal{L}$ to update our density will maintain its validity.

What is so exciting about these operators is that even though they describe nonlinear systems, they themselves are linear operators. What this means is that if we have two independent components of a system that affect it’s dynamics, described by $\mathcal{L}_1$ and $\mathcal{L}_2$, we can determine their combined effects on the overall system dynamics through a simple summation, i.e. $\dot{p} = (\mathcal{L}_1 + \mathcal{L}_2)p$.

Controlling with stochastic differential operators

Last time, we saw that control can be introduced by attaching a weighting term to the superposition of controller dynamics, giving

$\dot{p} = \sum_i u_i \mathcal{L}_i p$,

where $\mathcal{L}_i$ is the stochastic differential operator of controller $i$, and $u_i$ is the input control signal to that controller. In the context of a neural system, this equation describes a set of subsystems whose weighted dynamics give rise to the overall behavior of the system. By introducing our control signals $u_i$, we’ve made the dynamics of the overall system flexible. As mentioned in the previous review post, our control goal is to drive the system to behave according to a desired set of dynamics. Formally, we want to specify $u_i$ such that the actual system dynamics, $\hat{\mathcal{L}}$, match some desired set of dynamics, $\mathcal{L}^*$. In equation form, we want $u_i$ such that

$\mathcal{L}^* \approx \hat{\mathcal{L}} = \sum_i u_i \mathcal{L}_i$.

It’s worth noting also here that the resulting $\hat{\mathcal{L}}$ still preserves the validity of densities that it is applied to.

How well can the system approximate a set of desired dynamics?

In this next section of the paper, Dr. Sanger talks about reworking the stochastic operators of a system into an orthogonal set, which can then be used to easily approximate a desired set of dynamics. It’s my guess that the motivation behind doing this is to see how close the given system is able to come to reproducing the desired dynamics. This is my guess because this exercise doesn’t really generate control information that can be used to directly control the system, unless we translate the weights calculated by doing this back into term of the actual set of actions that we have available. But it can help you to understand what your system is capable of.

To do this, we’ll use Gram-Schmidt orthogonalization, which I describe in a recent post. To actually follow this orthogonalization process we’ll need to define an inner product and normalization operator appropriate for our task. A suitable inner product will be one that lets us compare the similarity between two of our operators, $L_1$ and $L_2$, in terms of their effects on an initial state probability density, $p_0$. So define

$\langle L_1, L_2 \rangle_{p_0} = \langle L_1 p_0, L_2 p_0 \rangle = \langle \dot{p}_{L_1} , \dot{p}_{L_2} \rangle$

for the discrete-space case, and similarly

$\langle \mathcal{L}_1, \mathcal{L}_2 \rangle_{p_0} = \int (\mathcal{L}_1 p_0)(\mathcal{L}_2 p_0)dx = \int \dot{p}_{L_1} \dot{p}_{L_2} dx$

in the continuous-state space. So this inner product calculates the change in probability density resulting from applying these two operators to this initial condition, and finds the amount which they move the system in the same direction as the measure of similarity.
The induced norm that we’ll use is the 2-norm,

$||L||_{p_0} = \frac{||L p_0||_2}{||p_0||_2}.$

With the above inner product and normalization operators, we can now take our initial state, $p_0$, and create a new orthogonal set of stochastic differential operators that span the same space as original set through the Gram-Schmidt orthogonalization method. Let’s denote the orthonormal basis set vectors as $\Lambda_i$. Now, to approximate a desired operator $L^*$, generate a set of weights, $\alpha$, over our orthonormal basis set using a standard inner product: $\alpha_i = \langle L^*, \Lambda_i \rangle$. Once we have the $\alpha_i$, the desired operator can be recreated (as best as possible given this basis set),

$L^* \approx \hat{L} = \sum \alpha_i \Lambda_i.$

This could then be used as a comparison measure as the best approximation to a desired set of dynamics that a given system can achieve with its set of operators.

Calculating a control signal using feedback

Up to now, there’s been a lot of dancing around the control signal, including it in equations and discussing the types of control a neural system could plausibly implement. Here, finally, we actually get into how to go about generating this control signal. Let’s start with a basic case where we have the system

$\dot{p} = (\mathcal{L}_1 + u\mathcal{L}_2)p,$

where $\mathcal{L}_1$ describes the unforced/passive dynamics of the system, $\mathcal{L}_2$ describes the control-dependent dynamics, and $u$ is our control signal.

Define a cost function $V(x)$ that we wish to minimize. The expected value of this cost function at a given point in time is

$E[V] = \int V(x)p(x)dx,$

which can be read as the cost of each state weighted by the current likelihood of being in that state.
To reduce the cost over time, we want the derivative of our expected value with respect to time to decrease. Written in an equation, we want

$\frac{d}{dt}E[V] = \int V(x)\dot{p}(x)dx < 0.$

Note that we can sub in for $\dot{p}$ to give

$\frac{d}{dt}E[V] = \int V(x)[\mathcal{L}_1p(x) + u\mathcal{L}_2p(x)]dx.$

Since our control is effected through $u$, at a given point in time where we have a fixed and known $p(x,t)$, we can calculate the effect of our control signal on the change in expected value over time, $\frac{d}{dt}E[V]$, by taking the partial differential with respect to $u$. This gives

$\frac{\partial}{\partial u}\left[\frac{d}{dt}E[V]\right] = \int V(x)\mathcal{L}_2p(x)dx,$

which is intuitively read: The effect that the control signal has on the instantaneous change in expected value over time is equal to the change in probability of each state $x$ weighted by the cost of that state. To reduce $\frac{d}{dt}E[V]$, all we need to know now is the sign of the right-hand side of this equation, which tells us if we should increase or decrease $u$. Neat!

Although we only need to know the sign, it’s nice to include slope information that gives some idea of how far away the minimum might be. At this point, we can simply calculate our control signal in a gradient descent fashion, by setting $u = - k \int V(x)\mathcal{L}_2p(x)dx$. The standard gradient descent interpretation of this is that we’re calculating the effect that $u$ has on our function $\frac{d}{dt}E[V]$, and assuming that for some small range, $k$, our function is approximately linear. So we can follow the negative of the function’s slope at that point to find a new point on that function that evaluates to a smaller value.

This similarly extends to multiple controllers, where if there is a system of a large number of controllers, described by

$\dot{p} = \sum_i u_i \mathcal{L}_i p,$

then we can set

$u_i = - k_i \int V(x)\mathcal{L}_ip(x)dx.$

Dr. Sanger notes that, as mentioned before, neural systems will often not permit negative control signals, so where $u_i < 0$ we set $u_i = 0$. The value of $u_i$ for a given controller is proportional to the ability of that controller to be reduce the expected cost at that point in time. If all of the $u_i = 0$, then it is not possible for the available controllers to reduce the expected cost of the system.

Comparing classical control and stochastic operator control

Now that we finally have a means of generating a control signal with our stochastic differential operators, let’s compare the structure of our stochastic operator control with classical control. Here’s a picture:

The most obvious differences are that a cost function $V(x)$ has replaced the desired trajectory $\dot{x}$ and the state $x$ has been replaced by a probability density over the possible states, $p(x)$. Additionally, the feedback in classical control is used to find the difference between the desired and actual state, which is then multiplied by a gain, to generate a corrective signal, i.e. $u = k * (x^* - x)$, whereas in stochastic operator control signal is calculated as specified above, by following the gradient of the expected value of the cost function, i.e. $u = -k \int V(x)p(x)dx$.

Right away there is a crazy difference between our two control systems already. In classical control case, we’re following a desired trajectory, several things are implied. First, we’ve somehow come up with a desired trajectory. Second, we’re necessarily assuming that regardless of what is actually going on in the system, this is the trajectory that we want to follow. This means that the system is not robust to changes in the dynamics, such as outside forces or perturbations applied during movement. In the stochastic operator control case, we’re not following a desired path, instead the system is looking to minimize the cost function at every point in time. This means that it doesn’t matter where we start from or where we’re going, the stochastic operator controller looks at the effect that each controller will have on the expected value of the cost function and generates a control signal accordingly. If the system is thrown off course to the target, it will recover by itself, making it far more robust than classical control theory. Additionally, we can easily change the cost function input to the system and see a change in the behaviour of the system, whereas in classical control a change in the cost function requires that we regenerate our desired trajectory $\dot{x}$ before our controller will act appropriately.

While these are impressive points, it should also be pointed out that stochastic operator controllers are not the first to attack these issues. The robustness and behaviour business is similarly handled very well, for specific system (either linear or affine, meaning linear in terms of the dynamics of the control signal applied) and cost function forms (usually quadratic), by optimal feedback controllers. Optimal feedback controllers regenerate the desired trajectory online, based on system feedback. This leads to a far more robust control system that classical control provides. However, as mentioned, this is only for specific system and cost function forms. In the stochastic operator control any type of cost function be applied, and the controller dynamics described by $L_i$ can be linear or nonlinear. This is a very important difference, making stochastic operator control far more powerful.

Additionally, stochastic operator controllers operate under uncertainty, by employing a probability density to generate a control signal. All in all, stochastic operator controllers provide an impressive and novel amount of flexibility in control.

Conclusions

Here, Dr. Sanger has taken stochastic differential operators, looked at their relation to Bayes’ rule, and developed a method for controlling uncertain systems robustly. This is done through the observation that these operators have linear properties that lets the effects of distributed controllers on a stochastic system be described through a simple superposition of terms. By introducing a cost function, the effect of each controller on the expected cost of the system at each point in time can be calculated, which can then be used to generate a control signal that robustly minimizes the cost of the system through time.

Stochastic differential operators can often be inferred from the problem description; their generation is something that I’ll examine more closely in the next post going through examples. Using these operators time-varying expected costs associated with state-dependent cost functions can be calculated. Stochastic operator controllers introduce a significant amount more freedom in choice of cost function than has previously been available in control. Dr. Sanger notes that an important area for future research will be in the development of methods for optimal control of systems described by stochastic differential operators.

The downside of the stochastic operator controllers is that they are very computationally intensive, due to the fact that they must propagate a joint-density function forward in time at each timestep, rather than a single system state. One of the areas Dr. Sanger notes of particular importance for future work is the development of parallel algorithms, both for increasing simulation speed and examining possible neural implementations of such an algorithms.

And finally, stochastic differential operators exact a paradigm shift from classical control on the way control is considered. Instead of driving the system to a certain target state, the goal is to have the system behave according to a desired set of dynamics. The effect of a controller is then the difference between the behavior of the system with and without the activity of the controller.

This paper was particularly exciting because it discussed the calculation of the control signals for systems which we’ve described through the stochastic differential operators that have been developed through the last several papers. I admit confusion regarding the aside about developing an orthogonal equivalent set of operators, it seemed a bit of a red herring in the middle of the paper. I left out the example and code discussion from this post because it’s already very long, but I’m looking forward to working through them. Also worth pointing out is that I’ve been playing fast and loose moving back and forth between continuous and discrete, just in the interest in simplifying for understanding, but Dr. Sanger explicitly handles each case.

I’m excited to explore the potential applications and implementations of this technique in neural systems, especially in models of areas of the brain that perform a ‘look-ahead’ type function. The example that comes to mind is that of the rat reaching an intersection in a T-maze, and the neural activity recorded from place cells in the hippocampus shows the rat simulating the result of going left of going right. This seems a particularly apt application of these stochastic differential operators, as a sequence of actions and the resulting state can then be simulated and evaluated, providing that you have an accurate representation of the system dynamics in your stochastic operators.

To that end, I’m also very interested by possible means of learning stochastic differential operators for an action set. Internal models are an integral parts of motor control system models, and this seems like a potentially plausible analogue. Additionally, for modeling biological systems, the complexity of dynamics is something that is often infeasible to determine analytically. All in all, I think this is a really exciting road for exploring the neural control of movement, and I’m looking forward to seeing where it leads.

Sanger, T. (2011). Distributed Control of Uncertain Systems Using Superpositions of Linear Operators Neural Computation, 23 (8), 1911-1934 DOI: 10.1162/NECO_a_00151

## Gram-Schmidt orthogonalization

The context here is that we have some desired vector $v^*$ that we want to build out of a set of basis vectors $v_i$ through weighted summation. The case where this is easiest is when all of our vectors $v_i$ are orthogonal with respect to each other. Recalling that a dot product of two vectors gives us a measure of their similarity, two vectors are orthogonal if their dot product is 0. A basic example of this is the set $[1,0],[0,1]$, or the rotation of these vectors 45 degrees, $[.7071, .7071],[-.7071, .7071]$.

If we have an orthogonal basis set of vectors, then to generate the weights for each of the basis vectors we simply take the dot product between each $v_i$ and our desired vector $v^*$. For example, with our basis sets from above, the weights to generate the vector $[.45 -.8]$ can be found as

$w_1 = \langle [.45, -.8] , [1, 0] \rangle = .45 \\ w_2 = [.45, -.8] \cdot [0, 1] = -.8,$

where $\langle \rangle$ denotes dot (or inner) product, and leads to

$w_1 = \langle [.45, -.8] , [.7071, .7071] \rangle = -0.2475 \\ w_2 = \langle [.45, -.8] , [-.7071, .7071] \rangle = -0.8839.$

And now we have weights $w_i$ such that for each basis set $\sum_i w_i v_i = v^*$. Written generally, to find the weights we have $w_i = \frac{v_i \cdot v^*}{||v_i||}$. The denominator here is the norm of $v_i$, introduced for generality. In the example set our basis sets were composed of unit vectors (vectors with magnitude = 1), but in general normalization is required.

Now, what if we don’t have an orthogonal basis set? Trouble, that’s what. With a non-orthogonal basis set, such as $[1, .4], [-.1, 1]$, when we try our dot product business to find our coefficients looks what happens

$w_1 = \frac{\langle [.45, -.8] , [1, .4] \rangle}{||[1,.4]||} = .3682 \\ w_2 = \frac{\langle [.45, -.8] , [-.1, 1] \rangle}{||[-.1, 1]||} = -.8408,$

and

$.1207 \cdot [1,.4] + -.8408 \cdot [-.1, 1] = [0.2048, -0.7925]$,

which is not a good reconstruction of our desired vector, $[.45, -.8]$. And the more the cross contribution to the same dimensions between different basis vectors, the worse this becomes. Of course, we could use a least squares method to find our basis set coefficients, but that involves matrix multiplications and inverses, and generally becomes more complex than we want.

So, let’s say we have a basis set of two different, but non-orthogonal vectors, $v_1$ and $v_2$. We instead want two vectors $u_1$ and $u_2$ which describe the same space, but are orthogonal. By describing the same space, I mean that their span is the same. And by span I mean the set of values that can be generated through weighted summation of the two vectors. So we set $u_1 = v_1$, and the task is now to find the appropriate $u_2$. As a conceptual description, we want $u_2$ to be equal to $v_2$, but only covering the area of space that $u_1$ isn’t covering already. To do this, we can calculate at the overlap between $u_1$ and $v_2$, then subtract out that area from $v_2$. The result should then give us the same area of state space covered by $v_1$ and $v_2$, but in a set of orthogonal vectors $u_1$ and $u_2$.

Mathematically, we calculate the overlap between $u_1$ and $v_2$ with a dot product, $\langle u_1, v_2 \rangle$, normalized by the magnitude of $u_1$, $||u_1||$, and then subtract from $v_2$. All together we have

$u_2 = v_2 - \frac{\langle u_1, v_2 \rangle}{||u_1||}$.

Using our non-orthogonal example above,

$u_1 = [1, .4]$

$u_2 = [-.1, 1] - \frac{\langle [-.1, 1] , [1, .4] \rangle}{||[-.1, 1]||} = [-0.3785, 0.7215].$

Due to roundoff during the calculation of $u_2$, the dot product between $u_1$ and $u_2$ isn’t exactly zero, but it’s close enough for our purposes.

OK, great. But what about if we have 3 vectors in our basis set and want 3 orthogonal vectors (assuming we’ve moved to a 3-dimensional space) that span the same space? In this case, how do we calculate $u_3$? Well, carrying on with our intuitive description, you might assume that the calculation would be the same as for $u_2$, but now you must subtract out from $v_3$ that is covered by $u_1$ and $u_2$. And you would be correct:

$u_3 = v_3 - \frac{\langle u_1, v_3 \rangle}{||u_1||} - \frac{\langle u_2, v_3 \rangle}{||u_2||}$.

In general, we have

$u_i = v_i - \sum_j \frac{\langle u_j, v_i \rangle}{||u_j||}$.

And now we know how to take a given set of basis vectors and generate a set of orthogonal vectors that span the same space, which we can then use to easily calculate the weights over our new basis set of $u_i$ vectors to generate the desired vector, $v^*$.

## Likelihood calculus paper series review part 2 – Neuro-mechanical control using differential stochastic operators

The second paper put out by Dr. Terence Sanger in the likelihood calculus paper series is Neuro-mechanical control using differential stochastic operators. Building on the probabalistic representation of systems through differential stochastic operators presented in the last paper (Controlling variability, which I review here) Dr. Sanger starts exploring how one could effect control over a system whose dynamics are described in terms of these operators. Here, he specifically looks at driving a population of neurons described by differential stochastic operators to generate the desired system dynamics. Neural control of a system requires that several phenomena outside the realm of classical control theory be addressed, including the effects of variability in control due to stochastic firing, large partially unlabeled cooperative controllers, bandlimited control due to finite neural resources, and variation in the number of available neurons.

The function of a neuron in a control system can be completely described in terms of 1) the probability of it spiking due to the current system state, $p(s=1|x)$, and 2) the effect of its response on the change in the state. Due to the inherent uncertainty in these systems, each individual neuron’s effect on the change in state is captured by a distribution, $p(\dot{x}|s)$. And because the effect of each neuron is only a small part of a large dynamical system that includes the dynamics of the system being controlled and the effects of all the other neurons, these distributions tend to be very broad.

Rephrasing the above description, neurons are mapping a given state $x$ to a change in state $\dot{x}$. Instead of using two conditional densities to describe this mapping, $p(s|x)$ and $p(\dot{x}|s)$, we can rewrite this more compactly as

$p(\dot{x}|x) = p(\dot{x}|s=1)\;p(s=1|x) + p(\dot{x}|s=0)\;p(s=0|x)$,

which can be read as the probability of a change in state $\dot{x}$ given the current state $x$ is equal to the probability of that change in state occurring if the neuron spikes, $p(\dot{x}|s=1)$, multiplied by the probability of that neuron spiking given the current state, $p(s=1|x)$, plus the probability of that state occurring if the neuron doesn’t spike, $p(\dot{x}|s=0)$, multiplied by the probability of that neuron not spiking, $p(s=0|x)$.

Differential stochastic operators

We want to capture the mapping $p(\dot{x}|x)$ in such a way that if we have a description of a bunch of controllers (such as neurons) and the change in system state they effect individually, we can combine them in a straightforward way to get the overall change in state resulting from all the controllers operating in parallel. To do this we can use the linear operators developed in the previous paper, which allows us combine the effects of multiple components through simple summation to determine the overall change in system state. We’ll go over it again here, as I find reading through several different versions of an idea very helpful for solidifying understanding.

Let $\mathcal{L}$ denote a class of linear operators that act on time-varying probability densities, $p(x,t)$, such that $\frac{\partial}{\partial t}p(x,t) = \mathcal{L}p(x,t)$. Because these operators need to preserve the properties of valid probability density (specifically that $\int p(x)dx = 1$ and $p(x) \geq 0$), for a given operator $L$ where $\dot{p}(x) = \int L(x,y) p(y) dy$ we require that:

• 1) $\int L(x,y)dx = 0$ for all $y$,
• 2) $L(x,y) \geq 0$ whenever $x \neq y$,

which respectively enforce the aforementioned constraints.

So, first thing’s first. Let’s read out $\dot{p}(x) = \int L(x,y) p(y) dy$. This says that our change in the probability density, $\dot{p}(x)$, is found by taking our function that tells us what the change in density is for system state $x$ given our current state $y$, which is $L(x,y)$, and weighting that by the probability of currently being in state $y$, which is $p(y)$, then summing that all up, which is the integral.

Now the constraints. The first constraint reads out as the integral of the changes of the probability density at each point $x$ for a given state $y$ must be equal to 0. This means that the area of the probability density over the states after updating them is the same. So, assuming we start out with a valid density whose sum equals 1, we always have a density whose sum equals 1.

The second constraint reads out as the change in probability density for state $x$ given a current state $y$ must be greater than zero whenever $x \neq y$. This means the only time that the change in the probability density can be negative is if there is a probability of being in that state; it enforces that all $p(x) \geq 0$, because $\dot{p}$ can’t be negative when $p(x)$ is zero.

Dr. Sanger defines the linear operators that satisfy these two conditions to be “differential stochastic operators”. The discrete time versions are matrices, dubbed “difference stochastic operators”.

Superposition of differential stochastic operators

Differential stochastic operators can be derived in different ways, here we’ll go through the derivation from the ‘master’ equation defining $p(\dot{x}|x)$, and from a stochastic differential equation. They each have their insights, so it’s helpful to work through both.

Derivation from master equation

The equation for $p(\dot{x}|x)$ written out above,

$p(\dot{x}|x) = p(\dot{x}|s=1)\;p(s=1|x) + p(\dot{x}|s=0)\;p(s=0|x)$,

determines the probability flow on the state. By employing the Kramers-Moyal expansion we can capture this probability flow through a partial differential equation describing change in probability density. In other words, instead of capturing the time evolution of the system state with a probability density over possible changes in state, $\dot{x}$, we capture it through the changing in probability of each state, $\dot{p}(x)$. The Kramers-Moyal expansion looks like:

$\frac{\partial}{\partial t}p(x,t) = -\frac{\partial}{\partial x}(D_1(x)p(x,t)) + \frac{1}{2}\frac{\partial^2}{\partial x^2}(D_1(x)p(x,t)) + ...$,

where $D_k(x) = E[(\dot{x})^k] = \int \dot{x}^k p(\dot{x}|x) d\dot{x}$. Truncating this expansion at the first two terms we get the Fokker-Planck equation, where the first term describes the drift of the density, and the second term the diffusion. These two terms are sufficient for describing Gaussian conditional densities, which capture many physical phenomena. In the case where $p(\dot{x}|x)$ does not have a Gaussian distribution, higher-order terms from the Kramers-Moyal expansion will need to be included.

Now, imagine we have a simple system of two neurons, where the change in state is defined as $\dot{x} = \dot{x}_1 + \dot{x}_2$. If these neurons have conditionally independent variability, i.e. $p(\dot{x}_1 \dot{x}_2 | x) = p(\dot{x}_1|x)p(\dot{x}_2|x)$, then we can sum the Kramers-Moyal expansion of each of these terms to describe the evolution of the overall system state:

$\frac{\partial}{\partial t}p(x,t) = - \sum_i \frac{\partial}{\partial x}(D_{1i}(x)p(x,t)) + \frac{1}{2} \sum_i \frac{\partial^2}{\partial x^2}(D_{2i}(x)p(x,t)) + ...$,

as long as the neurons have conditionally independent variability. This means that they can’t be connected (directly or indirectly) such that a spike in one neuron causes a spike in the other. While this might seem a poor assumption for modeling networks of spiking neurons, in large populations with many input, the effects of any single input neuron tends to be small enough that the assumption holds approximately.

We can rewrite the previous equation now, taking advantage of linearity of $p(x,t)$ and the Kramers-Moyal coefficients, to get

$\dot{p}(x,t) = \mathcal{L}p = \sum_i \mathcal{L}_i p$,

which means that by describing neurons with the differential stochastic operators, $\mathcal{L}_i$, we can determine the cumulative effect on the dynamics of the system through simple summation. Which we all remember from the last paper, but hey, good to review.

Now, in the event that we want to model the effect of a group of strongly interconnected neurons, we can instead consider the effect of the group as a function of the $2^n$ possible firing patterns (spike or no spike from each of the neurons). So where before $p(\dot{x}| x)$ was written in terms of the two cases $s = 0$ and $s = 1$, it would now be written:

$p(\dot{x}|x) = \sum_{i=1}^{2^n} p(\dot{x}|x,i)p(i)$,

where each $i$ is a different spike pattern. This group on neurons and their effect on the system dynamics is then considered as a single unit, and the differential stochastic operator describing them can then be combined with the operator from another group of neurons, provided that the two groups exhibit conditionally independent variability.

If there are no independent groups in the network then it’s fully connected and this is not for you go home.

Derivation from a stochastic differential equation

For this derivation, imagine a set of controllers operating in parallel. Each controller has a stochastic differential equation that defines how a control signal input affects the system dynamics,

$dx = f_i(x)dt + g_i(x)dB_i$,

where $f_i$ and $g_i$ are arbitrary equations and $dB_i$ are random functions of time, or noise terms. Let $f_i(x)dt$ for $i > 0$ be the controller equations, and $f_0(x)dt$ be the unforced, or passive, dynamics of the system, which define how the system behaves without controller input. We can write the equation for the whole system as

$dx = f_0(x)dt + \sum_{i>0}u_i(f_i(x)dt + g_i(x)dB_i)$,

where $u_i$ are constant or slowly varying control inputs. The reason we would choose to denote the unforced (passive) dynamics of the system as $f_0$ is because we can now define $u_0 = 1$, and rewrite the above equation as

$dx = \sum_{i}u_i(f_i(x)dt + g_i(x)dB_i)$.

The corresponding Fokker-Planck equation for the evolution of the state probability density is

$\frac{\partial}{\partial t}p(x,t) = - \sum_i u_i \frac{\partial}{\partial x}(f_i(x)p(x,t)) + \frac{1}{2} \sum_i u_i \frac{\partial^2}{\partial x^2}(g_i(x)p(x,t))$.

Look familiar? We can rewrite this as a superposition of linear operators

$\dot{p}(x,t) = \mathcal{L}p = \sum_i u_i \mathcal{L}_i p$,

and there you go.

Population model

So, now we can apply this superposition of differential stochastic equations to describe the effect of a population of neurons on a given system. Dr. Sanger lists several ways that this model can go about being controlled; 1) modifying the tuning curves of the neurons, which specifies how they respond to stimulus; 2) modify the output functions that determines the effect that a neuron has on the dynamics of the system; and 3) modifying the firing threshold of the neurons.

I found the difference between 1 and 3 can be a little confusing, so let’s look at an example tuning curve in 2D space to try to make this clear. Imagine a neuron sensitive to -dimensional input signals, and that it’s tuning curve looks like this:

If we’re changing the tuning curve, then how this neuron responds to its input stimulus will change. For example, in this diagram we show changing the tuning curve of a neuron:

Here, the neuron no longer responds to the same type of input that it responded to previously. We have made a qualitative change to the type of stimulus this neuron responds to.

If we change the firing threshold, however, then what we’re changing is when the neuron starts responding to stimulus that it is sensitive to.

Here we show the neuron becoming more and more sensitive to a its stimulus, respond stronger sooner and sooner. So the type of signal that the neuron responds to isn’t changing, but rather when the neuron starts responding.

Alright, now that we’ve got that sorted out, let’s move on.
Tuning curves (1) and output functions (2) are both modifiable through learning, by changing the incoming and outgoing connection weights, respectively, but for controlling systems on the fly this is going to be too slow, i.e. slower than the speed at which the system moves. So what’s left is (3), modifying the firing threshold of the neurons. So the model then looks like:

where $p(x)$ is projected in to a population of neurons, each with a stochastic differential operator that sum together to generate $\dot{p}(x)$. In this diagram, $\lambda_i$ is the firing threshold of neuron $i$, and $\lambda_i(x)$ denotes the modulation of the firing rate of neuron $i$ as a function of the current system state. When the modulation is dependent on the system state we have a feedback, or closed-loop, control system. Dr. Sanger notes that in the case that $\lambda_i$ is heavily dependent on $x$, modulating the firing threshold is indistinguishable from modifying the tuning curve, meaning that we can get some pretty powerful control out of this.

Conclusions

‘The theory of differential stochastic operators links the dynamics of individual neurons to the dynamics of a full neuro-mechanical system. The control system is a set of reflex elements whose gain is modulated in order to produce a desired dynamics of the overall system.’

This paper presents a very different formulation of control than classical control theory. Here, the goal is to modulate the dynamics of the system to closely match a desired set of dynamics that achieve some task, rather than to minimize the deviation from some prespecified trajectory through state space. Dr. Sanger notes that this description of control matches well to behavioral descriptions of neural control system, where there are numerous subsystems and circuits that have reflexive responses to external stimuli which must be modulated to achieve desired behavior. The goal of control is to fully define the dynamics of the system’s reaction to the environment.

What comes to mind first for me, in terms of using the modulation of reflex elements to effect a desired set of dynamics, is modeling the spinocerebellum. With a ton of projections directly to the spinal cord, and strong implications in locomotor and balance system, it seems like it would be a very good candidate for being modeled with this type of control. The idea being that the cerebellum is projecting modulatory values to the different spinal circuits (reflex networks and central pattern generators, for example) that specify how to respond to changes in the environment to maintain our balance or the rhythm of our walk. How we go about specifying exactly what those modulatory terms need to be is something that Dr. Sanger tackles in the last paper of this series, which I’ll be reviewing in the next couple of months. I’m looking forward to it.

On another note, in my lab we all work with the Neural Engineering Framework, in which populations of neurons are taken to represent vectors, and to perform transformations on these vectors to relay information an perform various functions. To this end, something that interests me about likelihood calculus is its application to populations of neurons representing vectors. Instead of finding $p(\dot{x}|x)$ by summing the effects of all of the neurons in a population, or defining it in terms of the population spiking patterns, we’re looking at it in terms of the different vectors this population can represent, and the effect of that vector on the system. So we can still have spiking neuron based models, but we can do all the likelihood calculus business one level removed, simplifying the calculation and reducing the number of differential stochastic operators needed.

There are a ton of things going on in this paper, and lots to think about. At several points I deviated from the notation used in this paper because I found it unclear, but aside from that I’ve really enjoyed reading through it and writing it up. Very interesting work.

Sanger TD (2010). Neuro-mechanical control using differential stochastic operators. Conference proceedings : … Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, 2010, 4494-7 PMID: 21095779

## Online Goal Babbling – motor learning paper review

Recently I was linked to an article about learning how to control a highly complex arm from scratch: How infants tell us how to control the Bionic Handling Assistant. The work seemed very interesting so I went and pulled one of the papers linked in the article, Online Goal Babbling for rapid bootstrapping of inverse models in high dimensions.

Diving into that title, online means that we’re using information from every movement as it’s gathered to improve our control, as opposed to ‘batch’ where learning only occurs every so-many trials. Bootstrapping is the process of bringing a system up to a functionally useful level. High dimensions then refers to the complexity of the system being controlled, where every component that requires a control signal is another dimension. Humans, for example, require extremely high dimensional control signals. Inverse models refer to a type of internal model, which ‘describe relations between motor commands and their consequences’. Forward models predict the results of a movement, and inverse models allow suggest a motor command that can be used to achieve a desired consequence, such as ‘the ball falls on the floor’ or ‘my hand touches the red balloon’.

Alright, so that’s the title, let’s dive into the paper proper.

Online goal babbling

The idea behind this research is to let the system learn how to control itself by exploring the environment. The reason why you would want to do this is so that you don’t have to analytically specify how the system should move. Analytic solutions require accurate models of the system dynamics, and calculating these quickly become horrendously complex. To the point that the equations of motion for a relatively simple 3-link arm moving are pages upon pages upon pages. On top of this, because your solution is only is as good as your model and the model you just took all that time to calculate isn’t adaptive, if your system dynamics change at all through wear and tear or an environment change, you’re in big trouble. You can use feedback control to help compensate for the errors introduced, but you can only respond as fast as you can receive and process sensory signals, which is often too long in practical applications.

So this moves us to adaptive feedforward control, based on learning an inverse model of the system dynamics. Importantly, what this paper describes is a kinematic inverse controller, not a kinetic inverse controller; meaning that given a desired target position for the end effector (hand) of an arm it provides a sequence of joint angle sets that will lead to the end effector reaching the target, as opposed to providing a sequence of joint angle torques to drive the system to the target. At some point, a control system will have to generate force commands, rather than just specify what position it wants the system to be in. But, knowing what joint angles and trajectory the joint angles should follow is an important problem, because systems that are of interest to control tend to exhibit a fair amount of configuration redundancy between ‘task-space’ and the actual system state-space. Task-space being something like the 3-dimensional $(x,y,z)$ position of the end-effector, which we are interested in controlling, and the actual system state-space being something like the joint-angles or muscle-lengths. Configuration redundancy is the problem of more than one possible set of joint-angles putting the end-effector in the desired location. Often the number of potential solutions is incredibly large, think about the number of possible ways you can reach to an object. So how do you learn an appropriate trajectory for your joint angles during a reaching movement? That is the problem being addressed here.

To learn an inverse model from scratch, the system needs to explore. How should it explore? Moving randomly eventually leads to an exhaustive search, but this is a poor choice in complex systems because it takes a large amount of time, increasing exponentially with the degrees of freedom (i.e. number of joint angles) of the system. So let’s look to babies, they’re responsible for learning how to control a very complex system, how the heck do they learn to move?

‘Motor babbling’ is a term that was coined to describe the seemingly random way babies moved about. It was suggested that they just flail about without purpose until they gain some understanding of how their bodies work, at which point they can start moving with purpose. But! Baby movement was shown way back in the 80’s to in fact not be just random, but instead to be highly goal directed. And when they find a solution, they stick with it, they don’t worry about it being the best one as long as it gets the job done. Only later are movements tweaked to be more efficient. As mentioned above, in systems as complicated as the human body the task-space (i.e. position of the hand) is much smaller than the motor space (i.e. length of all muscles), and there are a bunch of different solutions to a given task. With all these different potential solutions to a given problem, an exhaustive search isn’t even close to being necessary. Also, if babies aren’t just randomly exploring space to figure things out, they don’t have to flip a switch somewhere in their brain that says “ok stop messing around and let’s move with purpose”.

This paper provides a technique for stable online inverse model learning that can be used from initial bootstrapping, and shows how online learning can even be highly beneficial for bootstrapping. So let’s dive into the Online Goal Babbling learning method.

The inverse model function

Let’s denote the inverse model function we’re learning as $g()$, joint angles as $q$, and end effector positions as $x$. Then to denote giving a desired end effector position and getting out a set of joint angles we write $g(x^*) = q$, where $x^*$ represents a target in end effector space.

We’re going to initialize the inverse model function by having every end effector position return some default resting state of the arm, or home position, that we’ve defined, $q_{home}$. Additionally, we’re going to throw on some exploratory noise to the answer provided by the inverse model, so that the joint angles to move to at time $t$ are defined as $q_t = g(x^*_t, \theta_t) + E_t(x_t^*)$, where $E_t(x^*_t)$ is the exploratory noise. When the system then moves to $q_t$ the end effector position, $x_t$, is observed and the parameters of the inverse model, $\theta$, are updated immediately.

To generate a smooth path from the starting position to the target, the system divides the distance up into a number of sub-targets (25 in the paper) for the system to hit along the way. This is an important point, as it’s how the inverse model function is used to create a path that represents the system moving; a series of targets are provided and the inverse model is queried “what should the joint configuration be for this position?”

As mentioned before, it is possible to have he end effector in the same position, but the joints in a different configuration. Learning across these examples is dangerous and can lead to instability in the system as neighbouring targets could be learned with very dissimilar joint configurations, preventing smooth movement through joint-space. To prevent this, observed information is weighted by a term $w_t$ as it is taken in, based on deviation from the resting position of the arm ($q_{home}$) and efficiency of end effector movement. What this leads to is a consistent solution to be converged upon throughout movement space, causing the inverse model to generate smooth, comfortable (i.e. not near joint limits, but near the resting state of the arm) movements.

To recenter movement exploration every so often, and prevent the system from exploring heavily in some obscure joint-space, every time a new target to move to is selected there is some probability (.1 in the paper) that the system will return to $q_{home}$. To return home the system traces out a straight trajectory in joint-space, not worrying about how it is moving through end effector space. This return probability reinforces learning how to move well near $q_{home}$, and acts as a ‘developmentally plausible stabilizer that helps to stay on known areas of the sensorimotor space.’

Also, we mentioned adding exploratory noise. How is that noise generated? The noise is calculated through a small, randomly chosen linear function that varies slowly over time: $E_t(x^*_t) = A_t \cdot x^* + c_t$, where the entries to the matrix $A_0$ and vector $b_0$ are chosen independently from a normal distribution with zero mean and variance $\sigma^2$. To move, a set of small values is chosen from a normal distribution with variance significantly smaller than $\sigma^2$, and added to elements of $A_t$ and $c_t$. A normalization term is also used to keep the overall deviation stable around the standard deviation $\sigma$. And that’s how we get our slowly changing linear random noise function.

Online learning

To learn the inverse model function, we’re going to use the technique of creating a bunch of linear models that are accurate in a small area of state space, and weighting their contribution to the output based on how close we are to their ‘region of validity’. The specific method used in the paper is a local-linear map, from (H.Ritter, Learning with the self-organizing map 1991). We define our linear models as $g^{(k)}(x) = M^{(k)} \cdot x + b^{(k)}$, intuited as following the standard $mx + b$ definition of a line, but moving into multiple dimensions. $M^{(k)}$ is our linear transformation of the input for model $k$, and $b^{(k)}$ is the offset.

If an input is received that is outside of the region of validity for all of the local linear models, then another one is added to improve the approximation of the function at that point. New local models are initiated with the Jacobian matrix $J(x) = \frac{\partial g(x)}{\partial x}$ of the inverse estimate. In other words, we look at how the approximation of the function is changing as we move in this $x$ direction in state space, or estimate the derivative of the function, and use that to set the initial value of our new linear model.

To update our inverse model function, we fit it to the current example $(x_t, q_t)$ by reducing the weighted square error: $err = w_t \cdot ||q_t - g(x_t)||^2$. With this error, a standard gradient descent method is used to update the slopes and offsets of the dimensions of the linear models:

$M^{(k)}_{t+1} = M^{(k)}_t - \eta \frac{\partial err}{\partial M^{(k)}_t}, \;\;\;\;\; b^{(k)}_{t+1} = b^{(k)}_{t} - \eta \frac{\partial err}{/partial b^{(k)}_{t}}$,

where $\eta$ is the learning rate.

Results

So here are some figures of the results of applying the algorithm, and they’re pretty impressive. First we see learning the inverse model for a 5-link arm:

And here are some results looking at varying the learning rate (x-axis) and the effects on (a) time until 10% error reached, (b) final performance error, and (c) final average distance from $q_{home}$, the resting state configuration.

I enjoyed reading this paper, although I thought the ‘Learning operational space control’ papers from Jan Peters and Stephen Schaal deserved a bit more of a shout-out, as they’ve been around since 2006/8 and it’s a very similar method. Also I pulled one of the older papers from the article, so I’ll need to review the other ones too soon, as there is clearly more going on for the successful control of the elephant trunk arm shown in the article.

Other than that, I’m a big fan of breaking up complicated, non-linear functions into a bunch of small linear models, and stitching them all back together to learn an accurate approximation. I think it’s a very likely candidate for the kind of way that the brain goes about learning these incredibly complicated models, especially in the cerebellum I hold these types of methods as a favorite for how biological systems accomplish this. Another important note is that the learning never ‘turns off’ here, which is another big plus to this method. It allows the system to converge upon a solution, but to keep exploring the space as it uses this solution, so more efficient solutions can be found. Additionally, this makes this system much more robust to changing dynamics, resultant of wear and tear or, as in the case of infants, growing.

Again, the model here is learning an kinematic control of the system, specifying what joint angles the system should move to, and not the joint torques that will be required to actually move the system there. But, I am partial to the idea that this kinematic inverse model does then make it a much more smooth function for the kinetic controller to learn from this point.

The way I picture it is specifying a target in end-effector space, learning an kinematic inverse model controller that can specify the appropriate joint trajectory to follow to reach this target. Then an kinetic inverse model controller that takes in the joint space path, looks at the current joint angles / velocities / and accelerations, and learns the appropriate torques or muscle activation commands to move the arm as desired. This is a much, much smoother function to learn than directly learning the appropriate joint torques for a desired target in end effector space, because there are no hidden configuration redundancies. It’s still complex, because it still has to compensate for the inertia and multi-link interaction dynamics, but should still hopefully be much more straight-forward to learn.

Rolf, Matthias. (2011). Online Goal Babbling for rapid bootstrapping of inverse models in high dimensions. Development and Learning (ICDL), 2011 IEEE DOI: 10.1109/DEVLRN.2011.6037368

## Likelihood calculus paper series review part 1 – Controlling variability

Dr. Terry Sanger has a series of papers that have come out in the last few years describing what he has named ‘likelihood calculus’. The goal of these papers is to develop a ‘a theory of optimal control for variable, uncertain, and noisy systems that nevertheless accomplish real-world tasks reliably.’ The idea being that successful performance can be thought of as modulating variance of movement, allocating resources to tightly control motions when required and allowing variability in task-irrelevant dimensions. To perform variability modulation, we first need a means of capturing mathematically how the features of an uncertain controller operating affect variability in system movement. Defining terms quickly, the features of a controller are the different components that produce signals resulting in movement, variability is taken here to be the trial-to-trial variation in movements, and uncertainty means that the available sensory feedback does not uniquely determine the true state of the world, where uncertainty can arise from noise on sensory feedback signals, unmodeled dynamics, and/or quantitization of sensory feedback. To capture all this uncertainty and variability, probability theory will naturally be employed. In this post I will review the paper ‘Controlling variability’ (2010) by Dr. Sanger, which sets up the framework for describing the time course of uncertainty during movement.

Using probability in system representations

So, here’s a picture of the system our controller (the brain) is in:

There’s the input initial state $x$, and the output change in state, $\dot{x}$, which is generated as a combination of the unforced dynamics of the world and the control dynamics effected by the brain. But since we’re dealing with uncertainty and variability, we’re going to rewrite this such that given an initial state $x$, we get a probability distribution over potential changes in state, $p(\dot{x}|x)$, which specifies the likelihood of each change in state $\dot{x}$, given our initial state probability distribution $p(x)$. So in our system diagram, the word and the brain both define probability distributions over the possible changes in state, $p_1(\dot{x}|x)$ and $p_1(\dot{x}|x)$, respectively, which then combine to create the overall system dynamics $p(\dot{x}|x)$. Redrawing our picture to incorporate the probabilities, we get:

One may ask: how do these probabilities combine? Good question! What we’d like to be able to do is combine them through simple linear operators, because they afford us massive simplifications in our calculations, but the combination of $p_0(\dot{x}|x)$ and $p_1(\dot{x}|x)$ isn’t as simple as summing and normalizing. The reason why can be a little tricky to tease out if you’re unfamiliar with probability, but will becomes clear with some thought. Consider what it means to go about combining the probabilities in this way. Basically, if you sum and normalize, then the result is saying that there is a 50% chance of doing what the brain says to do, and a 50% chance of doing what the world says to do, and doesn’t capture what is actually going to happen, which is an interaction between the effects of the brain and the world. A good example for thinking about this is rolling dice. If you roll each die individually, you have an equal chance of the result being each number 1-6, but if you roll two dice, the overall system probability of rolling numbers changes in a highly nonlinear fashion:

Suddenly there is a 0% chance of the result being a 1, and the probability of rolling a number increases in likelihood until you get to 7, at which point the likelihood decreases with ascending numbers; there is a nonlinear interaction at play that can’t be captured by summing the probabilities of rolling a number on each die individually and normalizing.

So summing the probability distributions over the possible changes in state, $\dot{x}$, isn’t going to work, but is there another way to combine these through linear operators? The answer is yes, but it’s going to require us to undergo a bit of a paradigm shift and expand our minds. What if, instead of capturing the change in the system by looking at $p(\dot{x}|x)$, the probability distribution over possible changes in state given the current state, we instead capture the dynamics of the system by defining $\dot{p}(x)$, the change in the probability of states through time? Instead of describing how the system evolves through the likelihood of different state changes, the dynamics are captured by defining the change in likelihood of different states; we are capturing the effect of the brain and the world on the temporal evolution of state probability. Does that freak you out?

Reworking the problem

Hopefully you’re not too freaked out. Now that you’ve worked your head around this concept, let’s look at $\dot{p}(x)$ a little more closely. Specifically, let’s look at the Kramers-Moyal expansion (for a one-dimensional system):

$\frac{\partial p(x,t)}{\partial t} = \sum_{k=1}^{\inf} \left( - \frac{\partial}{\partial x} \right)^k \{a_k(x) p(x)\} / k!$,

$a_k(x) = \int \dot{x}^k p(\dot{x}|x) d\dot{x}$.

As Dr. Sanger notes, this is a daunting equation, but it can be understood relatively easily. The left side, $\frac{\partial p(x,t)}{\partial t} = \dot{p}_t(x)$, is the rate of change of probability at each point $x$ at time $t$. The right side is just the Taylor series expansion. If we take the first two terms of the Taylor series expansion, we get:

$\frac{\partial p(x,t)}{\partial t} = - a_1 \frac{\partial}{\partial x} p(x) + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} p(x)$,

where the first describes how the probability drifts (or shifts / translates), $a_1$ being the average value of $\dot{x}$ for each value of $x$. The second term relates the rate of diffusion, $a_2$ being the second moment of the speed $\dot{x}$, describing the amount of spread in different possible speeds, where greater variability in speed leads to an increased spread of the probability. This is the Fokker-Planck equation, which describes the evolution of a physical process with constant drift and diffusion. At this point we make the assumption that our probability distributions are all going to be in the form of Gaussians (for which the Fokker-Planck equation exactly describes evolution of the system through time, and can arguably act as a good approximation to neural control systems where movement is based on the average activity of populations of neurons).

As an example of this, think of a 1-dimensional system, and a Gaussian probability distribution describing what state the system is likely to be in. The first term $a_1$ is the average rate of change $\dot{x}$ across the states $x$ the system could be in. The probability shifts through state space as specified by $a_1$. Intuitively, if you have a distribution with mean around position 1 and the system velocity is 4, then the change in your probability distribution, $p(x)$, should shift the mean to position 5. The second term, $a_2$ is a measure of how wide the range of different possible speeds $\dot{x}$ is. The larger the range of possible values, the less certain we become about the location of the system as it moves forward in time; the greater the range of possible states the system might end up in in the next time step. This is reflected by the rate of diffusion of the probability distribution. If we know for sure the speed the system moved at (i.e. all possible states will move with a specific $\dot{x}$), then we simply translate the mean of probability distribution. If however there’s uncertainty in the speed at which the system is moving, then the correct location (reflecting the actual system position) for the mean of the probability distribution could be one of a number of values. This is captured by increasing the width of (diffusing) the Gaussian.

Linear operators

Importantly, the equations above are linear in terms of $p(x)$. This means we can rearrange the above equation:

$\frac{\partial p(x,t)}{\partial t} = \left( -a_1 \frac{\partial}{\partial x} + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} \right) p(x)$,

letting $\mathcal{L} = \left( -a_1 \frac{\partial}{\partial x} + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} \right)$, we have

$\frac{\partial p(x,t)}{\partial t} = \mathcal{L} p(x)$.

Now we can redraw our system above as

where

$\mathcal{L} = \mathcal{L}_0 + \mathcal{L}_1$,

which is the straightforward combination of the different contributions of each of the brain and the world to the overall system state probability. How cool is that?

Alright, calm down. Time to look at using these operators. Let’s assume that the overall system dynamics $\mathcal{L}$ hold constant for some period of time (taking particular care to note that ‘constant dynamics’ does not mean that a single constant output is produced irrespective of the input state, but rather that a given input state $x$ always produces the same result while the dynamics are held constant), and we have discretized our representation of $x$ (to be a range of some values, i.e. -100 to 100) then we can find the state probability distribution at time $T$ by calculating

$p(x, T) = A^T p(x,0)$,

where $A^T = e^{T\mathcal{L}}$.

When combining these $\mathcal{L}$ operators, if we sum them, i.e. overall system dynamics

$\mathcal{L} = \mathcal{L}_0 + \mathcal{L}_1$,

and then apply them to the probability

$\frac{\partial p(x,t)}{\partial t} = \mathcal{L}p(x)$

this is saying that the dynamics provided by $\mathcal{L}_0$ and $\mathcal{L}_1$ are applied at the same time. But if we multiply the component dynamics operators,

$\mathcal{L} = \mathcal{L}_1 \mathcal{L}_0$

then when we apply them we have

$\frac{\partial p(x,t)}{\partial t} = \mathcal{L}p(x) = \mathcal{L}_1 \mathcal{L}_0 p(x)$,

which is interpreted as applying $\mathcal{L}_0$ to the system, and then applying $\mathcal{L}_1$. Just basic algebra, but it allows us to apply simultaneously and sequentially the dynamics generated by our contributing system components (i.e. the brain and the world).

Capturing the effects of control

So now we have a representation of the system dynamics operating with variability and under uncertainty, we’re talking about building a tool to use for controlling these systems though, so where does the control signal $u$ fit in? The $\mathcal{L}$ operator is made to be a function of the control signal, describing the probablistic effect of the control signal given the possible initial states in the current state probability distribution $p(x)$. Thus the change in state probability is now written

$\frac{\partial p(x,t)}{\partial t} = \mathcal{L}(u)p(x)$.

Suppose that we drive a system with constant dynamics $\mathcal{L}(u_1)$ for a period of time $T_1$, at which point we change the control signal and drive the system with constant dynamics $\mathcal{L}(u_2)$ for another period of time $T_2$. The state of the system now be calculated

$p(x, T_1 + T_2) = e^{T_2\mathcal{L}(u_2)}e^{T_1\mathcal{L}(u_1)}p(x,0) = A_{u_2}^{T_2}A_{u_1}^{T_1}p(x,0)$

using the sequential application of dynamics operators discussed above.

Conclusion

And that is the essence of the paper ‘Controlling variability’. There is an additional discussion about the relationship to Bayes’ rule, which I will save for another post, and an example, but this is plenty for this post.

The main point from this paper is that we shouldn’t be focusing on the values of states as the object of control, but rather the probability densities of states. By doing this, we can capture the uncertainty in systems and work towards devising an effecting means of control. So, although the paper is called ‘Controlling variability’, the discussion of how to actually control variability is saved for later papers. All the same, I thought this was a very interesting paper, enjoyed working through it, and am looking forward to the rest of the series.

Sanger TD (2010). Controlling variability. Journal of motor behavior, 42 (6), 401-7 PMID: 21184358