## Using VREP for simulation of force-controlled models

I’ve been playing around a bit with different simulators, and one that we’re a big fan of in the lab is VREP. It’s free for academics and you can talk to them about licences if you’re looking for commercial use. I haven’t actually had much experience with it before myself, so I decided to make a simple force controlled arm model to get experience using it. All in all, there were only a few weird API things that I had to get through, and once you have them down it’s pretty straight forward. This post is going to be about the steps that I needed to take to get things all set up. For a more general start-up on VREP check out All the code in this post and the model I use can be found up on my GitHub.

Getting the right files where you need them

As discussed in the remote API overview, you’ll need three files in whatever folder you want to run your Python script from to be able to hook into VREP remotely:

• remoteApi.dll, remoteApi.dylib or remoteApi.so (depending on what OS you’re using)
• vrep.py
• vrepConstants.py

You can find these files inside your VREP_HOME/programming/remoteApiBindings/python/python and VREP_HOME/programming/remoteApiBindings/lib/lib folders. Make sure that these files are in whatever folder you’re running your Python scripts from.

The model

It’s easy to create a new model to mess around with in VREP, so that’s the route I went, rather than importing one of their pre-made models and having some sneaky parameter setting cause me a bunch of grief. You can just right click->add then go at it. There are a bunch of tutorials so I’m not going to go into detail here. The main things are:

• Make sure joints are in ‘Torque/force’ mode.
• Make sure that joint ‘Motor enabled’ property is checked. The motor enabled property is in the ‘Show dynamic properties dialogue’ menu, which you find when you double click on the joint in the Scene Hierarchy.
• Know the names of the joints as shown in the Scene Hierarchy.

So here’s a picture:

where you can see the names of the objects in the model highlighted in red, the Torque/force selection highlighted in blue, and the Motor enabled option highlighted in green. And of course my beautiful arm model in the background.

Setting up the continuous server

The goal is to connect VREP to Python so that we can send torques to the arm from our Python script and get the feedback necessary to calculate those torques. There are a few ways to set up a remote connection in VREP.

The basic one is they have you add a child script in VREP and attach it to an object in the world that sets up a port and then you hit go on your simulation and can then run your Python script to connect in. This gets old real fast. Fortunately, it’s easy to automate everything from Python so that you can connect in, start the simulation, run it for however long, and then stop the simulation without having to click back and forth.

The first step is to make sure that your remoteApiConnections.txt file in you VREP home directory is set up properly. A continuous server should be set up by default, but doesn’t hurt to double check. And you can take the chance to turn on debugging, which can be pretty helpful. So open up that file and make sure it looks like this:

portIndex1_port                 = 19997
portIndex1_debug                = true
portIndex1_syncSimTrigger       = true


Once that’s set up, when VREP starts we can connect in from Python. In our Python script, first we’ll close any open connections that might be around, and then we’ll set up a new connection in:

import vrep

# close any open connections
vrep.simxFinish(-1)
# Connect to the V-REP continuous server
clientID = vrep.simxStart('127.0.0.1', 19997, True, True, 500, 5)

if clientID != -1: # if we connected successfully
print ('Connected to remote API server')


So once the connection is made, we check the clientID value to make sure that it didn’t fail, and then we carry on with our script.

Synchronizing

By default, VREP will run its simulation in its own thread, and both the simulation and the controller using the remote API will be running simultaneously. This can lead to some weird behaviour as things fall out of synch etc etc, so what we want instead is for the VREP sim to only run one time step for each time step the controller runs. To do that, we need to set VREP to synchronous mode. So the next few lines of our Python script look like:

    # --------------------- Setup the simulation

vrep.simxSynchronous(clientID,True)


and then later, once we’ve calculated our control signal, sent it over, and want the simulation to run one time step forward, we do that by calling

    # move simulation ahead one time step
vrep.simxSynchronousTrigger(clientID)


Get the handles to objects of interest

OK the next chunk of code in our script uses the names of our objects (as specified in the Scene Hierarchy in VREP) to get an ID for each which to identify which object we want to send a command to or receive feedback from:

    joint_names = ['shoulder', 'elbow']
# joint target velocities discussed below
joint_target_velocities = np.ones(len(joint_names)) * 10000.0

# get the handles for each joint and set up streaming
joint_handles = [vrep.simxGetObjectHandle(clientID,
name, vrep.simx_opmode_blocking)[1] for name in joint_names]

# get handle for target and set up streaming
_, target_handle = vrep.simxGetObjectHandle(clientID,
'target', vrep.simx_opmode_blocking)


Set dt and start the simulation

And the final thing that we’re going to do in our simulation set up is specify the timestep that we want to use, and then start the simulation. I found this in a forum post, and I must say whatever VREP lacks in intuitive API their forum moderators are on the ball. NOTE: To use a custom time step you have to also set the dt option in the VREP simulator to ‘custom’. The drop down is to the left of the ‘play’ arrow, and if you don’t have it set to ‘custom’ you won’t be able to set the time step through Python.

    dt = .01
vrep.simxSetFloatingParameter(clientID,
vrep.sim_floatparam_simulation_time_step,
dt, # specify a simulation time step
vrep.simx_opmode_oneshot)

# --------------------- Start the simulation

# start our simulation in lockstep with our code
vrep.simxStartSimulation(clientID,
vrep.simx_opmode_blocking)


Main loop

For this next chunk I’m going to cut out everything that’s not VREP, since I have a bunch of posts explaining the control signal derivation and forward transformation matrices.

So, once we’ve started the simulation, I’ve set things up for the arm to be controlled for 1 second and then for the simulation to stop and everything shut down and disconnect.

    while count < 1: # run for 1 simulated second

# get the (x,y,z) position of the target
_, target_xyz = vrep.simxGetObjectPosition(clientID,
target_handle,
-1, # retrieve absolute, not relative, position
vrep.simx_opmode_blocking)
if _ !=0 : raise Exception()
track_target.append(np.copy(target_xyz)) # store for plotting
target_xyz = np.asarray(target_xyz)

q = np.zeros(len(joint_handles))
dq = np.zeros(len(joint_handles))
for ii,joint_handle in enumerate(joint_handles):
# get the joint angles
_, q[ii] = vrep.simxGetJointPosition(clientID,
joint_handle,
vrep.simx_opmode_blocking)
if _ !=0 : raise Exception()
# get the joint velocity
_, dq[ii] = vrep.simxGetObjectFloatParameter(clientID,
joint_handle,
2012, # ID for angular velocity of the joint
vrep.simx_opmode_blocking)
if _ !=0 : raise Exception()


In the above chunk of code, I think the big thing to point out is that I’m using vrep.simx_opmode_blocking in each call, instead of vrep.simx_opmode_buffer. The difference is that you for sure get the current values of the simulation when you use blocking, and you can be behind a time step using buffer.

Aside from that the other notable things are I raise an exception if the first parameter (which is the return code) is ever not 0, and that I use simxGetObjectFloatParameter to get the joint velocity instead of simxGetObjectVelocity, which has a rotational velocity component. Zero is the return code for ‘everything worked’, and if you don’t check it and have some silly things going on you can be veeerrrrryyy mystified when things don’t work. And what simxGetObjectVelocity returns is the rotational velocity of the joint relative to the world frame, and not the angular velocity of the joint in its own coordinates. That was also a briefly confusing.

So the next thing I do is calculate u, which we’ll skip, and then we need to set the forces for the joint. This part of the API is real screwy. You can’t set the force applied to the joint directly. Instead, you have to set the target velocity of the joint to some really high value (hence that array we made before), and then modulate the maximum force that can be applied to that joint. Also important: When you want to apply a force in the other direction, you change the sign on the target velocity, but keep the force sign positive.

        # ... calculate u ...

for ii,joint_handle in enumerate(joint_handles):
# get the current joint torque
_, torque = \
vrep.simxGetJointForce(clientID,
joint_handle,
vrep.simx_opmode_blocking)
if _ !=0 : raise Exception()

# if force has changed signs,
# we need to change the target velocity sign
if np.sign(torque) * np.sign(u[ii]) < 0:
joint_target_velocities[ii] = \
joint_target_velocities[ii] * -1
vrep.simxSetJointTargetVelocity(clientID,
joint_handle,
joint_target_velocities[ii], # target velocity
vrep.simx_opmode_blocking)
if _ !=0 : raise Exception()

# and now modulate the force
vrep.simxSetJointForce(clientID,
joint_handle,
abs(u[ii]), # force to apply
vrep.simx_opmode_blocking)
if _ !=0 : raise Exception()

# move simulation ahead one time step
vrep.simxSynchronousTrigger(clientID)
count += dt


So as you can see we check the current torque, see if we need to change the sign on the target velocity, modulate the maximum allowed force, and then finally step the VREP simulation forward.

Conclusions

And there you go! Here’s an animation of it in action (note this is a super low quality gif and it looks way better / smoother when actually running it yourself):

All in all, VREP has been enjoyable to work with so far. It didn’t take long to get things moving and off the ground, the visualization is great, and I haven’t even scratched the surface of what you can do with it. Best of all (so far) you can fully automate everything from Python. Hopefully this is enough to help some people get their own models going and save a few hours and headaches! Again, the full code and the model are up on my GitHub.

Nits

• When you’re applying your control signal, make sure you test each joint in isolation, to make sure your torques push things in the direction you think they do. I had checked the rotation direction in VREP, but the control signal for both joints ended up needing to be multiplied by -1.
• Another nit when you’re building your model, if you use the rotate button from the VREP toolbar on your model, wherever that joint rotates to is now 0 degrees. If you want to set the joint to start at 45 degrees, instead double click and change Pos[deg] option inside ‘Joint’ in Scene Object Properties.

## Deep learning for control using augmented Hessian-free optimization

Traditionally, deep learning is applied to feed-forward tasks, like classification, where the output of the network doesn’t affect the input to the network. It is a decidedly harder problem when the output is recurrently connected such that network output affects the input. Because of this application of deep learning methods to control was largely unexplored until a few years ago. Recently, however, there’s been a lot of progress and research in this area. In this post I’m going to talk about an implementation of deep learning for control presented by Dr. Ilya Sutskever in his thesis Training Recurrent Neural Networks.

In his thesis, Dr. Sutskever uses augmented Hessian-free (AHF) optimization for learning. There are a bunch of papers and posts that go into details about AHF, here’s a good one by Andrew Gibiansky up on his blog, that I recommend you check out. I’m not going to really talk much here about what AHF is specifically, or how it differs from other methods, if you’re unfamiliar there are lots of places you can read up on it. Quickly, though, AHF is kind of a bag of tricks you can use with a fast method for estimating the curvature of the loss function with respect to the weights of a neural network, as well as the gradient, which allows it to make larger updates in directions where the loss function doesn’t change quickly. So rather than estimating the gradient and then doing a small update along each dimension, you can make the size of your update large in directions that change slowly and small along dimensions where things change quickly. And now that’s enough about that.

In this post I’m going to walk through using a Hessian-free optimization library (version 0.3.8) written by my compadre Dr. Daniel Rasmussen to train up a neural network to train up a 2-link arm, and talk about the various hellish gauntlets you need run to get something that works. Whooo! The first thing to do is install this Hessian-free library, linked above.

I’ll be working through code edited a bit for readability, to find the code in full you can check out the files up on my GitHub.

Build the network

Dr. Sutskever specified the structure of the network in his thesis to be 4 layers: 1) a linear input layer, 2) 100 Tanh nodes, 3) 100 Tanh nodes, 4) linear output layer. The network is connected up with the standard feedforward connections from 1 to 2 to 3 to 4, plus recurrent connections on 2 and 3 to themselves, plus a ‘skip’ connection from layer 1 to layer 3. Finally, the input to the network is the target state for the plant and the current state of the plant. So, lots of recursion! Here’s a picture:

The output layer connects in to the plant, and, for those unfamiliar with control theory terminology, ‘plant’ just means the system that you’re controlling. In this case an arm simulation.

Before we can go ahead and set up the network that we want to train, we also need to specify the loss function that we’re going to be using during training. The loss function in Ilya’s thesis is a standard one:

$L(\theta) = \sum\limits^{N-1}\limits_{t=0} \ell(\textbf{u}_t) + \ell_f(\textbf{x}_N),$
$\ell(\textbf{u}_t) = \alpha \frac{||\textbf{u}_t||^2}{2},$
$\ell_f(\textbf{x}_N) = \frac{||\textbf{x}^* - \textbf{x}_t||^2}{2}$

where $L(\theta)$ is the total cost of the trajectory generated with $\theta$, the set of network parameters, $\ell(\textbf{u})$ is the immediate state cost, $\ell_f(\textbf{x})$ is the final state cost, $\textbf{x}$ is the state of the arm, $\textbf{x}^*$ is the target state of the arm, $\textbf{u}$ is the control signal (torques) that drives the arm, and $\alpha$ is a gain value.

To code this up using the hessianfree library we do:

from hessianfree import RNNet
from hessianfree.nonlinearities import (Tanh, Linear, Plant)
from hessianfree.loss_funcs import SquaredError, SparseL2

l2gain = 10e-3 * dt # gain on control signal loss
rnn = RNNet(
# specify the number of nodes in each layer
shape=[num_states * 2, 96, 96, num_states, num_states],
# specify the function of the nodes in each layer
layers=[Linear(), Tanh(), Tanh(), Linear(), plant],
# specify the layers that have recurrent connections
rec_layers=[1,2],
# specify the connections between layers
conns={0:[1, 2], 1:[2], 2:[3], 3:[4]},
# specify the loss function
loss_type=[
# squared error between plant output and targets
SquaredError(),
# penalize magnitude of control signal (output of layer 3)
SparseL2(l2gain, layers=[3])],
use_GPU=True)


Note that if you want to run it on your GPU you’ll need PyCuda and sklearn installed. And a GPU.

An important thing to note as well is that in Dr. Sustkever’s thesis when we’re calculating the squared error of the distance from the arm state to the target, this is measured in joint angles. So it’s kind of a weird set up to be looking at the movement of the hand but have your cost function in joint-space instead of end-effector space, but it definitely simplifies training by making the cost more directly relatable to the control signal. So we need to calculate the joint angles of the arm that will have the hand at different targets around a circle. To do this we’ll take advantage of our inverse kinematics solver from way back when, and use the following code:

def gen_targets(arm, n_targets=8, sig_len=100):
#Generate target angles corresponding to target
#(x,y) coordinates around a circle
import scipy.optimize

x_bias = 0
if arm.DOF == 2:
y_bias = .35
dist = .075
elif arm.DOF == 3:
y_bias = .5
dist = .2

# set up the reaching trajectories around circle
targets_x = [dist * np.cos(theta) + x_bias \
for theta in np.linspace(0, np.pi*2, 65)][:-1]
targets_y = [dist * np.sin(theta) + y_bias \
for theta in np.linspace(0, np.pi*2, 65)][:-1]

joint_targets = []
for ii in range(len(targets_x)):
joint_targets.append(arm.inv_kinematics(xy=(targets_x[ii],
targets_y[ii])))
targs = np.asarray(joint_targets)

# repeat the targets over time
for ii in range(targs.shape[1]-1):
targets = np.concatenate(
(np.outer(targs[:, ii], np.ones(sig_len))[:, :, None],
np.outer(targs[:, ii+1], np.ones(sig_len))[:, :, None]), axis=-1)
targets = np.concatenate((targets, np.zeros(targets.shape)), axis=-1)
# only want to penalize the system for not being at the
# target at the final state, set everything before to np.nan
targets[:, :-1] = np.nan

return targets


And you can see in the last couple lines that to implement the distance to target as a final state cost penalty only we just set all of the targets before the final time step equal to np.nan. If we wanted to penalize distance to target throughout the whole trajectory we would just comment that line out.

Create the plant

You’ll notice in the code that defines our RNN I set the last layer of the network to be plant, but that that’s not defined anywhere. Let’s talk. There are a couple of things that we’re going to need to incorporate our plant into this network and be able to use any deep learning method to train it. We need to be able to:

1. Simulate the plant forward; i.e. pass in input and get back the resulting plant state at the next timestep.
2. Calculate the derivative of the plant state with respect to the input; i.e. how do small changes in the input affect the state.
3. Calculate the derivative of the plant state with respect to the previous state; i.e. how do small changes in the plant state affect the state at the next timestep.
4. Calculate the derivative of the plant output with respect to its state; i.e. how do small changes in the current position of the state affect the output of the plant.

So 1 is easy, we have the arm simulations that we want already, they’re up on my GitHub. Number 4 is actually trivial too, because the output of our plant is going to be the state itself, so the derivative of the output with respect to the state is just the identity matrix.

For 2 and 3, we’re going to need to calculate some derivatives. If you’ve read the last few posts you’ll note that I’m on a finite differences kick. So let’s get that going! Because no one wants to calculate derivatives!

Important note, the notation in these next couple pieces of code is going to be a bit different from my normal notation because they’re matching with the hessianfree library notation, which is coming from a reinforcement learning literature background instead of a control theory background. So, s is the state of the plant, and x is the input to the plant. I know, I know. All the same, make sure to keep that in mind.

# calculate ds0/dx0 with finite differences
d_input_FD = np.zeros((x.shape[0], # number of trials
x.shape[1], # number of inputs
self.state.shape[1])) # number of states
for ii in range(x.shape[1]):
# calculate state adding eps to x[ii]
self.reset_plant(self.prev_state)
inc_x = x.copy()
inc_x[:, ii] += self.eps
self.activation(inc_x)
state_inc = self.state.copy()
# calculate state subtracting eps from x[ii]
self.reset_plant(self.prev_state)
dec_x = x.copy()
dec_x[:, ii] -= self.eps
self.activation(dec_x)
state_dec = self.state.copy()

d_input_FD[:, :, ii] = \
(state_inc - state_dec) / (2 * self.eps)
d_input_FD = d_input_FD[..., None]


Alrighty. First we create a tensor to store the results. Why is it a tensor? Because we’re going to be doing a bunch of runs at once. So our state dimensions are actually trials x size_input. When we then take the partial derivative, we end up with trials many size_input x size_state matrices. Then we increase each of the parameters of the input slightly one at a time and store the results, decrease them all one at a time and store the results, and compute our approximation of the gradient.

Next we’ll do the same for calculating the derivative of the state with respect to the previous state.

# calculate ds1/ds0
d_state_FD = np.zeros((x.shape[0], # number of trials
self.state.shape[1], # number of states
self.state.shape[1])) # number of states
for ii in range(self.state.shape[1]):
# calculate state adding eps to self.state[ii]
state = np.copy(self.prev_state)
state[:, ii] += self.eps
self.reset_plant(state)
self.activation(x)
state_inc = self.state.copy()
# calculate state subtracting eps from self.state[ii]
state = np.copy(self.prev_state)
state[:, ii] -= self.eps
self.reset_plant(state)
self.activation(x)
state_dec = self.state.copy()

d_state_FD[:, :, ii] = \
(state_inc - state_dec) / (2 * self.eps)
d_state_FD = d_state_FD[..., None]


Great! We’re getting closer to having everything we need. Another thing we need is a wrapper for running our arm simulation. It’s going to look like this:

def activation(self, x):
state = []
# iterate through and simulate the plant forward
# for each trial
for ii in range(x.shape[0]):
self.arm.reset(q=self.state[ii, :self.arm.DOF],
dq=self.state[ii, self.arm.DOF:])
self.arm.apply_torque(u[ii])
state.append(np.hstack([self.arm.q, self.arm.dq]))
state = np.asarray(state)

self.state = self.squashing(state)


This is definitely not the fastest code to run. Much more ideally we would put the state and input into vectors and do a single set of computations for each call to activation rather than having that for loop in there. Unfortunately, though, we’re not assuming that we have access to the dynamics equations / will be able to pass in vector states and inputs.

Squashing
Looking at the above code that seems pretty clear what’s going on, except you might notice that last line calling self.squashing. What’s going on there?

The squashing function looks like this:

def squashing(self, x):
index_below = np.where(x < -2*np.pi)
x[index_below] = np.tanh(x[index_below]+2*np.pi) - 2*np.pi
index_above = np.where(x > 2*np.pi)
x[index_above] = np.tanh(x[index_above]-2*np.pi) + 2*np.pi
return x


All that’s happening here is that we’re taking our input, and doing nothing to it as long as it doesn’t start to get too positive or too negative. If it does then we just taper it off and prevent it from going off to infinity. So running a 1D vector through this function we get:

This ends up being a pretty important piece of the code here. Basically it prevents wild changes to the weights during learning to result in the system breaking down. So the state of the plant can’t go off to infinity and cause an error to be thrown, stopping our simulation. But because the target state is well within the bounds of where the squashing function does nothing, post-training we’ll still be able to use the resulting network to control a system that doesn’t have this fail safe built in. Think of this function as training wheels that catch you only if you start falling.

With that, we no have pretty much all of the parts necessary to begin training our network!

Training the network

We’re going to be training this network on the centre-out reaching task, where you start at a centre point and reach to a bunch of target locations around a circle. I’m just going to be re-implementing the task as it was done in Dr. Sutskever’s thesis, so we’ll have 64 targets around the circle, and train using a 2-link arm. Here’s the code that we’ll use to actually run the training:

for ii in range(last_trial+1, num_batches):
# train a bunch of batches using the same input every time
# to allow the network a chance to minimize things with
# stable input (speeds up training)
err = rnn.run_batches(plant, targets=None,
max_epochs=batch_size,
optimizer=HessianFree(CG_iter=96, init_damping=100))

# save the weights to file, track trial and error
# err = rnn.error(inputs)
err = rnn.best_error
name = 'weights/rnn_weights-trial%04i-err%.3f'%(ii, err)
np.savez_compressed(name, rnn.W)


A quick aside: if you want to run this code yourself, get a real good computer, have an arm simulation ready, the hessianfree Python library installed, and download and run this train_hf.py file. (Note: I used version 0.3.8 of the hessianfree library, which you can install using pip install hessianfree==0.3.8) This will start training and save the weights into a weights/ folder, so make sure that that exists in the same folder as train_hf.py. If you want to view the results of the training at any point run the plot_error.py file, which will load in the most recent version of the weights and plot the error so far. If you want to generate an animated plot like I have below run gen_animation_plots.py and then the last command from my post on generating animated gifs.

Another means of seeing the results of your trained up network is to use the controller I’ve implemented in my controls benchmarking suite, which looks for a set of saved weights in the controllers/weights folder, and will load it in and use it to generate command signals for the arm by running it with

python run.py arm2_python ahf reach --dt=1e-2


where you replace arm2_python with whatever arm model you trained your model on. Note the --dt=1e-2 flag, that is important because the model was trained with a .01 timestep and things get a bit weird if you suddenly change the dynamics on the controller.

OK let’s look at some results!

Results

OK full discretion, these results are not optimizing the cost function we discussed above. They’re implementing a simpler cost function that only looks at the final state, i.e. it doesn’t penalize the magnitude of the control signal. I did this because Dr. Sutskever says in his thesis he was able to optimize with just the final state cost using much smaller networks. I originally looked at neurons with 96 neurons in each layer, and it just took forgoddamnedever to run. So after running for 4 weeks (not joking) and needing to make some more changes I dropped the number of neurons and simplified the task.

The results below are from running a network with 32 neurons in each layer controlling this 2-link arm, and took another 4-5 weeks to train up.

Hey that looks good! Not bad, augmented Hessian-free learning, not bad. It had pretty consistent (if slow) decline in the error rate, with a few crazy bumps from which it quickly recovered. Also take note that each training iteration is actually 32 runs, so it’s not 12,50-ish runs it’s closer to 400,000 training runs that it took to get here.

One biggish thing that was a pain was that it turns out that I only trained the neural network for reaching in the one direction, and when you only train it to reach one way it doesn’t generalize to reaching back to the starting point (which, fair enough). But, I didn’t realize this until I was took the trained network and ran it in the benchmarking code, at which point I was not keen to redo all of the training it took to get the neural network to the level of accuracy it was at under a more complicated training set. The downside of this is that even though I’ve implemented a controller that takes in the trained network and uses it to control the arm, to do the reaching task I have to just do a hard reset after the arm reaches the target, because it can’t reach back to the center, like all the other controllers. All the same, here’s an animation of the trained up AHF controller reaching to 8 targets (it was trained on all 64 above though):

Things don’t always go so smoothly, though. Here’s results from another training run that took around 2-3 weeks, and uses a different 2-link arm model (translated from Matlab code written by Dr. Emo Todorov):

What I found frustrating about this was that if you look at the error over time then this arm is doing as well or better than the previous arm at a lot of points. But the corresponding trajectories look terrible, like something you would see in a horror movie based around getting good machine learning results. This of course comes down to how I specified the cost function, and when I looked at the trajectories plotted over time the velocity of the arm is right at zero at the final time step, which it is not quiiiitte the case for the first controller. So this second network has found a workaround to minimize the cost function I specified in a way I did not intend. To prevent this, doing something like weighting the distance to target heavier than non-zero velocity would probably work. Or possibly just rerunning the training with a different random starting point you could get out a better controller, I don’t have a great feel for how important the random initialization is, but I’m hoping that it’s not all too important and its effects go to zero with enough training. Also, it should be noted I’ve run the first network for 12,500 iterations and the second for less than 6,000, so I’ll keep letting them run and maybe it will come around. The first one looked pretty messy too until about 4,000 iterations in.

Training regimes

Frustratingly, the way that you train deep networks is very important. So, very much like the naive deep learning network trainer that I am, I tried the first thing that pretty much anyone would try:

• run the network,
• update the weights,
• repeat.

This is what I’ve done in the results above. And it worked well enough in that case.

If you remember back to the iLQR I made a little while ago, I was able to change the cost function to be

$L(\theta) = \sum\limits^{N-1}\limits_{t=0} \ell(\textbf{u}_t) + \ell_f(\textbf{x}_N),$
$\ell(\textbf{u}_t, \textbf{x}_t) = \alpha \frac{||\textbf{u}_t||^2}{2} + \frac{||\textbf{x}^* - \textbf{x}_t||^2}{2},$
$\ell_f(\textbf{x}_N) = \frac{||\textbf{x}^* - \textbf{x}_t||^2}{2}$

(i.e. to include a penalty for distance to target throughout the trajectory and not just at the final time step) which resulted in straighter trajectories when controlling the 2-link arm. So I thought I would try this here as well. Sadly (incredibly sadly), this was fairly fruitless. The network didn’t really learn or improve much at all.

After much consideration and quandary on my part, I talked with Dr. Dan and he suggested that I try another method:

• run the network,
• record the input,
• hold the input constant for a few batches of weight updating,
• repeat.

This method gave much better results. BUT WHY? I hear you ask! Good question. Let me give giving explanation a go.

Essentially, it’s because the cost function is more complex now. In the first training method, the output from the plant is fed back into the network as input at every time step. When the cost function was simpler this was OK, but now we’re getting very different input to train on at every iteration. So the system is being pulled in different directions back and forth at every iteration. In the second training regime, the same input is given several times in a row, which let’s the system follow the same gradient for a few training iterations before things change again. In my head I picture this as giving the algorithm a couple seconds to catch its breath dunking it back underwater.

This is a method that’s been used in a bunch of places recently. One of the more high-profile instances is in the results published from DeepMind on deep RL for control and for playing Go. And indeed, it also works well here.

To implement this training regime, we set up the following code:

for ii in range(last_trial+1, num_batches):

# run the plant forward once
rnn.forward(input=plant, params=rnn.W)

# get the input and targets from above rollout
inputs = plant.get_vecs()[0].astype(np.float32)
targets = np.asarray(plant.get_vecs()[1], dtype=np.float32)

# train a bunch of batches using the same input every time
# to allow the network a chance to minimize things with
# stable input (speeds up training)
err = rnn.run_batches(inputs, targets, max_epochs=batch_size,
optimizer=HessianFree(CG_iter=96, init_damping=100))

# save the weights to file, track trial and error
# err = rnn.error(inputs)
err = rnn.best_error
name = 'weights/rnn_weights-trial%04i-err%.3f'%(ii, err)
np.savez_compressed(name, rnn.W)


So you can see that we do one rollout with the weights, then go in and get the inputs and targets that were used in that rollout, and start training the network while holding those constant for batch_size epochs (training sessions). From a little bit of messing around I’ve found batch_size=32 to be a pretty good number. So then it runs 32 training iterations where it’s updating the weights, and then saves those weights (because we want a loooottttt of check-points) and then restarts the loop.

Embarrassingly, I’ve lost my simulation results from this trial, somehow…so I don’t have any nice plots to back up the above, unfortunately. But since this is just a blog post I figured I would at least talk about it a little bit, since people might still find it useful if they’re just getting into the field like me. and just update this post whenever I re-run them. If I rerun them.

What I do have, however, are results where this method doesn’t work! I tried this with the simpler cost function, that only looks at the final state distance from the target, and it did not go so well. Let’s look at that one!

My guess here is basically that the system has gotten to a point where it’s narrowed things down in the parameter space and now when you run 32 batches it’s overshooting. It needs feedback about its updates after every update at this point. That’s my guess, at least. So it could be the case that for more complex cost functions you’d want to train it while holding the input constant for a while, and then when the error starts to plateau switch to updating the input after every parameter update.

Conclusions

All in all, AHF for training neural networks in control is pretty awesome. There are of course still some major hold-backs, mostly related to how long it takes to train up a network, and having to guess at effective training regimes and network structures etc. But! It was able to train up a relatively small neural network to move an arm model from a center point to 64 targets around a circle, with no knowledge of the system under control at all. In Dr. Sutskever’s thesis he goes on to use the same set up under more complicated circumstances, such as when there’s a feedback delay, or a delay on the outgoing control signal, and unexpected noise etc, so it is able to learn under a number of different, fairly complex situations. Which is pretty slick.

Related to the insane training time required, I very easily could be missing some basic thing that would help speed things up. If you, reader, get ambitious and run the code on your own machine and find out useful methods for speeding up the training please let me know! Personally, my plan is to next investigate guided policy search, which seems like it’s found a way around this crazy training time.

## Simultaneous perturbation vs finite differences for linear dynamics estimation and control signal optimization

Recently in my posts I’ve been using finite differences to approximate the gradient of loss functions and dynamical systems, with the intention of creating generalizable controllers that can be run on any system without having to calculate out derivatives beforehand. Finite differences is pretty much the most straight-forward way of approximating a gradient that there is: vary each parameter up and down (assuming we’re doing central differencing), one at a time, run it through your function and estimate the parameters effect on the system by calculating the difference between resulting function output. To do this requires 2 samples of the function for each parameter.

But there’s always more than one way to peel an avocado, and another approach that’s been used with much success is the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm, which was developed by Dr. James Spall (link to overview paper). SPSA is a method of gradient approximation, like finite differences, but, critically, the difference is that it varies all of the parameters at once, rather than one at a time. As a result, you can get an approximation of the gradient with far fewer samples from the system, and also when you don’t have explicit control over your samples (i.e. the ability to vary each parameter one at a time).

Given some function $\textbf{f}$ dependent on some set of parameters $\textbf{x}$, we’re used to finding the gradient of $\textbf{f}(\textbf{x})$ using FDSA (finite differences stochastic approximation) written in this form:

$\textbf{f}_x = \frac{\textbf{f}(\textbf{x} + \Delta \textbf{x}) - \textbf{f}(\textbf{x} - \Delta \textbf{x})}{2 \Delta \textbf{x}} = \Delta x^{-1} \frac{\textbf{f}(\textbf{x} + \Delta \textbf{x}) - \textbf{f}(\textbf{x} - \Delta \textbf{x})}{2},$

where $\Delta\textbf{x}$ is a perturbation to the parameter set, and the subscript on the left-hand side denotes the derivative of $\textbf{f}(\textbf{x})$ with respect to $\textbf{x}$.

And that’s how we’ve calculated it before, estimating the gradient of a single parameter at a time. But, we can rewrite this for a set perturbations $\Delta \textbf{X} = [\Delta\textbf{x}_0, ... , \Delta\textbf{x}_N]^T$:

$\textbf{f}_\textbf{x} = \Delta \textbf{X}^{-1} \textbf{F},$

where

$\textbf{F} = [\frac{\textbf{f}(\textbf{x} + \Delta \textbf{x}_0) - \textbf{f}(\textbf{x} - \Delta \textbf{x}_0)}{2}, ... , \frac{\textbf{f}(\textbf{x} + \Delta \textbf{x}_0) - \textbf{f}(\textbf{x}_N - \Delta \textbf{x}_N)}{2}]^T$,

which works as long as $\Delta\textbf{X}$ is square. When it’s not square (i.e. we have don’t have the same number of samples as we have parameters), we run into problems, because we can’t calculate $\Delta\textbf{X}^{-1}$ directly. To address this, let’s take a step back and then work forward again to get a more general form that works for non-square $\Delta \textbf{X}$ too.

By rewriting the above, and getting rid of the inverse by moving $\Delta\textbf{x}$ back to the other side, we have:

$\Delta\textbf{X} \; \textbf{f}_\textbf{x} = \textbf{F}$

Now, the standard trick to move a matrix that’s not square is to just make it square by multiplying it by its transpose to get a square matrix, and then the whole thing by the inverse:

$\Delta\textbf{X}^T \Delta\textbf{X} \; \textbf{f}_\textbf{x} = \Delta\textbf{X}^T \textbf{F}$

$(\Delta\textbf{X}^T \Delta\textbf{X})^{-1} (\Delta\textbf{X}^T \Delta\textbf{X}) \textbf{f}_\textbf{x} = (\Delta\textbf{X}^T \Delta\textbf{X})^{-1} \Delta\textbf{X}^T \textbf{F}$

$\textbf{f}_\textbf{x} = (\Delta\textbf{X}^T \Delta\textbf{X})^{-1} \Delta\textbf{X}^T \textbf{F}$

Alright! Now we’re comfortable with this characterization of gradient approximation using a form that works with non-square perturbation matrices.

Again, in FDSA, we only vary one parameter at a time. This means that there will only ever be one non-zero entry per row of $\Delta \textbf{X}$. By contrast, in SPSA, we vary multiple parameters, and so rows of $\Delta\textbf{X}$ will be just chalk full of non-zero entries.

Gradient approximation to estimate $\textbf{f}_\textbf{x}$ and $\textbf{f}_\textbf{u}$ for LQR control

Implementing this was pretty simple. Just had to modify the calc_derivs function, which I use to estimate the derivative of the arm with respect to the state and control signal, in my LQR controller code by changing from standard finite differences to simultaneous perturbation:

def calc_derivs(self, x, u):
"""" calculate gradient of plant dynamics using Simultaneous
Perturbation Stochastic Approximation (SPSA). Implemented
based on (Peters & Schaal, 2008).

x np.array: the state of the system
u np.array: the control signal
"""
# Initialization and coefficient selection
num_iters = 20

eps = 1e-4
delta_K = None
delta_J = None
for ii in range(num_iters):
# Generation of simultaneous perturbation vector
# choose each component from a Bernoulli +-1 distribution
# with probability of .5 for each +-1 outcome.
delta_k = np.random.choice([-1,1],
size=len(x) + len(u),
p=[.5, .5])
# Function evaluations
inc_x = np.copy(x) + eps * delta_k[:len(x)]
inc_u = np.copy(u) + eps * delta_k[len(x):]
state_inc = self.plant_dynamics(inc_x, inc_u)
dec_x = np.copy(x) - eps * delta_k[:len(x)]
dec_u = np.copy(u) - eps * delta_k[len(x):]
state_dec = self.plant_dynamics(dec_x, dec_u)

delta_j = ((state_inc - state_dec) /
(2.0 * eps)).reshape(-1)

# Track delta_k and delta_j
delta_K = delta_k if delta_K is None else \
np.vstack([delta_K, delta_k])
delta_J =  delta_j if delta_J is None else \
np.vstack([delta_J, delta_j])

f_xu = np.dot(np.linalg.pinv(np.dot(delta_K.T, delta_K)),
np.dot(delta_K.T, delta_J))
f_x = f_xu[:len(x)]
f_u = f_xu[len(x):]

return f_x.T , f_u.T


A couple notes about the above code. First, you’ll notice that the f_x and f_b matrices are both calculated at the same time. That’s pretty slick! And that calculation for f_xu is just a straight implementation of the matrix form of gradient approximation, where I’ve arranged things so that f_x is in the top part and f_u is in the lower part.

The second thing is that the perturbation vector delta_k is generated from a Bernoulli distribution. The reason behind this is that we want to have a bunch of different samples that pretty reasonably spread the state space and move all the parameters independently. Making each perturbation some distance times -1 or 1 is an easy way to achieve this.

Thirdly, there’s the num_iters variable. This is a very important variable, as it dictates how many random samples of our system we take before we estimate the gradient. I’ve found that to get this to work for both the 2-link arm and the more complex 3-link arm, it needs to be at least 20. Or else things explode and die horribly. Just…horribly.

OK let’s look at the results:

The first thing to notice is that I’ve finally discovered the Seaborn plotting package. The second is that SPSA does as well as FDSA.

You may ask: Is there any difference? Well, if we time these functions, on my lil’ laptop, for the 2-link arm it takes SPSA approximately 2.0ms, but it takes FDSA only 0.8ms. So for the same performance the SPSA is taking almost 3 times as long to run. Why? This boils down to how many times the system dynamics need to be sampled by each algorithm to get a good approximation of the gradient. For a 2-link arm, FDSA has 6 parameters ($\textbf{q}, \dot{\textbf{q}},$ and $\textbf{u}$) that it needs to sample twice (we’re doing central differencing), for a total of 12 samples. And as I mentioned above, the SPSA algorithm needs 20 samples to be stable.

For the 3-link arm, SPSA took about 3.1ms on average and FDSA (which must now perform 18 samples of the dynamics) still only 2.1ms. So number of samples isn’t the only cause of time difference between these two algorithms. SPSA needs to perform that a few more matrix operations, including a matrix inverse, which is expensive, while FDSA can calculate the gradient of each parameter individually, which is much less expensive.

OK so SPSA not really impressive here. BUT! As I discovered, there are other means of employing SPSA.

Gradient approximation to optimize the control signal directly

In the previous set up we were using SPSA to estimate the gradient of the system under control, and then we used that gradient to calculate a control signal that minimized the loss function (as specified inside the LQR). This is one way to use gradient approximation methods. Another way to use these methods is approximate the gradient of the loss function directly, and use that information to iteratively calculate a control signal that minimizes the loss function. This second application is the primary use of the SPSA algorithm, and is what’s described by Dr. Spall in his overview paper.

In this application, the algorithm works like this:

2. perturb input and simulate results
3. observe loss function and calculate gradient
4. update input to system
5. repeat to convergence

Because in this approach we’re iteratively optimizing the input using our gradient estimation, having a noisy estimate is no longer a death sentence, as it was in the LQR. If we update our input to the system with several noisy gradient estimates the noise will essentially just cancel itself out. This means that SPSA now has a powerful advantage over FDSA: Since in SPSA we vary all parameters at once, only 2 samples of the loss function are used to estimate the gradient, regardless of the number of parameters. In contrast, FDSA needs to sample the loss function twice for every input parameter. Here’s a picture from (Spall, 1998) that shows the two running against each other to optimize a 2D problem:

This gets across that even though SPSA bounces around more, they both reach the solution in the same number of steps. And, in general, this is the case, as Dr. Spall talks about in the paper. There’s also a couple more details of the algorithm, so let’s look at it in detail. Here’s the code, which is just a straight translation into Python out of the description in Dr. Spall’s paper:

# Step 1: Initialization and coefficient selection
max_iters = 5
converge_thresh = 1e-5

alpha = 0.602 # from (Spall, 1998)
gamma = 0.101
a = .101 # found empirically using HyperOpt
A = .193
c = .0277

delta_K = None
delta_J = None
u = np.copy(self.u) if self.u is not None \
else np.zeros(self.arm.DOF)
for k in range(max_iters):
ak = a / (A + k + 1)**alpha
ck = c / (k + 1)**gamma

# Step 2: Generation of simultaneous perturbation vector
# choose each component from a bernoulli +-1 distribution with
# probability of .5 for each +-1 outcome.
delta_k = np.random.choice([-1,1], size=arm.DOF, p=[.5, .5])

# Step 3: Function evaluations
inc_u = np.copy(u) + ck * delta_k
cost_inc = self.cost(np.copy(state), inc_u)
dec_u = np.copy(u) - ck * delta_k
cost_dec = self.cost(np.copy(state), dec_u)

gk = np.dot((cost_inc - cost_dec) / (2.0*ck), delta_k)

# Step 5: Update u estimate
old_u = np.copy(u)
u -= ak * gk

# Step 6: Check for convergence
if np.sum(abs(u - old_u)) < converge_thresh:
break


The main as-of-yet-unexplained parts of this code are the alpha, gamma, a, A, and c variables. What’s their deal?

Looking inside the loop, we can see that ck controls the magnitude of our perturbations. Looking a little further down, ak is just the learning rate. And all of those other parameters are just involved in shaping the trajectories that ak and ck follow through iterations, which is a path towards zero. So the first steps and perturbations are the biggest, and each successively becomes smaller as the iteration count increases.

There are a few heuristics that Dr. Spall goes over, but there aren’t any hard and fast rules for setting a, A, and c. Here, I just used HyperOpt to find some values that worked pretty well for this particular problem.

The FDSA version of this is also very straight-forward:

# Step 1: Initialization and coefficient selection
max_iters = 10
converge_thresh = 1e-5

eps = 1e-4
u = np.copy(self.u) if self.u is not None \
else np.zeros(self.arm.DOF)
for k in range(max_iters):

gk = np.zeros(u.shape)
for ii in range(gk.shape[0]):
# Step 2: Generate perturbations one parameter at a time
inc_u = np.copy(u)
inc_u[ii] += eps
dec_u = np.copy(u)
dec_u -= eps

# Step 3: Function evaluation
cost_inc = self.cost(np.copy(state), inc_u)
cost_dec = self.cost(np.copy(state), dec_u)

gk[ii] = (cost_inc - cost_dec) / (2.0 * eps)

old_u = np.copy(u)
# Step 5: Update u estimate
u -= 1e-5 * gk

# Step 6: Check for convergence
if np.sum(abs(u - old_u)) < converge_thresh:
break


You’ll notice that in both the SPSA and FDSA code we’re no longer sampling plant_dynamics, we’re instead sampling cost, a loss function I defined. From just my experience playing around with these algorithms a bit, getting the loss function to be appropriate and give the desired behaviour is definitely a bit of an art. It feels like much more of an art than in other controllers I’ve coded, but that could just be me.

The cost function that I’m using is pretty much the first thing you’d think of. It penalizes distance to target and having non-zero velocity. Getting the weighting between distance to target and velocity set up so that the arm moves to the target but also doesn’t overshoot definitely took a bit of trial and error, er, I mean empirical analysis. Here’s the cost function that I found worked pretty well, note that I had to make special cases for the different arms:

 def cost(self, x, u):
dt = .1 if self.arm.DOF == 3 else .01
next_x = self.plant_dynamics(x, u, dt=dt)
vel_gain = 100 if self.arm.DOF == 3 else 10
return (np.sqrt(np.sum((self.arm.x - self.target)**2)) * 1000 \
+ np.sum((next_x[self.arm.DOF:])**2) * vel_gain)


So that’s all the code, let’s look at the results!

For these results, I used a max of 10 iterations for optimizing the control signal. I was definitely surprised by the quality of the results, especially for the 3-link arm, compared to the results generated by a standard LQR controller. Although I need to note, again, that it was a fair bit of me playing around with the exact cost function to get these results. Lots of empirical analysis.

The two controllers generate results that are identical to visual inspection. However, especially in the 3-link arm, the time required to run the FDSA was significantly longer than the SPSA controller. It took approximately 140ms for the SPSA controller to run a single loop, but took FDSA on average 700ms for a single loop of calculating the control signal. Almost 5 times as long! For the same results! In directly optimizing the control signal, SPSA gets a big win over standard FDSA. So, if you’re looking to directly optimize over a loss function, SPSA is probably the way you want to go.

Conclusions

First off, I thought it was really neat to directly apply gradient approximation methods to optimizing the control signal. It’s something I haven’t tried before, but definitely makes sense, and can generate some really nice results when tuned properly. Automating the tuning is definitely I’ll be discussing in future posts, because doing it by hand takes a long time and is annoying.

In the LQR, the gradient approximation was best done by the FDSA. I think the main reasons for this is that in solving for the control signal the LQR algorithm uses matrix inverses, and any errors in the linear approximations to the dynamics are going to be amplified quite a bit. If I did anything less than 10-15 iterations (20 for the 3-link arm) in the SPSA approximation then things exploded. Also, here the SPSA algorithm required a matrix inverse, where the FDSA didn’t. This is because we only varied one parameter at a time in FDSA, and the effects of changing each was isolated. In the SPSA case, we had to consider the changes across all the variables and the resulting effects all at once, essentially noting which variables changed by how much and the changes in each case, and averaging. Here, even with the more complex 3-link arm, FDSA was faster, so I’m going to stick with it in my LQR and iLQR implementations.

In the direct control signal optimization SPSA beat the pants off of FDSA. It was almost 5 times faster for control of the 3-link arm. This was, again, because in this case we could use noisy samples of the gradient of the loss function and relied on noise to cancel itself out as we iterated. So we only needed 2 samples of the loss function in SPSA, where in FDSA we needed 2*num_parameters. And although this generated pretty good results I would definitely be hesitant against using this for any more complicated systems, because tuning that cost function to get out a good trajectory was a pain. If you’re interested in playing around with this, you can check out the code for the gradient controllers up on my GitHub.

## The iterative Linear Quadratic Regulator algorithm

A few months ago I posted on Linear Quadratic Regulators (LQRs) for control of non-linear systems using finite-differences. The gist of it was at every time step linearize the dynamics, quadratize (it could be a word) the cost function around the current point in state space and compute your feedback gain off of that, as though the dynamics were both linear and consistent (i.e. didn’t change in different states). And that was pretty cool because you didn’t need all the equations of motion and inertia matrices etc to generate a control signal. You could just use the simulation you had, sample it a bunch to estimate the dynamics and value function, and go off of that.

The LQR, however, operates with maverick disregard for changes in the future. Careless of the consequences, it optimizes assuming the linear dynamics approximated at the current time step hold for all time. It would be really great to have an algorithm that was able to plan out and optimize a sequence, mindful of the changing dynamics of the system.

This is exactly the iterative Linear Quadratic Regulator method (iLQR) was designed for. iLQR is an extension of LQR control, and the idea here is basically to optimize a whole control sequence rather than just the control signal for the current point in time. The basic flow of the algorithm is:

1. Initialize with initial state $x_0$ and initial control sequence $\textbf{U} = [u_{t_0}, u_{t_1}, ..., u_{t_{N-1}}]$.
2. Do a forward pass, i.e. simulate the system using $(x_0, \textbf{U})$ to get the trajectory through state space, $\textbf{X}$, that results from applying the control sequence $\textbf{U}$ starting in $x_0$.
3. Do a backward pass, estimate the value function and dynamics for each $(\textbf{x}, \textbf{u})$ in the state-space and control signal trajectories.
4. Calculate an updated control signal $\hat{\textbf{U}}$ and evaluate cost of trajectory resulting from $(x_0, \hat{\textbf{U}})$.
1. If $|(\textrm{cost}(x_0, \hat{\textbf{U}}) - \textrm{cost}(x_0, \textbf{U})| < \textrm{threshold}$ then we've converged and exit.
2. If $\textrm{cost}(x_0, \hat{\textbf{U}}) < \textrm{cost}(x_0, \textbf{U})$, then set $\textbf{U} = \hat{\textbf{U}}$, and change the update size to be more aggressive. Go back to step 2.
3. If $\textrm{cost}(x_0, \hat{\textbf{U}}) \geq$ $\textrm{cost}(x_0, \textbf{U})$ change the update size to be more modest. Go back to step 3.

There are a bunch of descriptions of iLQR, and it also goes by names like ‘the sequential linear quadratic algorithm’. The paper that I’m going to be working off of is by Yuval Tassa out of Emo Todorov’s lab, called Control-limited differential dynamic programming. And the Python implementation of this can be found up on my github in my Control repo. Also, a big thank you to Dr. Emo Todorov who provided Matlab code for the iLQG algorithm, which was super helpful.

Defining things

So let’s dive in. Formally defining things, we have our system $\textbf{x}$, and dynamics described with the function $\textbf{f}$, such that

$\textbf{x}_{t+1} = \textbf{f}(\textbf{x}_t, \textbf{u}_t),$

where $\textbf{u}$ is the input control signal. The trajectory $\{\textbf{X}, \textbf{U}\}$ is the sequence of states $\textbf{X} = \{\textbf{x}_0, \textbf{x}_1, ..., \textbf{x}_N\}$ that result from applying the control sequence $\textbf{U} = \{\textbf{u}_0, \textbf{u}_1, ..., \textbf{u}_{N-1}\}$ starting in the initial state $\textbf{x}_0$.

Now we need to define all of our cost related equations, so we know exactly what we’re dealing with.

Define the total cost function $J$, which is the sum of the immediate cost, $\ell$, from each state in the trajectory plus the final cost, $\ell_f$:

$J(\textbf{x}_0, \textbf{U}) = \sum\limits^{N-1}\limits_{t=0} \ell(\textbf{x}_t, \textbf{u}_t) + \ell_f(\textbf{x}_N).$

Letting $\textbf{U}_t = \{\textbf{u}_t, \textbf{u}_{t+1}, ..., \textbf{U}_{N-1}\}$, we define the cost-to-go as the sum of costs from time $t$ to $N$:

$J_t(\textbf{x}, \textbf{U}_t) = \sum\limits^{N-1}\limits_{i=t} \ell(\textbf{x}_i, \textbf{u}_i) + \ell_f(\textbf{x}_N).$

The value function $V$ at time $t$ is the optimal cost-to-go from a given state:

$V_t(\textbf{x}) = \min\limits_{\textbf{U}_t} J_t(\textbf{x}, \textbf{U}_t),$

where the above equation just says that the optimal cost-to-go is found by using the control sequence $\textbf{U}_t$ that minimizes $J_t$.

At the final time step, $N$, the value function is simply

$V(\textbf{x}_N) = \ell_f(\textbf{x}_N).$

For all preceding time steps, we can write the value function as a function of the immediate cost $\ell(\textbf{x}, \textbf{u})$ and the value function at the next time step:

$V(\textbf{x}) = \min\limits_{\textbf{u}} \left[ \ell(\textbf{x}, \textbf{u}) + V(\textbf{f}(\textbf{x}, \textbf{u})) \right].$

NOTE: In the paper, they use the notation $V'(\textbf{f}(\textbf{x}, \textbf{u}))$ to denote the value function at the next time step, which is redundant since $\textbf{x}_{t+1} = \textbf{f}(\textbf{x}_t, \textbf{u}_t)$, but it comes in handy later when they drop the dependencies to simplify notation. So, heads up: $V' = V(\textbf{f}(\textbf{x}, \textbf{u})$.

Forward rollout

The forward rollout consists of two parts. The first part is to simulating things to generate the $(\textbf{X}, \textbf{U}),$ from which we can calculate the overall cost of the trajectory, and find out the path that the arm will take. To improve things though we’ll need a lot of information about the partial derivatives of the system, calculating these is the second part of the forward rollout phase.

To calculate all these partial derivatives we’ll use $(\textbf{X}, \textbf{U})$. For each $(\textbf{x}_t, \textbf{u}_t)$ we’ll calculate the derivatives of $\textbf{f}(\textbf{x}_t, \textbf{u}_t)$ with respect to $\textbf{x}_t$ and $\textbf{u}_t$, which will give us what we need for our linear approximation of the system dynamics.

To get the information we need about the value function, we’ll need the first and second derivatives of $\ell(\textbf{x}_t, \textbf{u}_t)$ and $\ell_f(\textbf{x}_t, \textbf{x}_t)$ with respect to $\textbf{x}_t$ and $\textbf{u}_t$.

So all in all, we need to calculate $\textbf{f}_\textbf{x}$, $\textbf{f}_\textbf{u}$, $\ell_\textbf{x}$, $\ell_\textbf{u}$, $\ell_\textbf{xx}$, $\ell_\textbf{ux}$, $\ell_\textbf{uu}$, where the subscripts denote a partial derivative, so $\ell_\textbf{x}$ is the partial derivative of $\ell$ with respect to $\textbf{x}$, $\ell_\textbf{xx}$ is the second derivative of $\ell$ with respect to $\textbf{x}$, etc. And to calculate all of these partial derivatives, we’re going to use finite differences! Just like in the LQR with finite differences post. Long story short, load up the simulation for every time step, slightly vary one of the parameters, and measure the resulting change.

Once we have all of these, we’re ready to move on to the backward pass.

Backward pass

Now, we started out with an initial trajectory, but that was just a guess. We want our algorithm to take it and then converge to a local minimum. To do this, we’re going to add some perturbing values and use them to minimize the value function. Specifically, we’re going to compute a local solution to our value function using a quadratic Taylor expansion. So let’s define $Q(\delta \textbf{x}, \delta \textbf{u})$ to be the change in our value function at $(\textbf{x}, \textbf{u})$ as a result of small perturbations $(\delta \textbf{x}, \delta \textbf{u})$:

$Q(\delta \textbf{x}, \delta \textbf{u}) = \ell (\textbf{x} + \delta \textbf{x}, \textbf{u} + \delta \textbf{u}) + V(\textbf{f}(\textbf{x} + \delta\textbf{x}, \textbf{u} + \delta \textbf{u})).$

The second-order expansion of $Q$ is given by:

$Q_\textbf{x} = \ell_\textbf{x} + \textbf{f}_\textbf{x}^T V'_\textbf{x},$

$Q_\textbf{u} = \ell_\textbf{u} + \textbf{f}_\textbf{u}^T V'_\textbf{x},$

$Q_\textbf{xx} = \ell_\textbf{xx} + \textbf{f}_\textbf{x}^T V'_\textbf{xx} \textbf{f}_\textbf{x} + V'_\textbf{x} \cdot \textbf{f}_\textbf{xx},$

$Q_\textbf{ux} = \ell_\textbf{ux} + \textbf{f}_\textbf{u}^T V'_\textbf{xx} \textbf{f}_\textbf{x}+ V'_\textbf{x} \cdot \textbf{f}_\textbf{ux},$

$Q_\textbf{uu} = \ell_\textbf{uu} + \textbf{f}_\textbf{u}^T V'_\textbf{xx} \textbf{f}_\textbf{u}+ V'_\textbf{x} \cdot \textbf{f}_\textbf{uu}.$

Remember that $V' = V(\textbf{f}(\textbf{x}, \textbf{u}))$, which is the value function at the next time step. NOTE: All of the second derivatives of $\textbf{f}$ are zero in the systems we’re controlling here, so when we calculate the second derivatives we don’t need to worry about doing any tensor math, yay!

Given the second-order expansion of $Q$, we can to compute the optimal modification to the control signal, $\delta \textbf{u}^*$. This control signal update has two parts, a feedforward term, $\textbf{k}$, and a feedback term $\textbf{K} \delta\textbf{x}$. The optimal update is the $\delta\textbf{u}$ that minimizes the cost of $Q$:

$\delta\textbf{u}^*(\delta \textbf{x}) = \min\limits_{\delta\textbf{u}}Q(\delta\textbf{x}, \delta\textbf{u}) = \textbf{k} + \textbf{K}\delta\textbf{x},$

where $\textbf{k} = -Q^{-1}_\textbf{uu} Q_\textbf{u}$ and $\textbf{K} = -Q^{-1}_\textbf{uu} Q_\textbf{ux}.$

Derivation can be found in this earlier paper by Li and Todorov. By then substituting this policy into the expansion of $Q$ we get a quadratic model of $V$. They do some mathamagics and come out with:

$V_\textbf{x} = Q_\textbf{x} - \textbf{K}^T Q_\textbf{uu} \textbf{k},$

$V_\textbf{xx} = Q_\textbf{xx} - \textbf{K}^T Q_\textbf{uu} \textbf{K}.$

So now we have all of the terms that we need, and they’re defined in terms of the values at the next time step. We know the value of the value function at the final time step $V_N = \ell_f(\textbf{x}_N)$, and so we’ll simply plug this value in and work backwards in time recursively computing the partial derivatives of $Q$ and $V$.

Calculate control signal update

Once those are all calculated, we can calculate the gain matrices, $\textbf{k}$ and $\textbf{K}$, for our control signal update. Huzzah! Now all that’s left to do is evaluate this new trajectory. So we set up our system

$\hat{\textbf{x}}_0 = \textbf{x}_0,$

$\hat{\textbf{u}}_t = \textbf{u}_t + \textbf{k}_t + \textbf{K}_t (\hat{\textbf{x}}_t - \textbf{x}_t),$

$\hat{\textbf{x}}_{t+1} = \textbf{f}(\hat{\textbf{x}}_t, \hat{\textbf{u}}_t),$

and record the cost. Now if the cost of the new trajectory $(\hat{\textbf{X}}, \hat{\textbf{U}})$ is less than the cost of $(\textbf{X}, \textbf{U})$ then we set $\textbf{U} = \hat{\textbf{U}}$ and go do it all again! And when the cost from an update becomes less than a threshold value, call it done. In code this looks like:

if costnew < cost:
sim_new_trajectory = True

if (abs(costnew - cost)/cost) < self.converge_thresh:
break


Of course, another option we need to account for is when costnew > cost. What do we do in this case? Our control update hasn’t worked, do we just exit?

The Levenberg-Marquardt heuristic
No! Phew.

The control signal update in iLQR is calculated in such a way that it can behave like Gauss-Newton optimization (which uses second-order derivative information) or like gradient descent (which only uses first-order derivative information). The is that if the updates are going well, then lets include curvature information in our update to help optimize things faster. If the updates aren’t going well let’s dial back towards gradient descent, stick to first-order derivative information and use smaller steps. This wizardry is known as the Levenberg-Marquardt heuristic. So how does it work?

Something we skimmed over in the iLQR description was that we need to calculate $Q^{-1}_\textbf{uu}$ to get the $\textbf{k}$ and $\textbf{K}$ matrices. Instead of using np.linalg.pinv or somesuch, we’re going to calculate the inverse ourselves after finding the eigenvalues and eigenvectors, so that we can regularize it. This will let us do a couple of things. First, we’ll be able to make sure that our estimate of curvature ($Q_\textbf{uu}^{-1}$) stays positive definite, which is important to make sure that we always have a descent direction. Second, we’re going to add a regularization term to the eigenvalues to prevent them from exploding when we take their inverse. Here’s our regularization implemented in Python:


Q_uu_evals, Q_uu_evecs = np.linalg.eig(Q_uu)
Q_uu_evals[Q_uu_evals < 0] = 0.0
Q_uu_evals += lamb
Q_uu_inv = np.dot(Q_uu_evecs,
np.dot(np.diag(1.0/Q_uu_evals), Q_uu_evecs.T))


Now, what happens when we change lamb? The eigenvalues represent the magnitude of each of the eigenvectors, and by taking their reciprocal we flip the contributions of the vectors. So the ones that were contributing the least now have the largest singular values, and the ones that contributed the most now have the smallest eigenvalues. By adding a regularization term we ensure that the inverted eigenvalues can never be larger than 1/lamb. So essentially we throw out information.

In the case where we’ve got a really good approximation of the system dynamics and value function, we don’t want to do this. We want to use all of the information available because it’s accurate, so make lamb small and get a more accurate inverse. In the case where we have a bad approximation of the dynamics we want to be more conservative, which means not having those large singular values. Smaller singular values give a smaller $Q_\textbf{uu}^{-1}$ estimate, which then gives smaller gain matrices and control signal update, which is what we want to do when our control signal updates are going poorly.

How do you know if they’re going poorly or not, you now surely ask! Clever as always, we’re going to use the result of the previous iteration to update lamb. So adding to the code from just above, the end of our control update loop is going to look like:

lamb = 1.0 # initial value of lambda
...
if costnew < cost:
lamb /= self.lamb_factor
sim_new_trajectory = True

if (abs(costnew - cost)/cost) < self.converge_thresh:
break
else:
lamb *= self.lamb_factor
if lamb > self.max_lamb:
break


And that is pretty much everything! OK let’s see how this runs!

Simulation results

If you want to run this and see for yourself, you can go copy my Control repo, navigate to the main directory, and run

python run.py arm2 reach


or substitute in arm3. If you’re having trouble getting the arm2 simulation to run, try arm2_python, which is a straight Python implementation of the arm dynamics, and should work no sweat for Windows and Mac.

Below you can see results from the iLQR controller controlling the 2 and 3 link arms (click on the figures to see full sized versions, they got distorted a bit in the shrinking to fit on the page), using immediate and final state cost functions defined as:

l = np.sum(u**2)


and

pos_err = np.array([self.arm.x[0] - self.target[0],
self.arm.x[1] - self.target[1]])
l = (wp * np.sum(pos_err**2) + # pos error
wv * np.sum(x[self.arm.DOF:self.arm.DOF*2]**2)) # vel error


where wp and wv are just gain values, x is the state of the system, and self.arm.x is the $(x,y)$ position of the hand. These read as “during movement, penalize large control signals, and at the final state, have a big penalty on not being at the target.”

So let’s give it up for iLQR, this is awesome! How much of a crazy improvement is that over LQR? And with all knowledge of the system through finite differences, and with the full movements in exactly 1 second! (Note: The simulation speeds look different because of my editing to keep the gif sizes small, they both take the same amount of time for each movement.)

Changing cost functions
Something that you may notice is that the control of the 3 link is actually straighter than the 2 link. I thought that this might be just an issue with the gain values, since the scale of movement is smaller for the 2 link arm than the 3 link there might have been less of a penalty for not moving in a straight line, BUT this was wrong. You can crank the gains and still get the same movement. The actual reason is that this is what the cost function specifies, if you look in the code, only $\ell_f(\textbf{x}_N)$ penalizes the distance from the target, and the cost function during movement is strictly to minimize the control signal, i.e. $\ell(\textbf{x}_t, \textbf{u}_t) = \textbf{u}_t^2$.

Well that’s a lot of talk, you say, like the incorrigible antagonist we both know you to be, prove it. Alright, fine! Here’s iLQR running with an updated cost function that includes the end-effector’s distance from the target in the immediate cost:

All that I had to do to get this was change the immediate cost from

l = np.sum(u**2)


to

l = np.sum(u**2)
pos_err = np.array([self.arm.x[0] - self.target[0],
self.arm.x[1] - self.target[1]])
l += (wp * np.sum(pos_err**2) + # pos error
wv * np.sum(x[self.arm.DOF:self.arm.DOF*2]**2)) # vel error


where all I had to do was include the position penalty term from the final state cost into the immediate state cost.

Changing sequence length
In these simulations the system is simulating at .01 time step, and I gave it 100 time steps to reach the target. What if I give it only 50 time steps?

It looks pretty much the same! It’s just now twice as fast, which is of course achieved by using larger control signals, which we don’t see, but dang awesome.

What if we try to make it there in 10 time steps??

OK well that does not look good. So what’s going on in this case? Basically we’ve given the algorithm an impossible task. It can’t make it to the target location in 10 time steps. In the implementation I wrote here, if it hits the end of it’s control sequence and it hasn’t reached the target yet, the control sequence starts over back at t=0. Remember that part of the target state is also velocity, so basically it moves for 10 time steps to try to minimize $(x,y)$ distance, and then slows down to minimize final state cost in the velocity term.

In conclusion

This algorithm has been used in a ton of things, for controlling robots and simulations, and is an important part of guided policy search, which has been used to very successfully train deep networks in control problems. It’s getting really impressive results for controlling the arm models that I’ve built here, and using finite differences should easily generalize to other systems.

iLQR is very computationally expensive, though, so that’s definitely a downside. It’s definitely less expensive if you have the equations of your system, or at least a decent approximation of them, and you don’t need to use finite differences. But you pay for the efficiency with a loss in generality.

There are also a bunch of parameters to play around with that I haven’t explored at all here, like the weights in the cost function penalizing the magnitude of the cost function and the final state position error. I showed a basic example of changing the cost function, which hopefully gets across just how easy changing these things out can be when you’re using finite differences, and there’s a lot to play around with there too.

Implementation note
In the Yuval and Todorov paper, they talked about using backtracking line search when generating the control signal. So the algorithm they had when generating the new control signal was actually:

$\hat{\textbf{u}}_t = \hat{\textbf{u}}_t + \alpha\textbf{k}_t + \textbf{K}_t(\hat{\textbf{x}}_t - \textbf{x}_t)$

where $\alpha$ was the backtracking search parameter, which gets set to one initially and then reduced. It’s very possible I didn’t implement it as intended, but I found consistently that $\alpha = 1$ always generated the best results, so it was just adding computation time. So I left it out of my implementation. If anyone has insights on an implementation that improves results, please let me know!

And then finally, another thank you to Dr. Emo Todorov for providing Matlab code for the iLQG algorithm, which was very helpful, especially for getting the Levenberg-Marquardt heuristic implemented properly.

## Linear-Quadratic Regulation for non-linear systems using finite differences

One of the standard controllers in basic control theory is the linear-quadratic regulator (LQR). There is a finite-horizon case (where you have a limited amount of time), and an infinite-horizon case (where you don’t); in this post, for simplicity, we’re only going to be dealing with the infinite-horizon case.

The LQR is designed to handle a very specific kind of problem. First, it assumes you are controlling a system with linear dynamics, which means you can express them as

$\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u}$,

where $\textbf{x}$ and $\dot{\textbf{x}}$ are the state and its time derivative, $\textbf{u}$ is the input, and $\textbf{A}$ and $\textbf{B}$ capture the effects of the state and input on the derivative. And second, it assumes that the cost function, denoted $J$, is a quadratic of the form

$J = \int_0^{\infty} \left( (\textbf{x} - \textbf{x}^*)^T \textbf{Q} (\textbf{x} - \textbf{x}^*) + \textbf{u}^T \textbf{R} \textbf{u} \right) dt$

where $\textbf{x}^*$ is the target state, and $\textbf{Q} = \textbf{Q}^T \geq 0$ and $\textbf{R} = \textbf{R}^T \geq 0$ are weights on the cost of not being at the target state and applying a control signal. The higher $\textbf{Q}$ is, the more important it is to get to the target state asap, the higher $\textbf{R}$ is, the more important it is to keep the control signal small as you go to the target state.

The goal of the LQR is to calculate a feedback gain matrix $\textbf{K}$ such that

$\textbf{u} = -\textbf{K} \textbf{x},$

drives the system to the target. When the system is a linear system with a quadratic cost function, this can be done optimally. There is lots of discussion elsewhere about LQRs and their derivation, so I’m not going to go into that with this post. Instead, I’m going to talk about applying LQRs to non-linear systems, and using finite differences to do it, which works when you have a readily accessible simulation of the system on hand. The fun part is that by using finite differences you can get this to work without working out the dynamics equations yourself.

Using LQRs on non-linear systems

As you may have noticed, non-linear systems violate the first assumption of a linear quadratic regulator; that the system is linear. That doesn’t mean that we can’t apply it, it just means that it’s not going to be optimal. How poorly the LQR will perform depends on a few things, two important factors being how non-linear the system dynamics actually are, and how often you’re able to update the feedback gain matrix $\textbf{K}$. To apply LQR to non-linear systems we’re just going to close our eyes and pretend that the system dynamics are linear, i.e. they fit the form

$\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u}.$

We’ll do this by approximating the actual dynamics of the system linearly. We’ll then solve for our gain value $\textbf{K}$, generate our control signal for this timestep, and then re-approximate the dynamics again at the next time step and solve for $\textbf{K}$ from the new state. The more non-linear the system dynamics are, the less appropriate $\textbf{K}$ will be for generating our control signal $\textbf{u}$ as we move away from the state $\textbf{K}$ was calculated in; this is why update time of the LQR can become an important factor.

Using finite-differences to approximate system dynamics

An important question, then, is how do we find this system approximation? How can we calculate the $\textbf{A}$ and $\textbf{B}$ matrices that we then use to solve for $\textbf{K}$? If we know the dynamics of the system to be

$\dot{\textbf{x}} = f(\textbf{x}, \textbf{u})$,

then we can calculate

$\textbf{A} = \frac{\partial f(\textbf{x}, \textbf{u})}{\partial \textbf{x}}, \;\;\;\; \textbf{B} = \frac{\partial f(\textbf{x}, \textbf{u})}{\partial \textbf{u}}$.

If you’re going to try this for the 3-link arm, though, get out Mathematica. Do not try this by hand. If you disregard my warning and foolhardily attempt such a derivation you will regret, repent, and then appeal to Wolfram Alpha for salvation. These equations quickly become terrible and long even for seemingly not-so-complicated systems.

There are a few ways to skirt this. Here we’re going to assume that the system under control is a simulation, or that we at least have access to an accurate model, and use the finite differences method to compute these values. The idea behind finite differences is to approximate the rate of change of the function $f$ at the point $x$ by sampling $f$ near $x$ and using the difference to calculate $\dot{f}(x)$. Here’s a picture for a 1D system:

So here, our current state $x$ is the blue dot, and the red dots represent the sample points $x + \Delta x$ and $x - \Delta x$. We can then calculate

$\dot{f}(x) \approx \frac{f(x+\Delta x) - f(x-\Delta x)}{2\Delta x},$

and you can see the actual rate of change of $f$ at $x$ plotted in the blue dashed line, and the approximated rate of change calculated using finite differences plotted in the red dashed line. We can also see that the approximated derivative is only accurate near $x$ (the blue dot).

Back in our multi-dimensional system, to use finite differences to calculate the derivative with respect to the state and the input we’re going to vary each of the dimensions of the state and input by some small amount one at a time, calculating the effects of each one by one. Here’s a chunk of pseudo-code to hopefully clarify this idea:

eps = 1e-5
A = np.zeros((len(current_state), len(current_state))
for ii in range(len(current_state)):
x = current_state.copy()
x[ii] += eps
x_inc = simulate_system(state=x, input=control_signal)
x = current_state.copy()
x[ii] -= eps
x_dec = simulate_system(state=x, input=control_signal)
A[:,ii] = (x_inc - x_dec) / (2 * eps)

B = np.zeros((len(current_state), len(control_signal))
for ii in range(len(control_signal)):
u = control_signal.copy()
u[ii] += eps
x_inc = simulate_system(state=current_state, input=u)
u = control_signal.copy()
u[ii] -= eps
x_dec = simulate_system(state=current_state, input=u)
B[:,ii] = (x_inc - x_dec) / (2 * eps)


Now we’re able to generate our $\textbf{A}$ and $\textbf{B}$ matrices we have everything we need to solve for our feedback gain matrix $\textbf{K}$! Which is great.

Note on using finite differences in continuous vs discrete setup

Something that’s important to straighten out too is what exactly is returned by the simulate_system function in the code above. In the continuous case, your system is captured as

$\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u},$,

where in the discrete case your system is defined

$\textbf{x}(t+1) = \textbf{A}\textbf{x}(t) + \textbf{B}\textbf{u}(t).$

If you are calculating your feedback gain matrix $\textbf{K}$ using the continuous solution to the algebraic Riccati equation, then you need to be returning $\dot{\textbf{x}}(t)$. If you’re solving for $\textbf{K}$ using the discrete solution to the algebraic Riccati equation you need to return $\textbf{x}(t+1)$. This was just something that I came across as I was coding and so I wanted to mention it here in case anyone else stumbled across it!

Applying LQR to 2 and 3 link arm control

Alright! Let’s have a look at how the LQR does controlling non-linear systems. Below we have the control of a 2-link arm compared to a 3-link arm, and you can see the control of the 2-link arm is better. This is a direct result of the dynamics of a 3-link arm being significantly more complex.

Note on controlling at different timesteps

When I was first testing the LQR controller I expected the effects of different control update times to be a lot more significant than it was. As it turns out, for controlling a 3-link arm, there’s not really a visible difference in a controller that is updating every .01 seconds vs every .001 seconds vs every .0001 seconds. Let’s have a look:

Can’t even tell, eh? Fun fact, the simulation took 1 minute 30 seconds at .01 seconds time step and 45 minutes at .0001 seconds time step. The left-most animation is the .01 seconds and the right-most the .0001 seconds. But why is there seemingly so little difference? Well, this boils down to the dynamics of the 3-link arm changing actually pretty slowly. Below I’ve plotted just a few of the elements from the $\textbf{A}$, $\textbf{B}$, and $\textbf{K}$ matrices over .5 seconds of simulation time:

So, there are some obvious points where sampling the dynamics at a .01 time step is noticeably less accurate, but all in all there’s not a huuuggge difference between sampling at .01 and .0001 seconds. If you’re just watching the end-effector path it’s really not very noticeable. You can see how the elements of $\textbf{A}$ and $\textbf{B}$ are changing fairly slowly; this means that $\textbf{K}$ is going to be an effective feedback gain for a fair chunk of time. And the computational savings you get by sampling the dynamics and regenerating $\textbf{K}$ every .01 seconds instead of every .0001 seconds are pretty big. This was just another thing that I came across when playing around with the LQR, the take away being don’t just assume you need to update your system crazy often. You might get very comparable performance for much less computational cost.

Conclusions

All in all, the LQR controller is pretty neat! It’s really simple to set up, and generic. We don’t need any specific information about the system dynamics, like we do for effective operational space control (OSC). When we estimate the dynamics with finite differences, all need is a decent system model that we can sample. Again, the more non-linear the system, of course, the less effective a LQR will be. If you’re interested in playing around with one, or generating the figures that I show above, the code is all up and running on my Github for you to explore.

## Operational space control of 6DOF robot arm with spiking cameras part 3: Tracking a target using spiking cameras

Alright. Previously we got our arm all set up to perform operational space control, accepting commands through Python. In this post we’re going to set it up with a set of spiking cameras for eyes, train it to learn the mapping between camera coordinates and end-effector coordinates, and have it track an LED target.

What is a spiking camera?

Good question! Spiking cameras are awesome, and they come from Dr. Jorg Conradt’s lab. Basically what they do is return you information about movement from the environment. They’re event-driven, instead of clock-driven like most hardware, which means that they have no internal clock that’s dictating when they send information (i.e. they’re asynchronous). They send information out as soon as they receive it. Additionally, they only send out information about the part of the image that has changed. This means that they have super fast response times and their output bandwidth is really low. Dr. Terry Stewart of our lab has written a bunch of code that can be used for interfacing with spiking cameras, which can all be found up on his GitHub.

Let’s use his code to see through a spiking camera’s eye. After cloning his repo and running python setup.py you can plug in a spiking camera through USB, and with the following code have a Matplotlib figure pop-up with the camera output:

import nstbot
import nstbot.connection
import time

eye = nstbot.RetinaBot()
eye.connect(nstbot.connection.Serial('/dev/ttyUSB0', baud=4000000))

time.sleep(1)

eye.retina(True)
eye.show_image()

while True:
time.sleep(1)


The important parts here are the creation of an instance of the RetinaBot, connecting it to the proper USB port, and calling the show_image function. Pretty easy, right? Here’s some example output, this is me waving my hand and snapping my fingers:

How cool is that? Now, you may be wondering how or why we’re going to use a spiking camera instead of a regular camera. The main reason that I’m using it here is because it makes tracking targets super easy. We just set up an LED that blinks at say 100Hz, and then we look for that frequency in the spiking camera output by recording the rate of change of each of the pixels and averaging over all pixel locations changing at the target frequency. So, to do this with the above code we simply add

eye.track_frequencies(freqs=[100])


And now we can track the location of an LED blinking at 100Hz! The visualization code place a blue dot at the estimated target location, and this all looks like:

Alright! Easily decoded target location complete.

Transforming between camera coordinates and end-effector coordinates

Now that we have a system that can track a target location, we need to transform that position information into end-effector coordinates for the arm to move to. There are a few ways to go about this. One is by very carefully positioning the camera and measuring the distances between the robot’s origin reference frame and working through the trig etc etc. Another, much less pain-in-the-neck way is to instead record some sample points of the robot end-effector at different positions in both end-effector and camera coordinates, and then use a function approximator to generalize over the rest of space.

We’ll do the latter, because it’s exactly the kind of thing that neurons are great for. We have some weird function, and we want to learn to approximate it. Populations of neurons are awesome function approximators. Think of all the crazy mappings your brain learns. To perform function approximation with neurons we’re going to use the Neural Engineering Framework (NEF). If you’re not familiar with the NEF, the basic idea is to using the response curves of neurons as a big set of basis function to decode some signal in some vector space. So we look at the responses of the neurons in the population as we vary our input signal, and then determine a set of decoders (using least-squares or somesuch) that specify the contribution of each neuron to the different dimensions of the function we want to approximate.

Here’s how this is going to work.

1. We’re going to attach the LED to the head of the robot,
2. we specify a set of $(x,y,z)$ coordinates that we send to the robot’s controller,
3. when the robot moves to each point, record the LED location from the camera as well as the end-effector’s $(x,y,z)$ coordinate,
4. create a population of neurons that we train up to learn the mapping from camera locations to end-effector $(x,y,z)$ locations
5. use this information to tell the robot where to move.

A detail that should be mentioned here is that a single camera only provides 2D output. To get a 3D location we’re going to use two separate cameras. One will provide $(x,z)$ information, and the other will provide $(y,z)$ information.

Once we’ve taped (expertly) the LED onto the robot arm, the following script to generate the information we to approximate the function transforming from camera to end-effector space:

import robot
from eye import Eye # this is just a spiking camera wrapper class

import numpy as np
import time

# connect to the spiking cameras
eye0 = Eye(port='/dev/ttyUSB2')
eye1 = Eye(port='/dev/ttyUSB1')
eyes = [eye0, eye1]
# connect to the robot
rob = robot.robotArm()

# define the range of values to test
min_x = -10.0
max_x = 10.0
x_interval = 5.0
min_y = -15.0
max_y = -5.0
y_interval = 5.0
min_z = 10.0
max_z = 20.0
z_interval = 5.0

x_space = np.arange(min_x, max_x, x_interval)
y_space = np.arange(min_y, max_y, y_interval)
z_space = np.arange(min_z, max_z, z_interval)

num_samples = 10 # how many camera samples to average over

try:
out_file0 = open('eye_map_0.csv', 'w')
out_file1 = open('eye_map_1.csv', 'w')

for i, x_val in enumerate(x_space):
for j, y_val in enumerate(y_space):
for k, z_val in enumerate(z_space):

rob.move_to_xyz(target)
time.sleep(2) # time for the robot to move

# take a bunch of samples and average the input to get
# the approximation of the LED in camera coordinates
eye_data0 = np.zeros(2)
for k in range(num_samples):
eye_data0 += eye0.position(0)[:2]
eye_data0 /= num_samples
out_file0.write('%0.2f, %0.2f, %0.2f, %0.2f\n' %
(y_val, z_val, eye_data0[0], eye_data0[1]))

eye_data1 = np.zeros(2)
for k in range(num_samples):
eye_data1 += eye1.position(0)[:2]
eye_data1 /= num_samples
out_file1.write('%0.2f, %0.2f, %0.2f, %0.2f\n' %
(x_val, z_val, eye_data1[0], eye_data1[1]))

out_file0.close()
out_file1.close()
except:
import sys
import traceback
print traceback.print_exc(file=sys.stdout)
finally:
rob.robot.disconnect()


This script connects to the cameras, defines some rectangle in end-effector space to sample, and then works through each of the points writing the data to file. The results of this code can be seen in the animation posted in part 2 of this series.

OK! So now we have all the information we need to train up our neural population. It’s worth noting that we’re only using 36 sample points to train up our neurons, I did this mostly because I didn’t want to wait around. You can of course use more, though, and the more sample points you have the more accurate your function approximation will be.

Implementing a controller using Nengo

The neural simulation software (which implements the NEF) that we’re going to be using to generate and train our neural population is called Nengo. It’s free to use for non-commercial use, and I highly recommend checking out the introduction and tutorials if you have any interest in neural modeling.

What we need to do now is generate two neural populations, one for each camera, that will receives input from the spiking camera and transform the target’s location information into end-effector coordinates. We will then combine the estimates from the two populations, and send that information out to the robot to tell it where to move. I’ll paste the code in here, and then we’ll step through it below.

from eye import Eye
import nengo
from nengo.utils.connection import target_function
import robot

import numpy as np
import sys
import traceback

# connect to robot
rob = robot.robotArm()

model = nengo.Network()

try:
def eyeNet(port='/dev/ttyUSB0', filename='eye_map.csv', n_neurons=1000,
label='eye'):

# connect to eye
spiking_cam = Eye(port=port)

# read in eval points and target output
eval_points = []
targets = []

file_obj = open(filename, 'r')
for line in file_data:
line_data = map(float, line.strip().split(','))
targets.append(line_data[:2])
eval_points.append(line_data[2:])
file_obj.close()

eval_points = np.array(eval_points)
targets = np.array(targets)

# create subnetwork for eye
net = nengo.Network(label=label)
with net:
def eye_input(t):
return spiking_cam.position(0)[:2]
net.input = nengo.Node(output=eye_input, size_out=2)
net.map_ens = nengo.Ensemble(n_neurons, dimensions=2)
net.output = nengo.Node(size_in=2)

nengo.Connection(net.input, net.map_ens, synapse=None)
nengo.Connection(net.map_ens, net.output, synapse=None,
**target_function(eval_points, targets))

return net

with model:
# create network for spiking camera 0
eye0 = eyeNet(port='/dev/ttyUSB2', filename='eye_map_0.csv', label='eye0')
# create network for spiking camera 1
eye1 = eyeNet(port='/dev/ttyUSB1', filename='eye_map_1.csv', label='eye1')

def eyes_func(t, yzxz):
x = yzxz[2] # x coordinate coded from eye1
y = yzxz[0] # y coordinate coded from eye0
z = (yzxz[1] + yzxz[3]) / 2.0 # z coordinate average from eye0 and eye1
return [x,y,z]
eyes = nengo.Node(output=eyes_func, size_in=4)
nengo.Connection(eye0.output, eyes[:2])
nengo.Connection(eye1.output, eyes[2:])

# create output node for sending instructions to arm
def arm_func(t, x):
if t < .05: return # don't move arm during startup (avoid transients)
rob.move_to_xyz(np.array(x, dtype='float32'))
armNode = nengo.Node(output=arm_func, size_in=3, size_out=0)
nengo.Connection(eyes, armNode)

sim = nengo.Simulator(model)
sim.run(10, progress_bar=False)

except:
print traceback.print_exc(file=sys.stdout)
finally:
print 'disconnecting'
rob.robot.disconnect()


The first thing we’re doing is defining a function (eyeNet) to create our neural population that takes input from a spiking camera, and decodes out an end-effector location. In here, we read in from the file the information we just recorded about the camera positions that will serve as the input signal to the neurons (eval_points) and the corresponding set of function output (targets). We create a Nengo network, net, and then a couple of nodes for connecting the input (net.input) and projecting the output (net.output). The population of neurons that we’ll use to approximate our function is called net.map_ens. To specify the function we want to approximate using the eval_points and targets arrays, we create a connection from net.map_ens to net.output and use **target_function(eval_points, targets). So this is probably a little weird to parse if you haven’t used Nengo before, but hopefully it’s clear enough that you can get the gist of what’s going on.

In the main part of the code, we create another Nengo network. We call this one model because that’s convention for the top-level network in Nengo. We then create two networks using the eyeNet function to hook up to the two cameras. At this point we create a node called eyes, and the role of this node is simply to amalgamate the information from the two cameras from $(x,z)$ and $(y,z)$ into $(x,y,z)$. This node is then hooked up to another node called armNode, and all armNode does is call the robot arm’s move_to_xyz function, which we defined in the last post.

Finally, we create a Simulation from model, which compiles the neural network we just specified above, and we run the simulation. The result of all of this then looks something like the following:

And there we go! Project complete! We have a controller for a 6DOF arm that uses spiking cameras to train up a neural population and track an LED, that requires almost no set up time. I gave a demo of this at the end of the summer school and there’s no real positioning of the cameras relative to the arm required, just have to tape the cameras up somewhere, run the training script, and go!

Future work

From here there are a bunch of fun ways to go about extending this. We could add another LED blinking at a different frequency that the arm needs to avoid, using an obstacle avoidance algorithm like the one in this post, add in another dimension of the task involving the gripper, implement a null-space controller to keep the arm near resting joint angles as it tracks the target, and on and on!

Another thing that I’ve looked at is including learning on the system to fine tune our function approximation online. As is, the controller is able to extrapolate and move the arm to target locations that are outside of the range of space sampled during training, but it’s not super accurate. It would be much better to be constantly refining the estimate using learning. I was able to implement a basic version that works, but getting the learning and the tracking running at the same time turns out to be a bit trickier, so I haven’t had the chance to get it all running yet. Hopefully there will be some more down-time in the near future, however, and be able to finish implementing it.

For now, though, we still have a pretty neat target tracker for our robot arm!

## Operational space control of 6DOF robot arm with spiking cameras part 2: Deriving the Jacobian

In the previous exciting post in this series I outlined the project, which is in the title, and we worked through getting access to the arm through Python. The next step was deriving the Jacobian, and that’s what we’re going to be talking about in this post!

Alright.
This was a time I was very glad to have a previous post talking about generating transformation matrices, because deriving the Jacobian for a 6DOF arm in 3D space comes off as a little daunting when you’re used to 3DOF in 2D space, and I needed a reminder of the derivation process. The first step here was finding out which motors were what, so I went through and found out how each motor moved with something like the following code:

for ii in range(7):
target_angles = np.zeros(7, dtype='float32')
target_angles[ii] = np.pi / 4.0
rob.move(target_angles)
time.sleep(1)


and I found that the robot is setup in the figures below

this is me trying my hand at making things clearer using Inkscape, hopefully it’s worked. Displayed are the first 6 joints and their angles of rotation, $q_0$ through $q_5$. The 7th joint, $q_6$, opens and closes the gripper, so we’re safe to ignore it in deriving our Jacobian. The arm segment lengths $l_1, l_3,$ and $l_5$ are named based on the nearest joint angles (makes easier reading in the Jacobian derivation).

Find the transformation matrix from end-effector to origin

So first thing’s first, let’s find the transformation matrices. Our first joint, $q_0$, rotates around the $z$ axis, so the rotational part of our transformation matrix $^0_\textrm{org}\textbf{T}$ is

$^0_\textrm{org}\textbf{R} = \left[ \begin{array}{ccc} \textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 \\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 \\ 0 & 0 & 1 \end{array} \right],$

and $q_0$ and our origin frame of reference are on top of each other so we don’t need to account for translation, so our translation component of $^0_\textrm{org}\textbf{T}$ is

$^0_\textrm{org}\textbf{D} = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right].$

Stacking these together to form our first transformation matrix we have

$^0_\textrm{org}\textbf{T} = \left[ \begin{array}{cc} ^0_\textrm{org}\textbf{R} & ^0_\textrm{org}\textbf{D} \\ 0 & 1 \end{array} \right] = \left[ \begin{array}{cccc} \textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 & 0\\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

So now we are able to convert a position in 3D space from to the reference frame of joint $q_0$ back to our origin frame of reference. Let’s keep going.

Joint $q_1$ rotates around the $x$ axis, and there is a translation along the arm segment $l_1$. Our transformation matrix looks like

$^1_0\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_1) & -\textrm{sin}(q_1) & l_1 \textrm{cos}(q_1) \\ 0 & \textrm{sin}(q_1) & \textrm{cos}(q_1) & l_1 \textrm{sin}(q_1) \\ 0 & 0 & 0 & 1 \end{array} \right] .$

Joint $q_2$ also rotates around the $x$ axis, but there is no translation from $q_2$ to $q_3$. So our transformation matrix looks like

$^2_1\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_2) & -\textrm{sin}(q_2) & 0 \\ 0 & \textrm{sin}(q_2) & \textrm{cos}(q_2) & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

The next transformation matrix is a little tricky, because you might be tempted to say that it’s rotating around the $z$ axis, but actually it’s rotating around the $y$ axis. This is determined by where $q_3$ is mounted relative to $q_2$. If it was mounted at 90 degrees from $q_2$ then it would be rotating around the $z$ axis, but it’s not. For translation, there’s a translation along the $y$ axis up to the next joint, so all in all the transformation matrix looks like:

$^3_2\textbf{T} = \left[ \begin{array}{cccc} \textrm{cos}(q_3) & 0 & -\textrm{sin}(q_3) & 0\\ 0 & 1 & 0 & l_3 \\ \textrm{sin}(q_3) & 0 & \textrm{cos}(q_3) & 0\\ 0 & 0 & 0 & 1 \end{array} \right] .$

And then the transformation matrices for coming from $q_4$ to $q_3$ and $q_5$ to $q_4$ are the same as the previous set, so we have

$^4_3\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_4) & -\textrm{sin}(q_4) & 0 \\ 0 & \textrm{sin}(q_4) & \textrm{cos}(q_4) & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

and

$^5_4\textbf{T} = \left[ \begin{array}{cccc} \textrm{cos}(q_5) & 0 & -\textrm{sin}(q_5) & 0 \\ 0 & 1 & 0 & l_5 \\ \textrm{sin}(q_5) & 0 & \textrm{cos}(q_5) & 0\\ 0 & 0 & 0 & 1 \end{array} \right] .$

Alright! Now that we have all of the transformation matrices, we can put them together to get the transformation from end-effector coordinates to our reference frame coordinates!

$^\textrm{ee}_\textrm{org}\textbf{T} = ^0_\textrm{org}\textbf{T} \; ^1_0\textbf{T} \; ^2_1\textbf{T} \; ^3_2\textbf{T} \; ^4_3\textbf{T} \; ^5_4\textbf{T}.$

At this point I went and tested this with some sample points to make sure that everything seemed to be being transformed properly, but we won’t go through that here.

Calculate the derivative of the transform with respect to each joint

The next step in calculating the Jacobian is getting the derivative of $^\textrm{ee}_\textrm{org}T$. This could be a big ol’ headache to do it by hand, OR we could use SymPy, the symbolic computation package for Python. Which is exactly what we’ll do. So after a quick

sudo pip install sympy


I wrote up the following script to perform the derivation for us

import sympy as sp

def calc_transform():
# set up our joint angle symbols (6th angle doesn't affect any kinematics)
q = [sp.Symbol('q0'), sp.Symbol('q1'), sp.Symbol('q2'), sp.Symbol('q3'),
sp.Symbol('q4'), sp.Symbol('q5')]
# set up our arm segment length symbols
l1 = sp.Symbol('l1')
l3 = sp.Symbol('l3')
l5 = sp.Symbol('l5')

Torg0 = sp.Matrix([[sp.cos(q[0]), -sp.sin(q[0]), 0, 0,],
[sp.sin(q[0]), sp.cos(q[0]), 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]])

T01 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[1]), -sp.sin(q[1]), l1*sp.cos(q[1])],
[0, sp.sin(q[1]), sp.cos(q[1]), l1*sp.sin(q[1])],
[0, 0, 0, 1]])

T12 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[2]), -sp.sin(q[2]), 0],
[0, sp.sin(q[2]), sp.cos(q[2]), 0],
[0, 0, 0, 1]])

T23 = sp.Matrix([[sp.cos(q[3]), 0, sp.sin(q[3]), 0],
[0, 1, 0, l3],
[-sp.sin(q[3]), 0, sp.cos(q[3]), 0],
[0, 0, 0, 1]])

T34 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[4]), -sp.sin(q[4]), 0],
[0, sp.sin(q[4]), sp.cos(q[4]), 0],
[0, 0, 0, 1]])

T45 = sp.Matrix([[sp.cos(q[5]), 0, sp.sin(q[5]), 0],
[0, 1, 0, l5],
[-sp.sin(q[5]), 0, sp.cos(q[5]), 0],
[0, 0, 0, 1]])

T = Torg0 * T01 * T12 * T23 * T34 * T45

# position of the end-effector relative to joint axes 6 (right at the origin)
x = sp.Matrix([0,0,0,1])

Tx = T * x

for ii in range(6):
print q[ii]
print sp.simplify(Tx[0].diff(q[ii]))
print sp.simplify(Tx[1].diff(q[ii]))
print sp.simplify(Tx[2].diff(q[ii]))


And then consolidated the output using some variable shorthand to write a function that accepts in joint angles and generates the Jacobian:

def calc_jacobian(q):
J = np.zeros((3, 7))

c0 = np.cos(q[0])
s0 = np.sin(q[0])
c1 = np.cos(q[1])
s1 = np.sin(q[1])
c3 = np.cos(q[3])
s3 = np.sin(q[3])
c4 = np.cos(q[4])
s4 = np.sin(q[4])

c12 = np.cos(q[1] + q[2])
s12 = np.sin(q[1] + q[2])

l1 = self.l1
l3 = self.l3
l5 = self.l5

J[0,0] = -l1*c0*c1 - l3*c0*c12 - l5*((s0*s3 - s12*c0*c3)*s4 + c0*c4*c12)
J[1,0] = -l1*s0*c1 - l3*s0*c12 + l5*((s0*s12*c3 + s3*c0)*s4 - s0*c4*c12)
J[2,0] = 0

J[0,1] = (l1*s1 + l3*s12 + l5*(s4*c3*c12 + s12*c4))*s0
J[1,1] = -(l1*s1 + l3*s12 + l5*s4*c3*c12 + l5*s12*c4)*c0
J[2,1] = l1*c1 + l3*c12 - l5*(s4*s12*c3 - c4*c12)

J[0,2] = (l3*s12 + l5*(s4*c3*c12 + s12*c4))*s0
J[1,2] = -(l3*s12 + l5*s4*c3*c12 + l5*s12*c4)*c0
J[2,2] = l3*c12 - l5*(s4*s12*c3 - c4*c12)

J[0,3] = -l5*(s0*s3*s12 - c0*c3)*s4
J[1,3] = l5*(s0*c3 + s3*s12*c0)*s4
J[2,3] = -l5*s3*s4*c12

J[0,4] = l5*((s0*s12*c3 + s3*c0)*c4 + s0*s4*c12)
J[1,4] = l5*((s0*s3 - s12*c0*c3)*c4 - s4*c0*c12)
J[2,4] = -l5*(s4*s12 - c3*c4*c12)

return J


Alright! Now we have our Jacobian! Really the only time consuming part here was calculating our end-effector to origin transformation matrix, generating the Jacobian was super easy using SymPy once we had that.

Hack position control using the Jacobian

Great! So now that we have our Jacobian we’ll be able to translate forces that we want to apply to the end-effector into joint torques that we want to apply to the arm motors. Since we can’t control applied force to the motors though, and have to pass in desired angle positions, we’re going to do a hack approximation. Let’s first transform our forces from end-effector space into a set of joint angle torques:

$\textbf{u} = \textbf{J}^T \; \textbf{u}_\textbf{x}.$

To approximate the control then we’re simply going to take the current set of joint angles (which we know because it’s whatever angles we last told the system to move to) and add a scaled down version of $\textbf{u}$ to approximate applying torque that affects acceleration and then velocity.

$\textbf{q}_\textrm{des} = \textbf{q} + \alpha \; \textbf{u},$

where $\alpha$ is the gain term, I used .001 here because it was nice and slow, so no crazy commands that could break the servos would be sent out before I could react and hit the cancel button.

What we want to do then to implement operational space control here then is find the current $(x,y,z)$ position of the end-effector, calculate the difference between it and the target end-effector position, use that to generate the end-effector control signal $u_x$, get the Jacobian for the current state of the arm using the function above, find the set of joint torques to apply, approximate this control by generating a set of target joint angles to move to, and then repeat this whole loop until we’re within some threshold of the target position. Whew.

So, a lot of steps, but pretty straight forward to implement. The method I wrote to do it looks something like:

def move_to_xyz(self, xyz_d):
"""
np.array xyz_d: 3D target (x_d, y_d, z_d)
"""
count = 0
while (1):
count += 1
# get control signal in 3D space
xyz = self.calc_xyz()
delta_xyz = xyz_d - xyz
ux = self.kp * delta_xyz

# transform to joint space
J = self.calc_jacobian()
u = np.dot(J.T, ux)

# target joint angles are current + uq (scaled)
self.q[...] += u * .001
self.robot.move(np.asarray(self.q.copy(), 'float32'))

if np.sqrt(np.sum(delta_xyz**2)) < .1 or count > 1e4:
break


And that is it! We have successfully hacked together a system that can perform operational space control of a 6DOF robot arm. Here is a very choppy video of it moving around to some target points in a grid on a cube.

So, granted I had to drop a lot of frames from the video to bring it’s size down to something close to reasonable, but still you can see that it moves to target locations super fast!

Alright this is sweet, but we’re not done yet. We don’t want to have to tell the arm where to move ourselves. Instead we’d like the robot to perform target tracking for some target LED we’re moving around, because that’s way more fun and interactive. To do this, we’re going to use spiking cameras! So stay tuned, we’ll talk about what the hell spiking cameras are and how to use them for a super quick-to-setup and foolproof target tracking system in the next exciting post!

## Operational space control of 6DOF robot arm with spiking cameras part 1: Getting access to the arm through Python

From June 9th to the 19th we ran a summer school (brain camp) in our lab with people from all over to come and learn how to use our neural modeling software, Nengo, and then work on a project with others in the school and people from the lab. We had a bunch of fun hardware available, from neuromorphic hardware from the SpiNNaker group over at Cambridge to lego robots on omni-wheels to spiking cameras (i’ll discuss what a spiking camera is exactly in part 3 of this series) and little robot arms. There were a bunch of awesome projects, people built neural models capable of performing a simplified version of the Wisconsin card sorting task that they then got running on the SpiNNaker boards, a neural controller built to move a robot leech, a spiking neurons reinforcement-learning system implemented on SpiNNaker with spiking cameras to control the lego robot that learned to move towards green LEDs and avoid red LEDs, and a bunch of others. If you’re interested in participating in these kinds of projects and learning more about this I highly suggest you apply to the summer school for next year!

I took the summer school as a chance to break from other projects and hack together a project. The idea was to take the robot arm, implement an operational space controller (i.e. find the Jacobian), and then used spiking cameras to track an LED and have the robot arm learn how to move to the target, no matter where the cameras were placed, by learning the relationship between where the target is in camera-centric coordinates and arm-centric coordinates. This broke down into several steps: 1) Get access to the arm through Python, 2) derive the Jacobian to implement operational space control, 3) sample state space to get an approximation of the camera-centric to arm-centric function, 4) implement the control system to track the target LED.

Here’s a picture of the set-up during step 3, training.

So in the center is the 6DOF robot arm with a little LED attached to the head, and highlighted in orange circles are the two spiking cameras, expertly taped to the wall with office-grade scotch tape. You can also see the SpiNNaker board in the top left as a bonus, but I didn’t have enough time to involve it in this project.

I was originally going to write this all up as one post, because the first two parts are re-implementing previous posts, but even skimming through those steps it was getting long and I’m sure no one minds having a few different examples to look through for generating Cython code or deriving a Jacobian. So I’m going to break this into a few parts. Here in this post we’ll work through the first step (Cython) of our journey.

The arm we had was the VE026A Denso training robot, on loan from Dr. Bryan Tripp of the neuromorphic robotics lab at UW. Previously an interface had been built up by one of Dr. Tripp’s summer students, written in C. C is great and all but Python is much easier to work with, and the rest of the software I wanted to work you know what I’m done justifying it Python is just great so Python is what I wanted to use. The end.

So to get access to the arm in Python I used the great ol’ C++ wrapper hack described in a previous post. Looking at Murphy-the-summer-student’s C code there were basically three functions I needed access to:

// initialize threads, connect to robot
void start_robot(int *semid, int32_t *ctrlid, int32_t *robotid, float **angles)
// send a set of joint angles to the robot servos
void Robot_Execute_slvMove(int32_t robotid, float j1, float j2, float j3, float j4, float j5, float j6, float j7, float j8)
// kill threads, disconnect from robot
void stop_robot(int semid, int32_t ctrlid, int32_t robotid)


So I changed the extension of the file to ‘.cpp’ (I know, I know, I said this was a hack!), fixed some compiler errors that popped up, and then appended the following to the end of the file:

/* A class to contain all the information that needs to
be passed around between these functions, and can
encapsulate it and hide it from the Python interface.

Written by Travis DeWolf (June, 2015)
*/

class Robot {
/* Very simple class, just stores the variables
* needed for talking to the robot, and a gives access
* to the functions for moving it around */

int semid;
int32_t ctrlid;
int32_t robotid;
float* angles;

public:
Robot();
~Robot();
void connect();
void move(float* angles_d);
void disconnect();
};

Robot::Robot() { }

Robot::~Robot() { free(angles); }

/* Connect to the robot, get the ids and current joint angles */
/* char* usb_port: the name of the port the robot is connected to */
void Robot::connect()
{
start_robot(&amp;semid, &amp;ctrlid, &amp;robotid, &amp;angles);
printf("%i %i %i", robotid, ctrlid, semid);
}

/* Move the robot to the specified angles */
/* float* angles: the target joint angles */
void Robot::move(float* angles)
{
// convert from radians to degrees
Robot_Execute_slvMove(robotid,
angles[0] * 180.0 / 3.14,
angles[1] * 180.0 / 3.14,
angles[2] * 180.0 / 3.14,
angles[3] * 180.0 / 3.14,
angles[4] * 180.0 / 3.14,
angles[5] * 180.0 / 3.14,
angles[6] * 180.0 / 3.14,
angles[7] * 180.0 / 3.14);
}

/* Disconnect from the robot */
void Robot::disconnect()
{
stop_robot(semid, ctrlid, robotid);
}

int main()
{
Robot robot = Robot();
// connect to robot
robot.connect();

// move robot to some random target angles
float target_angles[7] = {0, np.pi / 2.0, 0.0, 0, 0, 0, 0};
robot.move(target_angles);

sleep(1);

// disconnect
robot.disconnect();

return 0;
}


So very simple class. Basically just wanted to create a set of functions to hide some of the unnecessary parameters from Python, do the conversion from radians to degrees (who works in degrees? honestly), and have a short little main function to test the creation of the class, and connection, movement, and disconnection of the robot. Like I said, there were a few compiler errors when switching from C to C++, but really that was the only thing that took any time on this part. A few casts and everything was gravy.

The next part was writing the Cython pyRobot.pyx file (I describe the steps involved in this in more detail in this post):

import numpy as np
cimport numpy as np

cdef extern from "bcap.cpp":
cdef cppclass Robot:
Robot()
void connect()
void move(float* angles)
void disconnect()

cdef class pyRobot:
cdef Robot* thisptr

def __cinit__(self):
self.thisptr = new Robot()

def __dealloc__(self):
del self.thisptr

def connect(self):
"""
Set up the connection to the robot, pass in a vector,
get back the current joint angles of the arm.
param np.ndarray angles: a vector to store current joint angles
"""
self.thisptr.connect()

def move(self, np.ndarray[float, mode="c"] angles):
"""
Step the simulation forward one timestep. Pass in target angles,
get back the current arm joint angles.
param np.ndarray angles: 7D target angle vector
"""
self.thisptr.move(&amp;angles[0])

def disconnect(self):
"""
Disconnect from the robot.
"""
self.thisptr.disconnect()


the setup.py file:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
name = 'Demos',
ext_modules=[
Extension("test",
sources=["pyRobot.pyx"],
language="c++"),
],
cmdclass = {'build_ext': build_ext},

)


and then compiling!

run setup.py build_ext -i


With all of this working, a nice test.so file was created, and it was now possible to connect to the robot in Python with

import test
rob = test.pyRobot()
rob.connect()
target_angles = np.array([0, np.pi/2, 0, np.pi/4, 0, 0, 0, 0], dtype='float32')
rob.move(target_angles)
import time
time.sleep(1)
rob.disconnect()


In the above code we’re instantiating the pyRobot class, connecting to the robot, defining a set of target angles and telling the robot to move there, waiting for 1 second to give the robot time to actually move, and then disconnecting from the robot. Upon connection we have to pass in a set of joint angles for the servos, and so you see the robot arm jerk into position, and then move to the target set of joint angles, it looks something exactly like this:

Neat, phase 1 complete.

At the end of phase 1 we are able to connect to the robot arm through Python and send commands in terms of joint angles. But we don’t want to send commands in terms of joint angles, we want to just specify the end-effector position and have the robot work out the angles! I’ve implemented an inverse kinematics solver using constrained optimization before for a 3-link planar arm, but we’re not going to go that route. Find out what we’ll do by joining us next time! or by remembering what I said we’d do at the beginning of this post.

Tagged , , ,

## Workshop talk – Methods for scaling neural computation

A couple of months ago I gave a talk at the Neuro-Inspired Computational Elements (NICE) workshop, about the use of cortical microcircuits for adaptation in the prediction and control problems. The talk was recorded, and here is a link: http://1.usa.gov/1tg2n2I Ignore the fire alarm that gets pulled in the first minute, the actual talk starts around 1 minute.

The were a lot of talks in general that were very interesting, which are also available online here: http://digitalops.sandia.gov/Mediasite/Play/86e1cbfa747a448d96909658df1352011d

I’ll be writing up some posts on the subject matter of my talk in the next couple months, explaining the methods in more detail and providing some solid examples with code. Hope you find the video interesting!

## Dynamic movement primitives part 3: Rhythmic movements

So far we’ve looked at using DMPs for discrete movements, but as I briefly mentioned it’s also possible to use them to perform rhythmic movements. In this post we’re going to look at the implementation of rhythmic DMPs and see exactly how it’s done. It’s actually pretty straightforward, which is always nich. We’ll go through everything that has to change and then see a couple of different applications of rhythmic DMPs. Again, all the code for the DMPs and arm control implementations can be found up on my Github.

Obtaining consistent and repeatable basis function activation

We’ll start with the canonical system. In the discrete case our canonical system (which drives the activation of the basis functions) decayed from $1$ to $0$ and then we were all done. In the rhythmic case we want our system to repeat indefinitely, so we need a reliable way of continuously activating the basis functions in the same order. One function that may come to mind that reliably repeats is the cosine function. But how to use the cosine function, exactly?

The idea is that we’ll lay out our basis functions in the range from $0$ to $2\pi$, and we’ll set the canonical system to be ever increasing linearly. Then we’ll use the difference between the canonical system state and each of the center points as the value we pass in to our cosine function. Because the cosine function repeats at $2\pi$ we’ll get a nice repeating spread of basis function activations. We’ll then throw these values through our Gaussian equation (with the gain and bias terms) and we’ll be done! Here’s the equation:

$\psi = \textrm{exp}(h * \textrm{cos}(x - c) - 1),$
where $x$ is the state of the canonical system, $c$ is the basis function center point, and $h$ is a gain term controlling the variance of the Gaussian. And here’s a picture of the activations of three basis functions with centers spread out evenly between $0$ and $2\pi$:

Here’s a picture of the basis function activations on a longer time scale:

As long as our canonical system increases at a steady pace the basis functions will continue to activate in a reliable and repeatable way. As mentioned, to get the canonical system to increase at a steady pace we simply give it linear dynamics:

$\dot{x} = 1$
The placement of the center points of the rhythmic system is a lot easier than in the discrete case because the rhythmic canonical system dynamics are linear.

Other differences between discrete and rhythmic

In the actual implementation of the rhythmic system there are a couple of other differences that you need to be aware of. The first is that there is no diminishing term in the rhythmic system, whereas there is in the discrete case.

The second is how to go about establishing goal states when imitating an path. We’re going to assume that whatever path is passed in to us is going to be a rhythmic pattern, and then goal state is going to be set as the center point of the desired trajectory. So for rhythmic systems the ‘goal’ is actually more like a center point for the trajectory.

And that’s pretty much it! If you look through the code that I have up for the rhythmic DMP subclass you’ll see that only a handful of functions needed to be redefined, the rest is the same! Well that’s great, let’s look at some applications.

Example

I was thinking of different examples that would be interesting to see of the rhythmic DMPs, and after a little consideration the most obvious application of rhythmic DMPs is really to a system that does something like walking. In the spinal cord of animals there are circuits which are known as central pattern generators (CPGs), and when stimulated these have been shown to generate rhythmic, repeated movements (as described in Principles of rhythmic motor pattern generation and other papers). Let’s go ahead and build our own CPG here using DMPs. I don’t have a simulation of a set of legs, however, so instead we’ll look at getting a single 3-link arm to reproduce a pattern of behaviour that you’d expect in a leg that is walking.

To do this, I’m going to need to specify the trajectories for the three joints of the arm, and I’ll be controlling the system in joint space, using the generalized coordinates controller, rather than the operational space controller used in the discrete DMP examples for controlling the end-effector position. So that’s one thing. Another thing is that I’m going to need the kinematics of leg joints during motion. After a quick search of the web I found this presentation from Simon Fraser University which has the following images:

Which is some very useful information! I just need to transfer this into a Numpy vector that I can feed into the system and convert degrees to radians and we should be good to go! To get the information out of the graph, use your favorite data stripper, I used http://arohatgi.info/WebPlotDigitizer/app/.

I also had to do a bit of a shuffle with the trajectories provided to make it look right on the arm. The hip to shoulder joint maps fine, but for the knee to elbow the flexion and extension are reversed, and for the wrist to foot the $0$ angle for the foot is straight out. Once those have been accounted for, though, everything works great. Here is an animation of a single leg walking along, if you use your imagination you can just picture a happy-go-lucky torso attached to this leg whistling while he gets his very human-like gait on:

And there you go! It’s neat how straightforward that was, once you have the data of the system you want to model. It shows the power of DMPs for coordinating multiple degrees of freedom quickly and easily. Also something that’s worth pointing out once again is that this is a force based controller. We were able to go from a kinematic description of the movement we wanted to a reliable system generating torques. It’s another good example of the dexterity of dynamic movement primitives.

You can see the code for everything here up online on my github control repo on the walking_demo branch. To run the code you should run:

python run.py