Operational space control of 6DOF robot arm with spiking cameras part 3: Tracking a target using spiking cameras

Alright. Previously we got our arm all set up to perform operational space control, accepting commands through Python. In this post we’re going to set it up with a set of spiking cameras for eyes, train it to learn the mapping between camera coordinates and end-effector coordinates, and have it track an LED target.

What is a spiking camera?

Good question! Spiking cameras are awesome, and they come from Dr. Jorg Conradt’s lab. Basically what they do is return you information about movement from the environment. They’re event-driven, instead of clock-driven like most hardware, which means that they have no internal clock that’s dictating when they send information (i.e. they’re asynchronous). They send information out as soon as they receive it. Additionally, they only send out information about the part of the image that has changed. This means that they have super fast response times and their output bandwidth is really low. Dr. Terry Stewart of our lab has written a bunch of code that can be used for interfacing with spiking cameras, which can all be found up on his GitHub.

Let’s use his code to see through a spiking camera’s eye. After cloning his repo and running python setup.py you can plug in a spiking camera through USB, and with the following code have a Matplotlib figure pop-up with the camera output:

import nstbot
import nstbot.connection
import time

eye = nstbot.RetinaBot()
eye.connect(nstbot.connection.Serial('/dev/ttyUSB0', baud=4000000))



while True:

The important parts here are the creation of an instance of the RetinaBot, connecting it to the proper USB port, and calling the show_image function. Pretty easy, right? Here’s some example output, this is me waving my hand and snapping my fingers:

How cool is that? Now, you may be wondering how or why we’re going to use a spiking camera instead of a regular camera. The main reason that I’m using it here is because it makes tracking targets super easy. We just set up an LED that blinks at say 100Hz, and then we look for that frequency in the spiking camera output by recording the rate of change of each of the pixels and averaging over all pixel locations changing at the target frequency. So, to do this with the above code we simply add


And now we can track the location of an LED blinking at 100Hz! The visualization code place a blue dot at the estimated target location, and this all looks like:
Alright! Easily decoded target location complete.

Transforming between camera coordinates and end-effector coordinates

Now that we have a system that can track a target location, we need to transform that position information into end-effector coordinates for the arm to move to. There are a few ways to go about this. One is by very carefully positioning the camera and measuring the distances between the robot’s origin reference frame and working through the trig etc etc. Another, much less pain-in-the-neck way is to instead record some sample points of the robot end-effector at different positions in both end-effector and camera coordinates, and then use a function approximator to generalize over the rest of space.

We’ll do the latter, because it’s exactly the kind of thing that neurons are great for. We have some weird function, and we want to learn to approximate it. Populations of neurons are awesome function approximators. Think of all the crazy mappings your brain learns. To perform function approximation with neurons we’re going to use the Neural Engineering Framework (NEF). If you’re not familiar with the NEF, the basic idea is to using the response curves of neurons as a big set of basis function to decode some signal in some vector space. So we look at the responses of the neurons in the population as we vary our input signal, and then determine a set of decoders (using least-squares or somesuch) that specify the contribution of each neuron to the different dimensions of the function we want to approximate.

Here’s how this is going to work.

  1. We’re going to attach the LED to the head of the robot,
  2. we specify a set of (x,y,z) coordinates that we send to the robot’s controller,
  3. when the robot moves to each point, record the LED location from the camera as well as the end-effector’s (x,y,z) coordinate,
  4. create a population of neurons that we train up to learn the mapping from camera locations to end-effector (x,y,z) locations
  5. use this information to tell the robot where to move.

A detail that should be mentioned here is that a single camera only provides 2D output. To get a 3D location we’re going to use two separate cameras. One will provide (x,z) information, and the other will provide (y,z) information.

Once we’ve taped (expertly) the LED onto the robot arm, the following script to generate the information we to approximate the function transforming from camera to end-effector space:

import robot
from eye import Eye # this is just a spiking camera wrapper class

import numpy as np
import time

# connect to the spiking cameras
eye0 = Eye(port='/dev/ttyUSB2')
eye1 = Eye(port='/dev/ttyUSB1')
eyes = [eye0, eye1]
# connect to the robot
rob = robot.robotArm()

# define the range of values to test
min_x = -10.0
max_x = 10.0
x_interval = 5.0
min_y = -15.0
max_y = -5.0
y_interval = 5.0
min_z = 10.0
max_z = 20.0
z_interval = 5.0

x_space = np.arange(min_x, max_x, x_interval)
y_space = np.arange(min_y, max_y, y_interval)
z_space = np.arange(min_z, max_z, z_interval)

num_samples = 10 # how many camera samples to average over

    out_file0 = open('eye_map_0.csv', 'w')
    out_file1 = open('eye_map_1.csv', 'w')

    for i, x_val in enumerate(x_space):
        for j, y_val in enumerate(y_space):
            for k, z_val in enumerate(z_space):

                time.sleep(2) # time for the robot to move

                # take a bunch of samples and average the input to get 
                # the approximation of the LED in camera coordinates
                eye_data0 = np.zeros(2)
                for k in range(num_samples):
                    eye_data0 += eye0.position(0)[:2]
                eye_data0 /= num_samples
                out_file0.write('%0.2f, %0.2f, %0.2f, %0.2f\n' %
                                (y_val, z_val, eye_data0[0], eye_data0[1]))

                eye_data1 = np.zeros(2)
                for k in range(num_samples):
                    eye_data1 += eye1.position(0)[:2]
                eye_data1 /= num_samples
                out_file1.write('%0.2f, %0.2f, %0.2f, %0.2f\n' %
                                (x_val, z_val, eye_data1[0], eye_data1[1]))

    import sys
    import traceback
    print traceback.print_exc(file=sys.stdout)

This script connects to the cameras, defines some rectangle in end-effector space to sample, and then works through each of the points writing the data to file. The results of this code can be seen in the animation posted in part 2 of this series.

OK! So now we have all the information we need to train up our neural population. It’s worth noting that we’re only using 36 sample points to train up our neurons, I did this mostly because I didn’t want to wait around. You can of course use more, though, and the more sample points you have the more accurate your function approximation will be.

Implementing a controller using Nengo

The neural simulation software (which implements the NEF) that we’re going to be using to generate and train our neural population is called Nengo. It’s free to use for non-commercial use, and I highly recommend checking out the introduction and tutorials if you have any interest in neural modeling.

What we need to do now is generate two neural populations, one for each camera, that will receives input from the spiking camera and transform the target’s location information into end-effector coordinates. We will then combine the estimates from the two populations, and send that information out to the robot to tell it where to move. I’ll paste the code in here, and then we’ll step through it below.

from eye import Eye
import nengo
from nengo.utils.connection import target_function
import robot 

import numpy as np
import sys
import traceback

# connect to robot
rob = robot.robotArm()

model = nengo.Network()

    def eyeNet(port='/dev/ttyUSB0', filename='eye_map.csv', n_neurons=1000, 

        # connect to eye
        spiking_cam = Eye(port=port)

        # read in eval points and target output
        eval_points = []
        targets = []

        file_obj = open(filename, 'r')
        file_data = file_obj.readlines()
        for line in file_data:
            line_data = map(float, line.strip().split(','))

        eval_points = np.array(eval_points)
        targets = np.array(targets)

        # create subnetwork for eye
        net = nengo.Network(label=label)
        with net:
            def eye_input(t): 
                return spiking_cam.position(0)[:2]
            net.input = nengo.Node(output=eye_input, size_out=2)
            net.map_ens = nengo.Ensemble(n_neurons, dimensions=2)
            net.output = nengo.Node(size_in=2)

            nengo.Connection(net.input, net.map_ens, synapse=None)
            nengo.Connection(net.map_ens, net.output, synapse=None,
                                    **target_function(eval_points, targets))

        return net

    with model: 
        # create network for spiking camera 0
        eye0 = eyeNet(port='/dev/ttyUSB2', filename='eye_map_0.csv', label='eye0')
        # create network for spiking camera 1 
        eye1 = eyeNet(port='/dev/ttyUSB1', filename='eye_map_1.csv', label='eye1')

        def eyes_func(t, yzxz):
            x = yzxz[2] # x coordinate coded from eye1
            y = yzxz[0] # y coordinate coded from eye0
            z = (yzxz[1] + yzxz[3]) / 2.0 # z coordinate average from eye0 and eye1
            return [x,y,z]
        eyes = nengo.Node(output=eyes_func, size_in=4)
        nengo.Connection(eye0.output, eyes[:2])
        nengo.Connection(eye1.output, eyes[2:])

        # create output node for sending instructions to arm
        def arm_func(t, x):
            if t < .05: return # don't move arm during startup (avoid transients)
            rob.move_to_xyz(np.array(x, dtype='float32'))
        armNode = nengo.Node(output=arm_func, size_in=3, size_out=0)
        nengo.Connection(eyes, armNode)

    sim = nengo.Simulator(model)
    sim.run(10, progress_bar=False)

    print traceback.print_exc(file=sys.stdout)
    print 'disconnecting'

The first thing we’re doing is defining a function (eyeNet) to create our neural population that takes input from a spiking camera, and decodes out an end-effector location. In here, we read in from the file the information we just recorded about the camera positions that will serve as the input signal to the neurons (eval_points) and the corresponding set of function output (targets). We create a Nengo network, net, and then a couple of nodes for connecting the input (net.input) and projecting the output (net.output). The population of neurons that we’ll use to approximate our function is called net.map_ens. To specify the function we want to approximate using the eval_points and targets arrays, we create a connection from net.map_ens to net.output and use **target_function(eval_points, targets). So this is probably a little weird to parse if you haven’t used Nengo before, but hopefully it’s clear enough that you can get the gist of what’s going on.

In the main part of the code, we create another Nengo network. We call this one model because that’s convention for the top-level network in Nengo. We then create two networks using the eyeNet function to hook up to the two cameras. At this point we create a node called eyes, and the role of this node is simply to amalgamate the information from the two cameras from (x,z) and (y,z) into (x,y,z). This node is then hooked up to another node called armNode, and all armNode does is call the robot arm’s move_to_xyz function, which we defined in the last post.

Finally, we create a Simulation from model, which compiles the neural network we just specified above, and we run the simulation. The result of all of this then looks something like the following:


And there we go! Project complete! We have a controller for a 6DOF arm that uses spiking cameras to train up a neural population and track an LED, that requires almost no set up time. I gave a demo of this at the end of the summer school and there’s no real positioning of the cameras relative to the arm required, just have to tape the cameras up somewhere, run the training script, and go!

Future work

From here there are a bunch of fun ways to go about extending this. We could add another LED blinking at a different frequency that the arm needs to avoid, using an obstacle avoidance algorithm like the one in this post, add in another dimension of the task involving the gripper, implement a null-space controller to keep the arm near resting joint angles as it tracks the target, and on and on!

Another thing that I’ve looked at is including learning on the system to fine tune our function approximation online. As is, the controller is able to extrapolate and move the arm to target locations that are outside of the range of space sampled during training, but it’s not super accurate. It would be much better to be constantly refining the estimate using learning. I was able to implement a basic version that works, but getting the learning and the tracking running at the same time turns out to be a bit trickier, so I haven’t had the chance to get it all running yet. Hopefully there will be some more down-time in the near future, however, and be able to finish implementing it.

For now, though, we still have a pretty neat target tracker for our robot arm!

Tagged , , , , , ,

Operational space control of 6DOF robot arm with spiking cameras part 2: Deriving the Jacobian

In the previous exciting post in this series I outlined the project, which is in the title, and we worked through getting access to the arm through Python. The next step was deriving the Jacobian, and that’s what we’re going to be talking about in this post!

This was a time I was very glad to have a previous post talking about generating transformation matrices, because deriving the Jacobian for a 6DOF arm in 3D space comes off as a little daunting when you’re used to 3DOF in 2D space, and I needed a reminder of the derivation process. The first step here was finding out which motors were what, so I went through and found out how each motor moved with something like the following code:

for ii in range(7):
    target_angles = np.zeros(7, dtype='float32')
    target_angles[ii] = np.pi / 4.0

and I found that the robot is setup in the figures below

this is me trying my hand at making things clearer using Inkscape, hopefully it’s worked. Displayed are the first 6 joints and their angles of rotation, q_0 through q_5. The 7th joint, q_6, opens and closes the gripper, so we’re safe to ignore it in deriving our Jacobian. The arm segment lengths l_1, l_3, and l_5 are named based on the nearest joint angles (makes easier reading in the Jacobian derivation).

Find the transformation matrix from end-effector to origin

So first thing’s first, let’s find the transformation matrices. Our first joint, q_0, rotates around the z axis, so the rotational part of our transformation matrix ^0_\textrm{org}\textbf{T} is

^0_\textrm{org}\textbf{R} = \left[ \begin{array}{ccc} \textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 \\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 \\ 0 & 0 & 1 \end{array} \right],

and q_0 and our origin frame of reference are on top of each other so we don’t need to account for translation, so our translation component of ^0_\textrm{org}\textbf{T} is

^0_\textrm{org}\textbf{D} = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right].

Stacking these together to form our first transformation matrix we have

^0_\textrm{org}\textbf{T} = \left[ \begin{array}{cc} ^0_\textrm{org}\textbf{R} & ^0_\textrm{org}\textbf{D} \\ 0 & 1 \end{array} \right] = \left[ \begin{array}{cccc} \textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 & 0\\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .

So now we are able to convert a position in 3D space from to the reference frame of joint q_0 back to our origin frame of reference. Let’s keep going.

Joint q_1 rotates around the x axis, and there is a translation along the arm segment l_1. Our transformation matrix looks like

^1_0\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_1) & -\textrm{sin}(q_1) & l_1 \textrm{cos}(q_1) \\ 0 & \textrm{sin}(q_1) & \textrm{cos}(q_1) & l_1 \textrm{sin}(q_1) \\ 0 & 0 & 0 & 1 \end{array} \right] .

Joint q_2 also rotates around the x axis, but there is no translation from q_2 to q_3. So our transformation matrix looks like

^2_1\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_2) & -\textrm{sin}(q_2) & 0 \\ 0 & \textrm{sin}(q_2) & \textrm{cos}(q_2) & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .

The next transformation matrix is a little tricky, because you might be tempted to say that it’s rotating around the z axis, but actually it’s rotating around the y axis. This is determined by where q_3 is mounted relative to q_2. If it was mounted at 90 degrees from q_2 then it would be rotating around the z axis, but it’s not. For translation, there’s a translation along the y axis up to the next joint, so all in all the transformation matrix looks like:

^3_2\textbf{T} = \left[ \begin{array}{cccc} \textrm{cos}(q_3) & 0 & -\textrm{sin}(q_3) & 0\\ 0 & 1 & 0 & l_3 \\ \textrm{sin}(q_3) & 0 & \textrm{cos}(q_3) & 0\\ 0 & 0 & 0 & 1 \end{array} \right] .

And then the transformation matrices for coming from q_4 to q_3 and q_5 to q_4 are the same as the previous set, so we have

^4_3\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_4) & -\textrm{sin}(q_4) & 0 \\ 0 & \textrm{sin}(q_4) & \textrm{cos}(q_4) & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .


^5_4\textbf{T} = \left[ \begin{array}{cccc} \textrm{cos}(q_5) & 0 & -\textrm{sin}(q_5) & 0 \\ 0 & 1 & 0 & l_5 \\ \textrm{sin}(q_5) & 0 & \textrm{cos}(q_5) & 0\\ 0 & 0 & 0 & 1 \end{array} \right] .

Alright! Now that we have all of the transformation matrices, we can put them together to get the transformation from end-effector coordinates to our reference frame coordinates!

^\textrm{ee}_\textrm{org}\textbf{T} = ^0_\textrm{org}\textbf{T} \; ^1_0\textbf{T} \; ^2_1\textbf{T} \; ^3_2\textbf{T} \; ^4_3\textbf{T} \; ^5_4\textbf{T}.

At this point I went and tested this with some sample points to make sure that everything seemed to be being transformed properly, but we won’t go through that here.

Calculate the derivative of the transform with respect to each joint

The next step in calculating the Jacobian is getting the derivative of ^\textrm{ee}_\textrm{org}T. This could be a big ol’ headache to do it by hand, OR we could use SymPy, the symbolic computation package for Python. Which is exactly what we’ll do. So after a quick

sudo pip install sympy

I wrote up the following script to perform the derivation for us

import sympy as sp

def calc_transform():
    # set up our joint angle symbols (6th angle doesn't affect any kinematics)
    q = [sp.Symbol('q0'), sp.Symbol('q1'), sp.Symbol('q2'), sp.Symbol('q3'),
            sp.Symbol('q4'), sp.Symbol('q5')]
    # set up our arm segment length symbols
    l1 = sp.Symbol('l1')
    l3 = sp.Symbol('l3')
    l5 = sp.Symbol('l5')

    Torg0 = sp.Matrix([[sp.cos(q[0]), -sp.sin(q[0]), 0, 0,],
                       [sp.sin(q[0]), sp.cos(q[0]), 0, 0],
                       [0, 0, 1, 0],
                       [0, 0, 0, 1]])

    T01 = sp.Matrix([[1, 0, 0, 0],
                     [0, sp.cos(q[1]), -sp.sin(q[1]), l1*sp.cos(q[1])],
                     [0, sp.sin(q[1]), sp.cos(q[1]), l1*sp.sin(q[1])],
                     [0, 0, 0, 1]])

    T12 = sp.Matrix([[1, 0, 0, 0],
                     [0, sp.cos(q[2]), -sp.sin(q[2]), 0],
                     [0, sp.sin(q[2]), sp.cos(q[2]), 0],
                     [0, 0, 0, 1]])

    T23 = sp.Matrix([[sp.cos(q[3]), 0, sp.sin(q[3]), 0],
                     [0, 1, 0, l3],
                     [-sp.sin(q[3]), 0, sp.cos(q[3]), 0],
                     [0, 0, 0, 1]])

    T34 = sp.Matrix([[1, 0, 0, 0],
                     [0, sp.cos(q[4]), -sp.sin(q[4]), 0],
                     [0, sp.sin(q[4]), sp.cos(q[4]), 0],
                     [0, 0, 0, 1]])

    T45 = sp.Matrix([[sp.cos(q[5]), 0, sp.sin(q[5]), 0],
                     [0, 1, 0, l5],
                     [-sp.sin(q[5]), 0, sp.cos(q[5]), 0],
                     [0, 0, 0, 1]])

    T = Torg0 * T01 * T12 * T23 * T34 * T45

    # position of the end-effector relative to joint axes 6 (right at the origin)
    x = sp.Matrix([0,0,0,1])

    Tx = T * x

    for ii in range(6):
        print q[ii]
        print sp.simplify(Tx[0].diff(q[ii]))
        print sp.simplify(Tx[1].diff(q[ii]))
        print sp.simplify(Tx[2].diff(q[ii]))

And then consolidated the output using some variable shorthand to write a function that accepts in joint angles and generates the Jacobian:

def calc_jacobian(q):
    J = np.zeros((3, 7))

    c0 = np.cos(q[0])
    s0 = np.sin(q[0])
    c1 = np.cos(q[1])
    s1 = np.sin(q[1])
    c3 = np.cos(q[3])
    s3 = np.sin(q[3])
    c4 = np.cos(q[4])
    s4 = np.sin(q[4])

    c12 = np.cos(q[1] + q[2])
    s12 = np.sin(q[1] + q[2])

    l1 = self.l1
    l3 = self.l3
    l5 = self.l5

    J[0,0] = -l1*c0*c1 - l3*c0*c12 - l5*((s0*s3 - s12*c0*c3)*s4 + c0*c4*c12)
    J[1,0] = -l1*s0*c1 - l3*s0*c12 + l5*((s0*s12*c3 + s3*c0)*s4 - s0*c4*c12)
    J[2,0] = 0

    J[0,1] = (l1*s1 + l3*s12 + l5*(s4*c3*c12 + s12*c4))*s0
    J[1,1] = -(l1*s1 + l3*s12 + l5*s4*c3*c12 + l5*s12*c4)*c0
    J[2,1] = l1*c1 + l3*c12 - l5*(s4*s12*c3 - c4*c12)

    J[0,2] = (l3*s12 + l5*(s4*c3*c12 + s12*c4))*s0
    J[1,2] = -(l3*s12 + l5*s4*c3*c12 + l5*s12*c4)*c0
    J[2,2] = l3*c12 - l5*(s4*s12*c3 - c4*c12)

    J[0,3] = -l5*(s0*s3*s12 - c0*c3)*s4
    J[1,3] = l5*(s0*c3 + s3*s12*c0)*s4
    J[2,3] = -l5*s3*s4*c12

    J[0,4] = l5*((s0*s12*c3 + s3*c0)*c4 + s0*s4*c12)
    J[1,4] = l5*((s0*s3 - s12*c0*c3)*c4 - s4*c0*c12)
    J[2,4] = -l5*(s4*s12 - c3*c4*c12)

    return J

Alright! Now we have our Jacobian! Really the only time consuming part here was calculating our end-effector to origin transformation matrix, generating the Jacobian was super easy using SymPy once we had that.

Hack position control using the Jacobian

Great! So now that we have our Jacobian we’ll be able to translate forces that we want to apply to the end-effector into joint torques that we want to apply to the arm motors. Since we can’t control applied force to the motors though, and have to pass in desired angle positions, we’re going to do a hack approximation. Let’s first transform our forces from end-effector space into a set of joint angle torques:

\textbf{u} = \textbf{J}^T \; \textbf{u}_\textbf{x}.

To approximate the control then we’re simply going to take the current set of joint angles (which we know because it’s whatever angles we last told the system to move to) and add a scaled down version of \textbf{u} to approximate applying torque that affects acceleration and then velocity.

\textbf{q}_\textrm{des} = \textbf{q} + \alpha \; \textbf{u},

where \alpha is the gain term, I used .001 here because it was nice and slow, so no crazy commands that could break the servos would be sent out before I could react and hit the cancel button.

What we want to do then to implement operational space control here then is find the current (x,y,z) position of the end-effector, calculate the difference between it and the target end-effector position, use that to generate the end-effector control signal u_x, get the Jacobian for the current state of the arm using the function above, find the set of joint torques to apply, approximate this control by generating a set of target joint angles to move to, and then repeat this whole loop until we’re within some threshold of the target position. Whew.

So, a lot of steps, but pretty straight forward to implement. The method I wrote to do it looks something like:

def move_to_xyz(self, xyz_d):
    np.array xyz_d: 3D target (x_d, y_d, z_d)
    count = 0
    while (1):
        count += 1
        # get control signal in 3D space
        xyz = self.calc_xyz()
        delta_xyz = xyz_d - xyz
        ux = self.kp * delta_xyz

        # transform to joint space
        J = self.calc_jacobian()
        u = np.dot(J.T, ux)

        # target joint angles are current + uq (scaled)
        self.q[...] += u * .001
        self.robot.move(np.asarray(self.q.copy(), 'float32'))

        if np.sqrt(np.sum(delta_xyz**2)) < .1 or count > 1e4:

And that is it! We have successfully hacked together a system that can perform operational space control of a 6DOF robot arm. Here is a very choppy video of it moving around to some target points in a grid on a cube.
So, granted I had to drop a lot of frames from the video to bring it’s size down to something close to reasonable, but still you can see that it moves to target locations super fast!

Alright this is sweet, but we’re not done yet. We don’t want to have to tell the arm where to move ourselves. Instead we’d like the robot to perform target tracking for some target LED we’re moving around, because that’s way more fun and interactive. To do this, we’re going to use spiking cameras! So stay tuned, we’ll talk about what the hell spiking cameras are and how to use them for a super quick-to-setup and foolproof target tracking system in the next exciting post!

Tagged , , , , , , ,

Operational space control of 6DOF robot arm with spiking cameras part 1: Getting access to the arm through Python

From June 9th to the 19th we ran a summer school (brain camp) in our lab with people from all over to come and learn how to use our neural modeling software, Nengo, and then work on a project with others in the school and people from the lab. We had a bunch of fun hardware available, from neuromorphic hardware from the SpiNNaker group over at Cambridge to lego robots on omni-wheels to spiking cameras (i’ll discuss what a spiking camera is exactly in part 3 of this series) and little robot arms. There were a bunch of awesome projects, people built neural models capable of performing a simplified version of the Wisconsin card sorting task that they then got running on the SpiNNaker boards, a neural controller built to move a robot leech, a spiking neurons reinforcement-learning system implemented on SpiNNaker with spiking cameras to control the lego robot that learned to move towards green LEDs and avoid red LEDs, and a bunch of others. If you’re interested in participating in these kinds of projects and learning more about this I highly suggest you apply to the summer school for next year!

I took the summer school as a chance to break from other projects and hack together a project. The idea was to take the robot arm, implement an operational space controller (i.e. find the Jacobian), and then used spiking cameras to track an LED and have the robot arm learn how to move to the target, no matter where the cameras were placed, by learning the relationship between where the target is in camera-centric coordinates and arm-centric coordinates. This broke down into several steps: 1) Get access to the arm through Python, 2) derive the Jacobian to implement operational space control, 3) sample state space to get an approximation of the camera-centric to arm-centric function, 4) implement the control system to track the target LED.

Here’s a picture of the set-up during step 3, training.
So in the center is the 6DOF robot arm with a little LED attached to the head, and highlighted in orange circles are the two spiking cameras, expertly taped to the wall with office-grade scotch tape. You can also see the SpiNNaker board in the top left as a bonus, but I didn’t have enough time to involve it in this project.

I was originally going to write this all up as one post, because the first two parts are re-implementing previous posts, but even skimming through those steps it was getting long and I’m sure no one minds having a few different examples to look through for generating Cython code or deriving a Jacobian. So I’m going to break this into a few parts. Here in this post we’ll work through the first step (Cython) of our journey.

Get access to the arm through Python

The arm we had was the VE026A Denso training robot, on loan from Dr. Bryan Tripp of the neuromorphic robotics lab at UW. Previously an interface had been built up by one of Dr. Tripp’s summer students, written in C. C is great and all but Python is much easier to work with, and the rest of the software I wanted to work you know what I’m done justifying it Python is just great so Python is what I wanted to use. The end.

So to get access to the arm in Python I used the great ol’ C++ wrapper hack described in a previous post. Looking at Murphy-the-summer-student’s C code there were basically three functions I needed access to:

// initialize threads, connect to robot
void start_robot(int *semid, int32_t *ctrlid, int32_t *robotid, float **angles)
// send a set of joint angles to the robot servos
void Robot_Execute_slvMove(int32_t robotid, float j1, float j2, float j3, float j4, float j5, float j6, float j7, float j8)
// kill threads, disconnect from robot
void stop_robot(int semid, int32_t ctrlid, int32_t robotid)

So I changed the extension of the file to ‘.cpp’ (I know, I know, I said this was a hack!), fixed some compiler errors that popped up, and then appended the following to the end of the file:

/* A class to contain all the information that needs to
be passed around between these functions, and can
encapsulate it and hide it from the Python interface.

Written by Travis DeWolf (June, 2015)

class Robot {
/* Very simple class, just stores the variables
* needed for talking to the robot, and a gives access
* to the functions for moving it around */

int semid;
int32_t ctrlid;
int32_t robotid;
float* angles;

void connect();
void move(float* angles_d);
void disconnect();

Robot::Robot() { }

Robot::~Robot() { free(angles); }

/* Connect to the robot, get the ids and current joint angles */
/* char* usb_port: the name of the port the robot is connected to */
void Robot::connect()
start_robot(&amp;semid, &amp;ctrlid, &amp;robotid, &amp;angles);
printf("%i %i %i", robotid, ctrlid, semid);

/* Move the robot to the specified angles */
/* float* angles: the target joint angles */
void Robot::move(float* angles)
// convert from radians to degrees
angles[0] * 180.0 / 3.14,
angles[1] * 180.0 / 3.14,
angles[2] * 180.0 / 3.14,
angles[3] * 180.0 / 3.14,
angles[4] * 180.0 / 3.14,
angles[5] * 180.0 / 3.14,
angles[6] * 180.0 / 3.14,
angles[7] * 180.0 / 3.14);

/* Disconnect from the robot */
void Robot::disconnect()
stop_robot(semid, ctrlid, robotid);

int main()
Robot robot = Robot();
// connect to robot

// move robot to some random target angles
float target_angles[7] = {0, np.pi / 2.0, 0.0, 0, 0, 0, 0};


// disconnect

return 0;

So very simple class. Basically just wanted to create a set of functions to hide some of the unnecessary parameters from Python, do the conversion from radians to degrees (who works in degrees? honestly), and have a short little main function to test the creation of the class, and connection, movement, and disconnection of the robot. Like I said, there were a few compiler errors when switching from C to C++, but really that was the only thing that took any time on this part. A few casts and everything was gravy.

The next part was writing the Cython pyRobot.pyx file (I describe the steps involved in this in more detail in this post):

import numpy as np
cimport numpy as np

cdef extern from "bcap.cpp":
cdef cppclass Robot:
void connect()
void move(float* angles)
void disconnect()

cdef class pyRobot:
cdef Robot* thisptr

def __cinit__(self):
self.thisptr = new Robot()

def __dealloc__(self):
del self.thisptr

def connect(self):
Set up the connection to the robot, pass in a vector,
get back the current joint angles of the arm.
param np.ndarray angles: a vector to store current joint angles

def move(self, np.ndarray[float, mode="c"] angles):
Step the simulation forward one timestep. Pass in target angles,
get back the current arm joint angles.
param np.ndarray angles: 7D target angle vector

def disconnect(self):
Disconnect from the robot.

the setup.py file:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

name = 'Demos',
cmdclass = {'build_ext': build_ext},


and then compiling!

run setup.py build_ext -i

With all of this working, a nice test.so file was created, and it was now possible to connect to the robot in Python with

import test
rob = test.pyRobot()
target_angles = np.array([0, np.pi/2, 0, np.pi/4, 0, 0, 0, 0], dtype='float32')
import time

In the above code we’re instantiating the pyRobot class, connecting to the robot, defining a set of target angles and telling the robot to move there, waiting for 1 second to give the robot time to actually move, and then disconnecting from the robot. Upon connection we have to pass in a set of joint angles for the servos, and so you see the robot arm jerk into position, and then move to the target set of joint angles, it looks something exactly like this:


Neat, phase 1 complete.

At the end of phase 1 we are able to connect to the robot arm through Python and send commands in terms of joint angles. But we don’t want to send commands in terms of joint angles, we want to just specify the end-effector position and have the robot work out the angles! I’ve implemented an inverse kinematics solver using constrained optimization before for a 3-link planar arm, but we’re not going to go that route. Find out what we’ll do by joining us next time! or by remembering what I said we’d do at the beginning of this post.

Tagged , , ,

Reinforcement Learning part 4: Combining egocentric and allocentric

Alright, in my last post on RL we looked at egocentric learning and what that is. We also saw that both ego and allocentric learning have their strong points, and they compliment each other. So, some combination of these two approaches seems like a good idea. In this post we’re going to look at combining the output from these two kinds of learners and see the strengths of different weightings. Per usual, all the code from this post is up on my Github.

Combining allocentric and egocentric approaches

There are a few different ways that come to mind that we could combine these two learning styles. In each of these cases the overall setup is that we’re going to run two learning systems, one that is allocentric and one that is egocentric, combine their action value mappings for the current state in some way, and then decide which action to take based on their combined output. So here are several possible ways to combine the outputs:

  • Only use allocentric
  • Only use egocentric
  • Average the mappings
  • Dynamically change weights

There are, of course, many more ways to weight these systems, but these are some basic ones and they’re what we’re going to look at here.

For testing each of these these different setups we are going to run the mouse on a map for a set amount of time, measure the wins, accidental deaths, and intentional deaths, and then gather statistics (mean value and confidence intervals) across 10 trials.

A side note: The rate of exploration (epsilon value) starts out at .1 and is decreased by half every 100,000 simulation time step, so there are a number of jumps where you see the number of wins go up every 100,000, and that’s an artifact of the rate of exploration dropping (i.e. increasing your exploitation of what you already know about the world).

OK, we’re going to do this across 3 different maps, where the size and complexity increases with each map.

Map 1: the basic cliff map (which should look familiar):

Map 2: a middlesized map with several areas that are identical for the egocentric state:

and Map 3: a pretty large map with even more ambiguous areas:



Now, what we expect from our learning systems is that the egocentric learner with have very few ‘suicide’ deaths, where it’s slow to learn that jumping off a cliff is a terrible idea. But on the flip side of that, since we’re going to set the gamma value of the egocentric learner to 0, (which means that it doesn’t incorporate anything other than the immediate reward into it’s learning, as discussed in the previous post), we expect that it will only rarely find the goal. Also, as the map gets larger, we expect that the goal will be found less and less, as it will have to randomly stumble around avoiding cliffs until it ends up in sight of the cheese.

From the allocentric learner we expect the death rate to be much higher than the egocentric learner, but we also expect the success rate to be significantly higher, as it will be able to identify how to move based on its current (x,y) position, rather than based on its immediate surroundings. As the map gets larger though, we expect the number of deaths to be larger, since there is no transfer of knowledge that jumping off a cliff in this position is as bad as jumping off a cliff in that position.

Alright! Let’s get to it!


Only use allocentric

Not much to explain about this, the mouse is going to be learning Q-values where each state is its (x,y) position on the map. The bigger the map, the more deaths we expect this learner to have. So, here are the results:

Alright, so one of the most salient things in these plots is the stepping up of the number of wins at each of the 100,000 marks. Again, this is due to a transfer from exploration to exploitation of what’s already been learned as the epsilon value is decreased. And, also, as the map gets bigger, the number of wins takes longer to max out, but it’s pretty interesting how consistent the learning is (as seen by the very small confidence intervals on the runs) across trials given the random exploration.
Another point of note is that the number of deaths goes way up with the size of the map, as expected.

Only use egocentric

Again, this one is another straightforward case, the mouse is going to be learning Q-values where each state is its set of immediate neighbors (e.g. cliff to the right, cheese in front, empty cells left and behind). Here we expect that it will have a minimal number of deaths (not much more than a death for each of the situations it can encounter a cliff or cliffs in), and that the number of successes should be pretty low and very inconsistent, since it will be entirely based on randomly stumbling into a spot near the cheese while avoiding cliffs. Additionally, we expect the number of results to drop with increases in the size of the map. The results are below:

The results are as expected, lots of variation in the wins number, low suicide numbers, and the number of wins decreases with map sizes. Something that’s kind of interesting here is that you don’t see any real effects of the epsilon value as it decreases as you do in the egocentric case. I believe that this is because it achieves a balance of exploration that throws you off a cliff and exploration that moves you into the vicinity of the cheese. So it’s dying less when the exploration rate drops, but it also has less chance of reaching the cheese now. That’s my guess, at least. Alright! Next.

Average the allo and ego Q-values

Here we’re going to do the most basic combination of the allo and ego-centric learning possible. Both learners will run simultaneously, and generate a set of Q-values over the set of possible actions at each state, and we’ll average the output of both then choose the max Q-value to choose which action is taken. The hope here is that the ego-centric learner will very quickly learn not to jump off cliffs, and the allocentric learner will more slowly learn how to move in the map to get cheese. So we expect few deaths and more wins. Let’s look at the results:

Damn! So now we have the same (or at least very similar) win rate as the allocentric learner (awesome), and the same death rate as the egocentric learner (also awesome)! We have achieved some sort of super-human…or, super-mouse mouse. A super mouse. This is really cool, it tells us that these two systems really do compliment each other, and that it can be incredibly straightforward to implement a combination of the two.

Dynamic weighting

Here we’ll look at the results from a variation of an algorithm I’ve implemented based on this really interesting paper by Dr. Sakya Dasgupta et al where they got a lot of really neat results implemented on a little robot critter. We can’t quite use the algorithm as presented here for a couple of reasons.

In the paper, the authors design the algorithm for weighting the output from classical conditioning and operant conditioning systems. I’m taking these to be pretty close to analogous to an ego-centric learner (with no lookahead) and an allo-centric learner, respectively. The idea being that classical conditioning is based on immediate associations developed from rewards, and operant conditioning allows the longer-term associations to be formed. Specifically, in operant you learn to associate the conditioned stimulus with the unconditioned stimulus, which leads to a a conditioned response. The classic example being the dog salivating when hearing the bell, because it knows food is coming. Here the allocentric learner is doing a similar thing, using a look-ahead to start associating stimuli (in this case different states) with a reward or punishment. So, this could be a wildly inappropriate application of this algorithm, and the comparison definitely warrants further discussion.

The more immediate reason for varying the algorithm is because the output of their system is a continuous ‘left/right’ decision, and the output from the mouse here is a discrete ‘left/right/up/down’ decision. So the systems in the paper output a number from -1 to 1 that indicates which way to turn, and this is used in the update. So the algorithm needs to be changed up a little.

First, here’s the original algorithm:

weight_allo += eta * reward * [output_allo - filtered_output_allo] * output_ego
weight_ego += eta * reward * [output_ego - filtered_output_ego] * output_allo

where eta is the learning rate, reward is the immediate reward of the state you move into, output_ is the -1 to 1 value from the learners, and filtered_output_ is a low pass filtered version of the output. So the first part is standard, learning rate multiplied by the reward, and then the part inside the brackets is calculating a derivative of the output of the learners. This is again standard, did your learner change? If it did change and a reward was received, then increase the weight of that learner. The last term, which multiplies it all by the output from the other learner is for scaling.

And then normalize the weights relative to each other, and it isn’t explicitly stated in the paper, but also lower bound the weights at 0. The reason you need to do this lower bounding is that if you don’t, then when you use the weight as a gain on the learner output all of the Q-values are inverted! Not easy to recover from that.

The dynamic weighting equations that I’m using are:

weight_allo += eta * reward * dq_allo * output_ego
weight_ego += eta * reward * dq_ego * output_allo

and then lower bound at 0 and normalize.

There is still a lot in these equations that I’m unsure about (for example scaling by the output of the other learner instead of by the other weight, and how the signs of the reward, derivative, and other learner output all interact), but I’m going to leave delving into that for a future post, since this one is getting long already and I don’t have a full understanding of it yet. But! I have enough of an understanding to implement the above and run simulations! Here are the results:

I should note that the learning rate here for the weight equations is 5e-4. When you decrease this weight you get improved performance, but the best I’ve gotten is just reducing it to a point that it’s pretty much constantly using .5 for each. And that’s boring.

Alright, comparing to basic averaging here we notice that we have about the same performance for the small map, and then on the bigger maps significantly worse performance and also a lot more variability between runs. Basically what seems to happen is that the system quickly identifies that the allocentric learner should not have control at the beginning, because it keeps killing the mouse, and then it takes a lot longer to start handing control back over from the egocentric learner to the allocentric learner.


Alright, so that’s all! Bottom line is that allocentric and egocentric learning really work well together, and basic averaging of these gives great results. Whether this is the best means of combining them is unclear, however. This was just a really basic test of different combinations, and there is some really neat work looking at different ways to do it. It’s super fun to mess around with and there’s a ton of room for experimentation. In the code up on my Github, the code for weight learning can be found by searching for weight_learning. If anyone plays around and figures out a kick-ass dynamic weighting algorithm please pass it along!

Tagged , , ,

Reinforcement Learning part 3: Egocentric learning

When an agent is learning how to move, it’s learning a set of values to associate with possible actions for each situation it encounters. We say that it’s learning the action values. When the agent learns to associate the values of taking different actions in different states, it is learning the state-action values, also referred to as the Q-values. How the agent identifies ‘a situation’ can be handled in a couple of different ways.

If the agent is performing egocentric learning, then it distinguishes situations based on it’s immediately surrounding environment. If the agent is perform allocentric learning, then it has an internal map of the environment, and each different position on the map is considered a different situation. Whether an agent is using allocentric or egocentric learning ends up influencing its behavior quite a bit, and both have their pros and cons. We’ve some of both ego and allo-centric in the previous RL posts; in this post we’re going to examine egocentric learning specifically a little closer. And the code for this post can be found up on my GitHub.

Difference between ego and allo-centric

Let’s look at when a new set of action values is learned in these two cases. Going back to the ever wonderful mouse and cheese scenario, we’ll create a map for our mouse to learn to navigate.

In egocentric learning, each situation is determined by what is near the mouse, and so new action values for each of the possible movements ({W,NW,N,NE,E,SE,S,SW}) have to be learned whenever the immediate environment is not something the agent has seen before. For example, the mouse keeps a different set of action values for each of the following environments:

mouse states
On the other hand, in the case of allocentric learning, each situation is determined by the (x,y) coordinates of the mouse’s current position. That means that the mouse keeps a different set of action values for every possible (x,y) position:

mouse allo
so even if the environment in each position is the same:

mouse states2
with different (x,y) coordinates the mouse maintains different action value maps for all of them.

With egocentric learning, the agent tends to develop immediately beneficial behavior more quickly, to avoid running off cliffs and to move towards cheese when it’s in sight. The agent can even learn to move towards cheese that’s outside of the range of its sensors in simple maps, essentially by learning heuristics like ‘always moving right leads to reward.’ But this falls apart if there are several areas in the map where the immediate environment is the same, as in the case above with states 2 and 3.

With allocentric learning, the agent develops action value mappings that are much better suited to the overall environment and long-term goals (e.g. if the cheese is always in the center of the map, if you are in the bottom-right move up and left, and if you are in the top-left move down and right), but you can’t generalize across local situations (for example when there is a cliff one space to the right), the consequences of movement at each (x,y) position must be learned independently.

Clearly a combination of these two learning types is desirable, but that is for a future post! There is still a lot to examine in ego-centric learning, and so we’re going to dig deeper.

Thinking about the future

In allocentric learning, you don’t often perform an action, successfully move, and then still wind up in the same state. This makes transmitting information about rewards received in the future backwards to states that are still far away fairly stable, because a specific state is some constant distance away from getting reward. In egocentric learning, however, this isn’t the case. So when you start trying to propagate reward information backwards you can get some unstable behavior; let’s look at a very basic scenario:

There’s the mouse in blue, on the left, and the reward, in green, on the right, and red represents cliffs / certain death. So now, whenever the mouse isn’t beside one of the walls he will be in one of three states, cliffs above, nothing around him, or cliffs below:


OK. Now let’s look at the update formula for Q-learning again:

Q(s, a) += alpha * (reward(s') + gamma * max(Q(s')) - Q(s, a))

where s is the previous system state, a is the action just taken, s' is the current state, reward(s') is the reward for being in the current state, and max(Q(s')) is the maximum reward obtainable by performing an action in the current state.

Informally, it says that the change to make to the Q-value for the action just taken, Q(s, a), is equal to your learning rate, alpha, multiplied by the reward for the state that you moved into,reward(s'), plus the lookahead reward,max(Q(s')), which relays the expected reward now that you’re in state s', minus the previous Q-value,Q(s, a).

Let’s break this down a bit more, because this can be complicated to think about still. If we set gamma = 0, then our update is

Q(s, a) += alpha * (reward(s') - Q(s, a)).

Here it’s more apparent what the dynamics are doing, they’re driving the Q-value to the reward value of the state you end up in. So you take an action a in state s, you find the reward for where you ended up, and you learn the reward for performing action a in state s. Now if gamma is not equal to 0, then you can think of the Q-value dynamics as trying to drive the Q-values to

reward(s') + gamma * max(Q(s'))

The larger that gamma is, the farther into the future you look. This can lead to unstable / non-convergent systems when you repeatedly move into the same state again and again. For example, if you’re in State 2 from above, and you move right often enough you’ll end up in a state that leads to getting a reward. This boosts the state before the reward, and now the next time you happen to move to a state where the reward is next to you, the Q-values of State 2 are now updated. And so you get your reward and you reset. Now you move into State 2 again and hey! Moving right leads to a reward. So you move right, but now you’re back in State 2 again! And so the reward for moving right decreases a bit. But it’s still the best decision, so you move right. And you’re in State 2 again, and so the reward for moving right goes down again, etc etc. Eventually moving right brings you to the target though and the Q-value for moving right gets a boost. So it’s going to forever be fluctuating.

In this example, it doesn’t seem like a big deal, because the mouse still learns to move towards the cheese and the Q-values have entered a kind of rhythmic limit cycle. There are a few problems that arise, however, when counting the future to heavily in egocentric learning.

Problem: Death-wish

Basically, the mouse decides that certain death is a better option than wandering around in uncertainty forever and starts deciding to commit suicide. Let’s say that to encourage exploration, the reward for being in an unoccupied square is -1, and to encourage not dying the reward for jumping off a cliff is -10. The problem arises in that jumping off the cliff is a constant punishment, -10. It’s always -10 and it will never be any worse than -10. There is no further future past death to influence the reward. When the mouse is just exploring, however, it will explore, and get -1, and then explore more, and get -1, etc etc. This causes the expected reward of actions that lead you into the same state keep dropping. If the weight of gamma is too large, then without all that much exploring the reward for exploring has suddenly dropped to less than -10, and the mouse decides that continuing to explore is worse than death, and jumps.

By setting gamma = 5 (which is pretty high) we then get this behavior very quickly, here are some simulation results, where the first column is the timestep, the second column W denotes how many times the mouse found the cheese, the third column L denotes how many times the mouse chose to jump off a cliff (as dictated by its Q-values), and the last column A denotes how many times random exploration led the mouse to its accidental death:

10000, W: 0, L: 3138, A: 51
20000, W: 0, L: 3207, A: 40
30000, W: 0, L: 3204, A: 40
40000, W: 0, L: 3205, A: 45
50000, W: 0, L: 3208, A: 43
60000, W: 0, L: 3198, A: 49
70000, W: 0, L: 3200, A: 49
80000, W: 0, L: 3207, A: 46
90000, W: 0, L: 3204, A: 46
100000, W: 0, L: 3214, A: 38
110000, W: 0, L: 3211, A: 37
120000, W: 0, L: 3203, A: 44
130000, W: 0, L: 3199, A: 45
140000, W: 0, L: 3225, A: 28
150000, W: 0, L: 3211, A: 39
160000, W: 0, L: 3196, A: 45
170000, W: 0, L: 3208, A: 46
180000, W: 0, L: 3203, A: 47
190000, W: 0, L: 3209, A: 38
200000, W: 0, L: 3216, A: 38

So pretty immediately the mouse gets into a dark place, and starts choosing to jump, without ever having found the cheese even once.

Problem: Confusion

Another problem is that by heavily weighting the immediate future the mouse can get easily confused again about what was good or how it actually found the cheese and got the big reward in the first place. Once State 2 has rewarded moving right, and the mouse moves left, there is now a new max(Q(s')) value for State 2 (the reward for moving right), and so the reward for moving left is now brought up. At the same time, the reward for moving right is dropping every time it moves right and doesn’t get rewarded (as described above). Same with moving up and down, so very quickly the Q-values can be flooded with high values and the optimal direction of movement is lost.

We can see this case by setting gamma = .8. Now, the mouse will find the cheese and learns to move right, but there are also these blips of confusion where it forgets how to find the cheese,

10000, W: 0, L: 4, A: 78
20000, W: 0, L: 0, A: 78
30000, W: 87, L: 0, A: 67
40000, W: 810, L: 0, A: 16
50000, W: 841, L: 0, A: 1
60000, W: 843, L: 0, A: 2
70000, W: 847, L: 0, A: 3
80000, W: 816, L: 0, A: 10
90000, W: 667, L: 0, A: 17
100000, W: 0, L: 0, A: 68
110000, W: 0, L: 0, A: 68
120000, W: 34, L: 0, A: 68
130000, W: 748, L: 0, A: 43
140000, W: 853, L: 0, A: 0
150000, W: 846, L: 0, A: 2
160000, W: 847, L: 0, A: 3
170000, W: 856, L: 0, A: 2
180000, W: 853, L: 0, A: 2
190000, W: 120, L: 0, A: 62
200000, W: 0, L: 0, A: 70

So it’s getting there, but then it gets confused and forgets how to find it, then it remembers, then it forgets…etc.

Solution: Have a healthy fear of death

If instead now we set gamma = .05, the mouse has a much stronger repulsion to imminent death than to ennui for a much longer time, and will go around exploring things (note that this is the same as changing the reward for dying to be a much higher penalty). As a result the mouse will find the cheese faster. As a bonus, we also don’t have the fits of forgetfulness seem in the previous example:

10000, W: 313, L: 4, A: 50
20000, W: 845, L: 0, A: 6
30000, W: 848, L: 0, A: 1
40000, W: 850, L: 0, A: 1
50000, W: 850, L: 0, A: 1
60000, W: 846, L: 0, A: 2
70000, W: 844, L: 0, A: 6
80000, W: 843, L: 0, A: 2
90000, W: 846, L: 0, A: 2
100000, W: 847, L: 0, A: 5
110000, W: 844, L: 0, A: 4
120000, W: 849, L: 0, A: 3
130000, W: 846, L: 0, A: 5
140000, W: 842, L: 0, A: 4
150000, W: 845, L: 0, A: 4
160000, W: 847, L: 0, A: 5
170000, W: 848, L: 0, A: 10
180000, W: 844, L: 0, A: 7
190000, W: 840, L: 0, A: 8
200000, W: 848, L: 0, A: 1

And so in this case now we’ve developed a pretty good, stable, egocentric RL mouse, that can learn to avoid jumping off cliffs and find the cheese in this very simple map.

We can still, of course, drop the gamma value down to the extreme, 0, such that our egocentric learner would have no knowledge of any future further than the reward it gets for taking the next step. In this case it would know that going off a cliff is bad, but it won’t know that moving to the right will eventually lead to cheese / a reward.

However, that can be OK! Warding off any immediate danger is and capitalizing on any immediate reward is the perfect role for this kind of short-sighted system. Trying to make it learn how to run through a complicated maze without a lot of very specific landmarks is just not something that’s going to happen.


So, recapping, the egocentric system learns Q-values for states that are based on the environment relative to the agent. This has some important consequences, in that it allows it to learn very quickly not to move forward if there’s a cliff in front of you, and that if there’s a reward in the immediate vicinity to go get it. Setting the weight on the expected reward in the Q-values update can lead to some unstable behavior, and it’s important to have a good concept of how the behavior of the agent changes with different gamma values. These undesirable behaviors can be avoided by setting gamma = 0, but the agent will then only learn the consequences of its immediate action, and be unable to look into the future at all to see how to bring about a reward or avoid getting trapped.

The egocentric learner has strengths where the allocentric learner fails, and the allocentric learner has strengths where the egocentric learner fails. The natural thought is to combine these two learners, and that will be the subject of the next RL post!

Tagged , , ,

Arm visualization with PyGame

So, in my neverending quest for better arm visualizations so I can make prettier pictures / videos and improve my submission acceptance rates I’ve started looking at PyGame. I started out hoping to find something that was quick and easy in Python for animating using SVG images, and PyGame is as close as I got. All in all, I’m decently happy with the result. It runs slower than the Matplotlib animation, and you can’t zoom in like you can on a Matplotlib graph, but it’s prettier. So, tradeoffs!

There were a few things that I ran into that tripped me up a bit when I was doing this implementation, so I thought that I would share, and that’s what this post is going to be about. Below is an animation showing what the arm visualization looks like, and as always the code for everything can be found on my Github. That links you to the control folder, and you can find all of the code developed / used for this post in this folder up on my GitHub.

Arm visualization using PyGame

Alright, there were a few parts in doing this that were a bit tricky. Let’s go through them one at a time.

Rotations in PyGame

Turns out rotating images is a pain right off the bat, but once you get over the initial hurdles it looks pretty slick.

Centering your image

To start we’re just going to attempt to rotate an image in place, here’s a first pass:

import pygame
import pygame.locals

white = (255, 255, 255)


display = pygame.display.set_mode((300, 300))
fpsClock = pygame.time.Clock()

image = pygame.image.load('pic.png')

while 1:


    image = pygame.transform.rotate(image, 1)
    rect = image.get_rect()

    display.blit(image, rect)

    # check for quit
    for event in pygame.event.get():
        if event.type == pygame.locals.QUIT:


First thing you notice upon running this is that the image flies off to the side very quickly, as shown below:
This is because we need to recenter the image after each rotation. To do that we can add in this after line 21:

    rect.center = (150, 150)

and so now our animation looks like:
At which point a very egregious problem becomes clear: the image is destroying itself as it rotates.

Transforming from a base image

Basically what’s happening is that every transformation the image gets distorted a little bit, and continues moving farther and farther away from the original. To prevent this we’re going save the original and also track the total degrees to rotate the image. Then we’ll perform one rotation (with the minimal distortion caused by one transformation) and plot that every time step. To do this we’ll introduce a degrees variable to track the total rotations. The changes to the code start on line 12:

radians = 0

while 1:


    radians += .5

    rotated_image = pygame.transform.rotate(image, radians)
    rect = rotated_image.get_rect()
    rect.center = (150, 150)

And the resulting animation looks like:
Which is significantly better! Pretty good, in fact. Looking close, however, you can notice that it gets a little choppy. And this is because the pygame.transform.rotate method doesn’t use anti-aliasing to smooth out the image. pygame.transformation.rotozoom does, however! So we’ll use rotozoom and set the zoom level to 1, changing line 20:

    rotated = pygame.transform.rotozoom(image, degrees, 1)

And now we have a nice smooth animation!

At this point, we’re going to create a class to take care of this rotation business for us, storing the original image and providing a function that rotates smoothly and recenters the image.

import numpy as np
import pygame

class ArmPart:
    A class for storing relevant arm segment information.
    def __init__(self, pic, scale=1.0):
        self.base = pygame.image.load(pic)
        # some handy constants
        self.length = self.base.get_rect()[2]
        self.scale = self.length * scale
        self.offset = self.scale / 2.0

        self.rotation = 0.0 # in radians

    def rotate(self, rotation):
        Rotates and re-centers the arm segment.
        self.rotation += rotation 
        # rotate our image 
        image = pygame.transform.rotozoom(self.base, np.degrees(self.rotation), 1)
        # reset the center
        rect = image.get_rect()
        rect.center = (0, 0)

        return image, rect

Everything is just what we’ve seen so far, except the offset, which is going to be very useful for the trig we’re about to get into.


With rotations working and going smoothly one of the major hurdles is now behind us. At this point we can get our arm images and use some trig to make sure that they rotate around the joint ends rather than in the center. Using the above class now we’ll write a script that loads in a picture of an upper arm, and then rotates it around the shoulder.

import numpy as np
import pygame
import pygame.locals

from armpart import ArmPart

black = (0, 0, 0)
white = (255, 255, 255)


width = 500
height = 500
display = pygame.display.set_mode((width, height))
fpsClock = pygame.time.Clock()

upperarm = ArmPart('upperarm.png', scale=.7)

base = (width / 2, height / 2)

while 1:


    ua_image, ua_rect = upperarm.rotate(.01) 
    ua_rect.center += np.asarray(base)
    ua_rect.center -= np.array([np.cos(upperarm.rotation) * upperarm.offset,
                                -np.sin(upperarm.rotation) * upperarm.offset])

    display.blit(ua_image, ua_rect)
    pygame.draw.circle(display, black, base, 30)

    # check for quit
    for event in pygame.event.get():
        if event.type == pygame.locals.QUIT:


So far the only trig we need is simply calculating the offset from the center point. Here it’s calculated as half of the length of the image multiplied by a scaling term. The scaling term is because we don’t want the rotation point to be the absolute edge of the image but rather we want it to be somewhere inside the arm. We calculate the x and y offset values using the cos and sin functions, respectively, with a negative sign in from of the sin function because the y axis is inverted, using the top of the screen as 0. The black circle is just for ease of viewing while we’re figuring all the trig out, to make it easier to verify it’s rotating around the point we want. The above code results in the following

Going ahead and adding in the other links gives us

import numpy as np
import pygame
import pygame.locals

from armpart import ArmPart

black = (0, 0, 0)
white = (255, 255, 255)


width = 750
height = 750
display = pygame.display.set_mode((width, height))
fpsClock = pygame.time.Clock()

upperarm = ArmPart('upperarm.png', scale=.7)
forearm = ArmPart('forearm.png', scale=.8)
hand = ArmPart('hand.png', scale=1.0)

origin = (width / 2, height / 2)

while 1:


    ua_image, ua_rect = upperarm.rotate(.03) 
    fa_image, fa_rect = forearm.rotate(-.02) 
    h_image, h_rect = hand.rotate(.03) 

    # generate (x,y) positions of each of the joints
    joints_x = np.cumsum([0, 
                          upperarm.scale * np.cos(upperarm.rotation),
                          forearm.scale * np.cos(forearm.rotation),
                          hand.length * np.cos(hand.rotation)]) + origin[0]
    joints_y = np.cumsum([0, 
                          upperarm.scale * np.sin(upperarm.rotation),
                          forearm.scale * np.sin(forearm.rotation), 
                          hand.length * np.sin(hand.rotation)]) * -1 + origin[1]
    joints = [(int(x), int(y)) for x,y in zip(joints_x, joints_y)]

    def transform(rect, origin, arm_part):
        rect.center += np.asarray(origin)
        rect.center += np.array([np.cos(arm_part.rotation) * arm_part.offset,
                                -np.sin(arm_part.rotation) * arm_part.offset])

    transform(ua_rect, joints[0], upperarm)
    transform(fa_rect, joints[1], forearm)
    transform(h_rect, joints[2], hand)
    # transform the hand a bit more because it's weird
    h_rect.center += np.array([np.cos(hand.rotation), 
                              -np.sin(hand.rotation)]) * -10

    display.blit(ua_image, ua_rect)
    display.blit(fa_image, fa_rect)
    display.blit(h_image, h_rect)

    # check for quit
    for event in pygame.event.get():
        if event.type == pygame.locals.QUIT:


So basically the only thing we’ve done here that was a little more complicated was setting up the centers of the forearm and hand images to be at the end of the upper arm and forearm, respectively. We do that in the joints_x and joints_y variables, and the trig for that is straight from basic robotics.
The above code results in the following animation (which is a little choppy because I dropped some frames to keep the file size small):

Plotting transparent lines

The other kind of of tricky thing that was done in the visualization code was the plotting of transparent lines of the ‘skeleton’ of the arm. The reason that this was kind of tricky was because you can’t just use the pygame.draw.lines method, because it doesn’t allow for transparency. So what you end up having to do instead is generate a set of Surfaces of the shape that you want for each of the lines, and then transform and plot them appropriately, which is thankfully pretty straightforward after working things out for the images above.

To generate a transparent surface you use the pygame.SRCALPHA variable, so it looks like

surface = pygame.Surface((width, height), pygame.SRCALPHA, 32)
surface.fill((100, 100, 100, 200))

where the variable passed in to fill is a 4-tuple, with the first 3 parameters the standard RGB levels, and the fourth parameter being the transparency level.

Then there’s one more snag, in that when you blit a surface it goes by the top left position of the rectangle that contains it. So doing the same transformations for the images we then have to transform the surface further because with the images we were specifying the desired center point. This is easy enough; just get a handle to the rect for the surface and subtract out half of the width and height (post-rotation).

The code with the transparent lines (and some circles at the joints added in for prettiness) then is

import numpy as np
import pygame
import pygame.locals

from armpart import ArmPart

black = (0, 0, 0)
white = (255, 255, 255)
arm_color = (50, 50, 50, 200) # fourth value specifies transparency


width = 750
height = 750
display = pygame.display.set_mode((width, height))
fpsClock = pygame.time.Clock()

upperarm = ArmPart('upperarm.png', scale=.7)
forearm = ArmPart('forearm.png', scale=.8)
hand = ArmPart('hand.png', scale=1.0)

line_width = 15
line_upperarm = pygame.Surface((upperarm.scale, line_width), pygame.SRCALPHA, 32)
line_forearm = pygame.Surface((forearm.scale, line_width), pygame.SRCALPHA, 32)
line_hand = pygame.Surface((hand.scale, line_width), pygame.SRCALPHA, 32)


origin = (width / 2, height / 2)

while 1:


    # rotate our joints
    ua_image, ua_rect = upperarm.rotate(.03) 
    fa_image, fa_rect = forearm.rotate(-.02) 
    h_image, h_rect = hand.rotate(.03) 

    # generate (x,y) positions of each of the joints
    joints_x = np.cumsum([0, 
                          upperarm.scale * np.cos(upperarm.rotation),
                          forearm.scale * np.cos(forearm.rotation),
                          hand.length * np.cos(hand.rotation)]) + origin[0]
    joints_y = np.cumsum([0, 
                          upperarm.scale * np.sin(upperarm.rotation),
                          forearm.scale * np.sin(forearm.rotation), 
                          hand.length * np.sin(hand.rotation)]) * -1 + origin[1]
    joints = [(int(x), int(y)) for x,y in zip(joints_x, joints_y)]

    def transform(rect, base, arm_part):
        rect.center += np.asarray(base)
        rect.center += np.array([np.cos(arm_part.rotation) * arm_part.offset,
                                -np.sin(arm_part.rotation) * arm_part.offset])

    transform(ua_rect, joints[0], upperarm)
    transform(fa_rect, joints[1], forearm)
    transform(h_rect, joints[2], hand)
    # transform the hand a bit more because it's weird
    h_rect.center += np.array([np.cos(hand.rotation), 
                              -np.sin(hand.rotation)]) * -10

    display.blit(ua_image, ua_rect)
    display.blit(fa_image, fa_rect)
    display.blit(h_image, h_rect)

    # rotate arm lines
    line_ua = pygame.transform.rotozoom(line_upperarm, 
                                        np.degrees(upperarm.rotation), 1)
    line_fa = pygame.transform.rotozoom(line_forearm, 
                                        np.degrees(forearm.rotation), 1)
    line_h = pygame.transform.rotozoom(line_hand, 
                                        np.degrees(hand.rotation), 1)
    # translate arm lines
    lua_rect = line_ua.get_rect()
    transform(lua_rect, joints[0], upperarm)
    lua_rect.center += np.array([-lua_rect.width / 2.0, -lua_rect.height / 2.0])

    lfa_rect = line_fa.get_rect()
    transform(lfa_rect, joints[1], forearm)
    lfa_rect.center += np.array([-lfa_rect.width / 2.0, -lfa_rect.height / 2.0])

    lh_rect = line_h.get_rect()
    transform(lh_rect, joints[2], hand)
    lh_rect.center += np.array([-lh_rect.width / 2.0, -lh_rect.height / 2.0])

    display.blit(line_ua, lua_rect)
    display.blit(line_fa, lfa_rect)
    display.blit(line_h, lh_rect)

    # draw circles at joints for pretty
    pygame.draw.circle(display, black, joints[0], 30)
    pygame.draw.circle(display, arm_color, joints[0], 12)
    pygame.draw.circle(display, black, joints[1], 20)
    pygame.draw.circle(display, arm_color, joints[1], 7)
    pygame.draw.circle(display, black, joints[2], 15)
    pygame.draw.circle(display, arm_color, joints[2], 5)

    # check for quit
    for event in pygame.event.get():
        if event.type == pygame.locals.QUIT:


And the resulting arm visualization now looks like this!

From here you can then blit on an image in the background, and the little zoom in I have in the top left is just another surface, it’s all pretty straight forward. And that’s pretty much everything! Hopefully this helps someone trying to get some fancier PyGame things going, and if you have any means of generating fancier PyGame animations yourself please drop any tips below in the comments!

Again, the code from this post is all up on my GitHub.

Tagged , , ,

Creating animated gifs in Ubuntu

Something that I do for a lot of my posts is create animated gifs, and usually I forget the whole process and have to re-look up everything. So here I will consolidate the process. There are four steps: 1) record video; 2) convert to images; 3) trim down image set; 4) convert to gif.

First, I’ve had luck recording the original video using the SimpleScreenRecorder program. Installation instructions are on their homepage, it’s very simple. I had been using RecordMyDesktop and tried Kazam, but prefer SimpleScreenRecorder.
Once you have your video, create a folder to store all your images in.
I called mine ‘images’.

To convert the video to a set of images we’ll need mplayer. In linux to install this it’s just

sudo apt-get install mplayer2

Once you have mplayer installed, change directories into your images folder and run

mplayer -ao _ ../movie_name.mp4 -vo png:z=9

This will fill up the folder with a bunch of screenshots from the video.

At this point you can go ahead and convert these images into an animated gif, but I always trim down the set to reduce the gif size. This just entails me going through and deleting every other image until I get a file that’s less than a megabyte. Once you’ve got the image set that you want to convert, you’re going to need ImageMagick installed. If you don’t have it, just run

sudo apt-get install imagemagick

Once you have that you’re going to use the convert function:

convert -delay 10 -loop 0 -deconstruct -quantize transparent -layers optimize -resize 400x400 *.png animation.gif 

with a bunch of extra options attached to control the play speed (that’s the delay parameter), the looping, some optimization parameters, and then what size it comes out (I’ve chosen 400×400 pixels here).

And there you go! You can now create your own fancy gifs.

Tagged , , ,

Workshop talk – Methods for scaling neural computation

A couple of months ago I gave a talk at the Neuro-Inspired Computational Elements (NICE) workshop, about the use of cortical microcircuits for adaptation in the prediction and control problems. The talk was recorded, and here is a link: http://1.usa.gov/1tg2n2I Ignore the fire alarm that gets pulled in the first minute, the actual talk starts around 1 minute.

The were a lot of talks in general that were very interesting, which are also available online here: http://nice.sandia.gov/videos.html

I’ll be writing up some posts on the subject matter of my talk in the next couple months, explaining the methods in more detail and providing some solid examples with code. Hope you find the video interesting!

Tagged , , , ,

Dynamic movement primitives part 3: Rhythmic movements

So far we’ve looked at using DMPs for discrete movements, but as I briefly mentioned it’s also possible to use them to perform rhythmic movements. In this post we’re going to look at the implementation of rhythmic DMPs and see exactly how it’s done. It’s actually pretty straightforward, which is always nich. We’ll go through everything that has to change and then see a couple of different applications of rhythmic DMPs. Again, all the code for the DMPs and arm control implementations can be found up on my Github.

Obtaining consistent and repeatable basis function activation

We’ll start with the canonical system. In the discrete case our canonical system (which drives the activation of the basis functions) decayed from 1 to 0 and then we were all done. In the rhythmic case we want our system to repeat indefinitely, so we need a reliable way of continuously activating the basis functions in the same order. One function that may come to mind that reliably repeats is the cosine function. But how to use the cosine function, exactly?

The idea is that we’ll lay out our basis functions in the range from 0 to 2\pi, and we’ll set the canonical system to be ever increasing linearly. Then we’ll use the difference between the canonical system state and each of the center points as the value we pass in to our cosine function. Because the cosine function repeats at 2\pi we’ll get a nice repeating spread of basis function activations. We’ll then throw these values through our Gaussian equation (with the gain and bias terms) and we’ll be done! Here’s the equation:

\psi = \textrm{exp}(h * \textrm{cos}(x - c) - 1),
where x is the state of the canonical system, c is the basis function center point, and h is a gain term controlling the variance of the Gaussian. And here’s a picture of the activations of three basis functions with centers spread out evenly between 0 and 2\pi:


Here’s a picture of the basis function activations on a longer time scale:


As long as our canonical system increases at a steady pace the basis functions will continue to activate in a reliable and repeatable way. As mentioned, to get the canonical system to increase at a steady pace we simply give it linear dynamics:

\dot{x} = 1
The placement of the center points of the rhythmic system is a lot easier than in the discrete case because the rhythmic canonical system dynamics are linear.

Other differences between discrete and rhythmic

In the actual implementation of the rhythmic system there are a couple of other differences that you need to be aware of. The first is that there is no diminishing term in the rhythmic system, whereas there is in the discrete case.

The second is how to go about establishing goal states when imitating an path. We’re going to assume that whatever path is passed in to us is going to be a rhythmic pattern, and then goal state is going to be set as the center point of the desired trajectory. So for rhythmic systems the ‘goal’ is actually more like a center point for the trajectory.

And that’s pretty much it! If you look through the code that I have up for the rhythmic DMP subclass you’ll see that only a handful of functions needed to be redefined, the rest is the same! Well that’s great, let’s look at some applications.


I was thinking of different examples that would be interesting to see of the rhythmic DMPs, and after a little consideration the most obvious application of rhythmic DMPs is really to a system that does something like walking. In the spinal cord of animals there are circuits which are known as central pattern generators (CPGs), and when stimulated these have been shown to generate rhythmic, repeated movements (as described in Principles of rhythmic motor pattern generation and other papers). Let’s go ahead and build our own CPG here using DMPs. I don’t have a simulation of a set of legs, however, so instead we’ll look at getting a single 3-link arm to reproduce a pattern of behaviour that you’d expect in a leg that is walking.

To do this, I’m going to need to specify the trajectories for the three joints of the arm, and I’ll be controlling the system in joint space, using the generalized coordinates controller, rather than the operational space controller used in the discrete DMP examples for controlling the end-effector position. So that’s one thing. Another thing is that I’m going to need the kinematics of leg joints during motion. After a quick search of the web I found this presentation from Simon Fraser University which has the following images:


Which is some very useful information! I just need to transfer this into a Numpy vector that I can feed into the system and convert degrees to radians and we should be good to go! To get the information out of the graph, use your favorite data stripper, I used http://arohatgi.info/WebPlotDigitizer/app/.

I also had to do a bit of a shuffle with the trajectories provided to make it look right on the arm. The hip to shoulder joint maps fine, but for the knee to elbow the flexion and extension are reversed, and for the wrist to foot the 0 angle for the foot is straight out. Once those have been accounted for, though, everything works great. Here is an animation of a single leg walking along, if you use your imagination you can just picture a happy-go-lucky torso attached to this leg whistling while he gets his very human-like gait on:


And there you go! It’s neat how straightforward that was, once you have the data of the system you want to model. It shows the power of DMPs for coordinating multiple degrees of freedom quickly and easily. Also something that’s worth pointing out once again is that this is a force based controller. We were able to go from a kinematic description of the movement we wanted to a reliable system generating torques. It’s another good example of the dexterity of dynamic movement primitives.

You can see the code for everything here up online on my github control repo on the walking_demo branch. To run the code you should run:

python run.py

If you have any questions or trouble running the code please contact me.

Tagged , , ,

Dynamic movement primitives part 2: Controlling end-effector trajectories

The dynamic movement primitive (DMP) framework was designed for trajectory control. It so happens that in previous posts we’ve built up to having several arm simulations that are ripe for throwing a trajectory controller on top, and that’s what we’ll do in this post. The system that we will be controlling here is the 3 link arm model with an operational space controller (OSC) that translates end-effector forces into joint torques. The DMPs here will be controlling the (x,y) trajectory of the hand, and the OSC will take care of turning the desired hand forces into torques that can be applied to the arm. All of the code used to generate the animations throughout this post can of course be found up on my github.

Controlling a 3 link arm with DMPs
We have our 3 link arm and its OSC controller; this whole setup we’ll collectively refer to as ‘the plant’ throughout this post. We are going to pass in some (x,y) force signal to the plant, and the plant will carry it out. Additionally we’ll get a feedback signal with the (x,y) position of the hand. At the same time, we also have a DMP system that’s doing its own thing, tracing out a desired trajectory in (x,y) space. We have to tie these two systems together.

To do this is easy, we’ll generate the control signal for the plant from our DMP system simply by measuring the difference between the state of our DMP system and the plant state, use that to drive the plant to the state of the DMP system. Formally,

u = k_p(y_{\textrm{DMP}} - y)

where y_{\textrm{DMP}} is the state of the DMP system, y is the state of the plant, and k_p and is the position error gain term.

Once we have this, we just go ahead and step our DMP system forward and make sure the gain values on the control signal are high enough that the plant follows the DMP’s trajectory. And that’s pretty much it, just run the DMP system to the end of the trajectory and then stop your simulation.

To give a demonstration of DMP control I’ve set up the DMP system to follow the same number trajectories that the SPAUN arm followed. As you can see the combination of DMPs and operational space control is much more effective than my previous implementation.

Incorporating system feedback

One of the issues in implementing the control above is that we have to be careful about how quickly the DMP trajectory moves, because while the DMP system isn’t constrained by any physical dynamics, the plant is. Depending on the size of the movement the DMP trajectory may be moving a foot a second or an inch a second. You can see above that the arm doesn’t fully draw out the desired trajectories in places where the DMP system moved too quickly in and out and sharp corners. The only way to remedy this without feedback is to have the DMP system move more slowly throughout the entire trajectory. What would be nice, instead, would be to just say ‘go as fast as you can, as long as the plant state is within some threshold distance of you’, and this is where system feedback comes in.

It’s actually very straightforward to implement this using system feedback: If the plant state drifts away from the state of the DMPs, slow down the execution speed of the DMP to allow the plant time to catch up. The do this we just have to multiply the DMP timestep dt by a new term:

1 / (1 + \alpha_{\textrm{err}}(y_{\textrm{plant}} - y_{\textrm{DMP}})).

All this new term does is slow down the canonical system when there’s an error, you can think of it as a scaling on the time step. Additionally, the sensitivity of this term can be modulated the scaling term \alpha_{\textrm{err}} on the difference between the plant and DMP states.

We can get an idea of how this affects the system by looking at the dynamics of the canonical system when an error term is introduced mid-run:

When the error is introduced the dynamics of the system slow down, great! Lets look at an example comparing execution with and without this feedback term. Here’s the system drawing the number 3 without any feedback incorporation:

and here’s the system drawing the number 3 with the feedback term included:

These two examples are a pretty good case for including the feedback term into your DMP system. You can still see in the second case that the specified trajectory isn’t traced out exactly, but if that’s what you’re shooting for you can just crank up the \alpha_{\textrm{err}} to make the DMP timestep really slow down whenever the DMP gets ahead of the plant at all.

Interpolating trajectories for imitation

When imitating trajectories there can be some issues with having enough sample points and how to fit them to the canonical system’s timeframe, they’re not difficult to get around but I thought I would go over what I did here. The approach I took was to always run the canonical system for 1 second, and whenever a trajectory is passed in that should be imitated to scale the x-axis of the trajectory such that it’s between 0 and 1. Then I shove it into an interpolator and use the resulting function to generate an abundance of nicely spaced sample points for the DMP imitator to match. Here’s the code for that:

# generate function to interpolate the desired trajectory
import scipy.interpolate

path = np.zeros((self.dmps, self.timesteps))
x = np.linspace(0, self.cs.run_time, y_des.shape[1])

for d in range(self.dmps):

    # create interpolation function
    path_gen = scipy.interpolate.interp1d(x, y_des[d])

    # create evenly spaced sample points of desired trajectory
    for t in range(self.timesteps):
        path[d, t] = path_gen(t * self.dt)

y_des = path

Direct trajectory control vs DMP based control

Now, using the above described interpolation function we can just directly use its output to guide our system. And, in fact, when we do this we get very precise control of the end-effector, more precise than the DMP control, as it happens. The reason for this is because our DMP system is approximating the desired trajectory and with a set of basis functions, and some accuracy is being lost in this approximation.

So if we instead use the interpolation function to drive the plant we can get exactly the points that we specified. The times when this comes up especially are when the trajectories that you’re trying to imitate are especially complicated. There are ways to address this with DMPs by placing your basis functions more appropriately, but if you’re just looking for the exact replication of an input trajectory (as often people are) this is a simpler way to go. You can see the execution of this in the trace.py code up on my github. Here’s a comparison of a single word drawn using the interpolation function:


and here’s the same word drawn using a DMP system with 1,000 basis function per DOF:

We can see that just using the interpolation function here gives us the exact path that we specified, where using DMPs we have some error, and this error increases with the size of the desired trajectory. But! That’s for exactly following a given trajectory, which is often not the case. The strength of the DMP framework is that the trajectory is a dynamical system. This lets us do simple things to get really neat performance, like scale the trajectory spatially on the fly simply by changing the goal, rather than rescaling the entire trajectory:


Some basic examples of using DMPs to control the end-effector trajectory of an arm with operational space control were gone over here, and you can see that they work really nicely together. I like when things build like this. We also saw that power of DMPs in this situation is in their generalizability, and not in exact reproduction of a given path. If I have a single complex trajectory that I only want the end-effector to follow once then I’m going to be better off just interpolating that trajectory and feeding the coordinates into the arm controller rather than go through the whole process of setting up the DMPs.

Drawing words, though, is just one basic example of using the DMP framework. It’s a very simple application and really doesn’t do justice to the flexibility and power of DMPs. Other example applications include things like playing ping pong. This is done by creating a desired trajectory showing the robot how to swing a ping pong paddle, and then using a vision system to track the current location of the incoming ping pong ball and changing the target of the movement to compensate dynamically. There’s also some really awesome stuff with object avoidance, that is implemented by adding another term with some simple dynamics to the DMP. Discussed here, basically you just have another system that moves you away from the object with a strength relative to your distance from the object. You can also use DMPs to control gain terms on your PD control signal, which is useful for things like object manipulation.

And of course I haven’t touched on rhythmic DMPs or learning with DMPs at all, and those are both also really interesting topics! But this serves as a decent introduction to the whole area, which has been developed in the Schaal lab over the last decade or so. I recommend further reading with some of these papers if you’re interested, there are a ton of neat ways to apply the DMP framework! And, again, the code for everything here isĀ up on my github in the control and pydmps repositories.

Tagged , , , , ,