## Setting up an arm simulation interface in Nengo 2

I got an email the other day asking about how to set up an arm controller in Nengo, where they had been working from the Spaun code to strip away things until they had just the motor portion left they could play with. I ended up putting together a quick script to get them started and thought I would share it here in case anyone else was interested. It’s kind of fun because it shows off some of the new GUI and node interfacing. Note that you’ll need nengo_gui version .15+ for this to work. In general I recommend getting the dev version installed, as it’s stable and updates are made all the time improving functionality.

Nengo 1.4 core was all written in Java, with Jython and Python scripting thrown in on top, and since then a lot of work has gone into the re-write of the entire code base for Nengo 2. Nengo 2 is now written in Python, all the scripting is in Python, and we have a kickass GUI and support for running neural simulations on CPUs, GPUs, and specialized neuromorphic hardware like SpiNNaKer. I super recommend checking it out if you’re at all interested in neural modelling, we’ve got a bunch of tutorials up and a very active support board to help with any questions or problems. You can find the simulator code for installation here: https://github.com/nengo/nengo and the GUI code here: https://github.com/nengo/nengo_gui, where you can also find installation instructions.

And once you have that up and running, to run an arm simulation you can download and run the following code I have up on my GitHub. When you pop it open at the top is a run_in_GUI boolean, which you can use to open the sim up in the GUI, if you set it to False then it will run in the Nengo simulator and once finished will pop up with some basic graphs. Shout out to Terry Stewart for putting together the arm-visualization. It’s a pretty slick little demo of the extensibility of the Nengo GUI, you can see the code for it all in the <code>arm_func</code> in the <code>nengo_arm.py</code> file.

As it’s set up right now, it uses a 2-link arm, but you can simply swap out the Arm.py file with whatever plant you want to control. And as for the neural model, there isn’t one implemented in here, it’s just a simple input node that runs through a neural population to apply torque to the two joints of the arm. But! It should be a good start for anyone looking to experiment with arm control in Nengo. Here’s what it looks like when you pull it up in the GUI (also note that the arm visualization only appears once you hit the play button!):

## Operational space control of 6DOF robot arm with spiking cameras part 2: Deriving the Jacobian

In the previous exciting post in this series I outlined the project, which is in the title, and we worked through getting access to the arm through Python. The next step was deriving the Jacobian, and that’s what we’re going to be talking about in this post!

Alright.
This was a time I was very glad to have a previous post talking about generating transformation matrices, because deriving the Jacobian for a 6DOF arm in 3D space comes off as a little daunting when you’re used to 3DOF in 2D space, and I needed a reminder of the derivation process. The first step here was finding out which motors were what, so I went through and found out how each motor moved with something like the following code:

for ii in range(7):
target_angles = np.zeros(7, dtype='float32')
target_angles[ii] = np.pi / 4.0
rob.move(target_angles)
time.sleep(1)


and I found that the robot is setup in the figures below

this is me trying my hand at making things clearer using Inkscape, hopefully it’s worked. Displayed are the first 6 joints and their angles of rotation, $q_0$ through $q_5$. The 7th joint, $q_6$, opens and closes the gripper, so we’re safe to ignore it in deriving our Jacobian. The arm segment lengths $l_1, l_3,$ and $l_5$ are named based on the nearest joint angles (makes easier reading in the Jacobian derivation).

Find the transformation matrix from end-effector to origin

So first thing’s first, let’s find the transformation matrices. Our first joint, $q_0$, rotates around the $z$ axis, so the rotational part of our transformation matrix $^0_\textrm{org}\textbf{T}$ is

$^0_\textrm{org}\textbf{R} = \left[ \begin{array}{ccc} \textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 \\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 \\ 0 & 0 & 1 \end{array} \right],$

and $q_0$ and our origin frame of reference are on top of each other so we don’t need to account for translation, so our translation component of $^0_\textrm{org}\textbf{T}$ is

$^0_\textrm{org}\textbf{D} = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right].$

Stacking these together to form our first transformation matrix we have

$^0_\textrm{org}\textbf{T} = \left[ \begin{array}{cc} ^0_\textrm{org}\textbf{R} & ^0_\textrm{org}\textbf{D} \\ 0 & 1 \end{array} \right] = \left[ \begin{array}{cccc} \textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 & 0\\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

So now we are able to convert a position in 3D space from to the reference frame of joint $q_0$ back to our origin frame of reference. Let’s keep going.

Joint $q_1$ rotates around the $x$ axis, and there is a translation along the arm segment $l_1$. Our transformation matrix looks like

$^1_0\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_1) & -\textrm{sin}(q_1) & l_1 \textrm{cos}(q_1) \\ 0 & \textrm{sin}(q_1) & \textrm{cos}(q_1) & l_1 \textrm{sin}(q_1) \\ 0 & 0 & 0 & 1 \end{array} \right] .$

Joint $q_2$ also rotates around the $x$ axis, but there is no translation from $q_2$ to $q_3$. So our transformation matrix looks like

$^2_1\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_2) & -\textrm{sin}(q_2) & 0 \\ 0 & \textrm{sin}(q_2) & \textrm{cos}(q_2) & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

The next transformation matrix is a little tricky, because you might be tempted to say that it’s rotating around the $z$ axis, but actually it’s rotating around the $y$ axis. This is determined by where $q_3$ is mounted relative to $q_2$. If it was mounted at 90 degrees from $q_2$ then it would be rotating around the $z$ axis, but it’s not. For translation, there’s a translation along the $y$ axis up to the next joint, so all in all the transformation matrix looks like:

$^3_2\textbf{T} = \left[ \begin{array}{cccc} \textrm{cos}(q_3) & 0 & -\textrm{sin}(q_3) & 0\\ 0 & 1 & 0 & l_3 \\ \textrm{sin}(q_3) & 0 & \textrm{cos}(q_3) & 0\\ 0 & 0 & 0 & 1 \end{array} \right] .$

And then the transformation matrices for coming from $q_4$ to $q_3$ and $q_5$ to $q_4$ are the same as the previous set, so we have

$^4_3\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_4) & -\textrm{sin}(q_4) & 0 \\ 0 & \textrm{sin}(q_4) & \textrm{cos}(q_4) & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

and

$^5_4\textbf{T} = \left[ \begin{array}{cccc} \textrm{cos}(q_5) & 0 & -\textrm{sin}(q_5) & 0 \\ 0 & 1 & 0 & l_5 \\ \textrm{sin}(q_5) & 0 & \textrm{cos}(q_5) & 0\\ 0 & 0 & 0 & 1 \end{array} \right] .$

Alright! Now that we have all of the transformation matrices, we can put them together to get the transformation from end-effector coordinates to our reference frame coordinates!

$^\textrm{ee}_\textrm{org}\textbf{T} = ^0_\textrm{org}\textbf{T} \; ^1_0\textbf{T} \; ^2_1\textbf{T} \; ^3_2\textbf{T} \; ^4_3\textbf{T} \; ^5_4\textbf{T}.$

At this point I went and tested this with some sample points to make sure that everything seemed to be being transformed properly, but we won’t go through that here.

Calculate the derivative of the transform with respect to each joint

The next step in calculating the Jacobian is getting the derivative of $^\textrm{ee}_\textrm{org}T$. This could be a big ol’ headache to do it by hand, OR we could use SymPy, the symbolic computation package for Python. Which is exactly what we’ll do. So after a quick

sudo pip install sympy


I wrote up the following script to perform the derivation for us

import sympy as sp

def calc_transform():
# set up our joint angle symbols (6th angle doesn't affect any kinematics)
q = [sp.Symbol('q0'), sp.Symbol('q1'), sp.Symbol('q2'), sp.Symbol('q3'),
sp.Symbol('q4'), sp.Symbol('q5')]
# set up our arm segment length symbols
l1 = sp.Symbol('l1')
l3 = sp.Symbol('l3')
l5 = sp.Symbol('l5')

Torg0 = sp.Matrix([[sp.cos(q[0]), -sp.sin(q[0]), 0, 0,],
[sp.sin(q[0]), sp.cos(q[0]), 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]])

T01 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[1]), -sp.sin(q[1]), l1*sp.cos(q[1])],
[0, sp.sin(q[1]), sp.cos(q[1]), l1*sp.sin(q[1])],
[0, 0, 0, 1]])

T12 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[2]), -sp.sin(q[2]), 0],
[0, sp.sin(q[2]), sp.cos(q[2]), 0],
[0, 0, 0, 1]])

T23 = sp.Matrix([[sp.cos(q[3]), 0, sp.sin(q[3]), 0],
[0, 1, 0, l3],
[-sp.sin(q[3]), 0, sp.cos(q[3]), 0],
[0, 0, 0, 1]])

T34 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[4]), -sp.sin(q[4]), 0],
[0, sp.sin(q[4]), sp.cos(q[4]), 0],
[0, 0, 0, 1]])

T45 = sp.Matrix([[sp.cos(q[5]), 0, sp.sin(q[5]), 0],
[0, 1, 0, l5],
[-sp.sin(q[5]), 0, sp.cos(q[5]), 0],
[0, 0, 0, 1]])

T = Torg0 * T01 * T12 * T23 * T34 * T45

# position of the end-effector relative to joint axes 6 (right at the origin)
x = sp.Matrix([0,0,0,1])

Tx = T * x

for ii in range(6):
print q[ii]
print sp.simplify(Tx[0].diff(q[ii]))
print sp.simplify(Tx[1].diff(q[ii]))
print sp.simplify(Tx[2].diff(q[ii]))


And then consolidated the output using some variable shorthand to write a function that accepts in joint angles and generates the Jacobian:

def calc_jacobian(q):
J = np.zeros((3, 7))

c0 = np.cos(q[0])
s0 = np.sin(q[0])
c1 = np.cos(q[1])
s1 = np.sin(q[1])
c3 = np.cos(q[3])
s3 = np.sin(q[3])
c4 = np.cos(q[4])
s4 = np.sin(q[4])

c12 = np.cos(q[1] + q[2])
s12 = np.sin(q[1] + q[2])

l1 = self.l1
l3 = self.l3
l5 = self.l5

J[0,0] = -l1*c0*c1 - l3*c0*c12 - l5*((s0*s3 - s12*c0*c3)*s4 + c0*c4*c12)
J[1,0] = -l1*s0*c1 - l3*s0*c12 + l5*((s0*s12*c3 + s3*c0)*s4 - s0*c4*c12)
J[2,0] = 0

J[0,1] = (l1*s1 + l3*s12 + l5*(s4*c3*c12 + s12*c4))*s0
J[1,1] = -(l1*s1 + l3*s12 + l5*s4*c3*c12 + l5*s12*c4)*c0
J[2,1] = l1*c1 + l3*c12 - l5*(s4*s12*c3 - c4*c12)

J[0,2] = (l3*s12 + l5*(s4*c3*c12 + s12*c4))*s0
J[1,2] = -(l3*s12 + l5*s4*c3*c12 + l5*s12*c4)*c0
J[2,2] = l3*c12 - l5*(s4*s12*c3 - c4*c12)

J[0,3] = -l5*(s0*s3*s12 - c0*c3)*s4
J[1,3] = l5*(s0*c3 + s3*s12*c0)*s4
J[2,3] = -l5*s3*s4*c12

J[0,4] = l5*((s0*s12*c3 + s3*c0)*c4 + s0*s4*c12)
J[1,4] = l5*((s0*s3 - s12*c0*c3)*c4 - s4*c0*c12)
J[2,4] = -l5*(s4*s12 - c3*c4*c12)

return J


Alright! Now we have our Jacobian! Really the only time consuming part here was calculating our end-effector to origin transformation matrix, generating the Jacobian was super easy using SymPy once we had that.

Hack position control using the Jacobian

Great! So now that we have our Jacobian we’ll be able to translate forces that we want to apply to the end-effector into joint torques that we want to apply to the arm motors. Since we can’t control applied force to the motors though, and have to pass in desired angle positions, we’re going to do a hack approximation. Let’s first transform our forces from end-effector space into a set of joint angle torques:

$\textbf{u} = \textbf{J}^T \; \textbf{u}_\textbf{x}.$

To approximate the control then we’re simply going to take the current set of joint angles (which we know because it’s whatever angles we last told the system to move to) and add a scaled down version of $\textbf{u}$ to approximate applying torque that affects acceleration and then velocity.

$\textbf{q}_\textrm{des} = \textbf{q} + \alpha \; \textbf{u},$

where $\alpha$ is the gain term, I used .001 here because it was nice and slow, so no crazy commands that could break the servos would be sent out before I could react and hit the cancel button.

What we want to do then to implement operational space control here then is find the current $(x,y,z)$ position of the end-effector, calculate the difference between it and the target end-effector position, use that to generate the end-effector control signal $u_x$, get the Jacobian for the current state of the arm using the function above, find the set of joint torques to apply, approximate this control by generating a set of target joint angles to move to, and then repeat this whole loop until we’re within some threshold of the target position. Whew.

So, a lot of steps, but pretty straight forward to implement. The method I wrote to do it looks something like:

def move_to_xyz(self, xyz_d):
"""
np.array xyz_d: 3D target (x_d, y_d, z_d)
"""
count = 0
while (1):
count += 1
# get control signal in 3D space
xyz = self.calc_xyz()
delta_xyz = xyz_d - xyz
ux = self.kp * delta_xyz

# transform to joint space
J = self.calc_jacobian()
u = np.dot(J.T, ux)

# target joint angles are current + uq (scaled)
self.q[...] += u * .001
self.robot.move(np.asarray(self.q.copy(), 'float32'))

if np.sqrt(np.sum(delta_xyz**2)) < .1 or count > 1e4:
break


And that is it! We have successfully hacked together a system that can perform operational space control of a 6DOF robot arm. Here is a very choppy video of it moving around to some target points in a grid on a cube.

So, granted I had to drop a lot of frames from the video to bring it’s size down to something close to reasonable, but still you can see that it moves to target locations super fast!

Alright this is sweet, but we’re not done yet. We don’t want to have to tell the arm where to move ourselves. Instead we’d like the robot to perform target tracking for some target LED we’re moving around, because that’s way more fun and interactive. To do this, we’re going to use spiking cameras! So stay tuned, we’ll talk about what the hell spiking cameras are and how to use them for a super quick-to-setup and foolproof target tracking system in the next exciting post!

## Operational space control of 6DOF robot arm with spiking cameras part 1: Getting access to the arm through Python

From June 9th to the 19th we ran a summer school (brain camp) in our lab with people from all over to come and learn how to use our neural modeling software, Nengo, and then work on a project with others in the school and people from the lab. We had a bunch of fun hardware available, from neuromorphic hardware from the SpiNNaker group over at Cambridge to lego robots on omni-wheels to spiking cameras (i’ll discuss what a spiking camera is exactly in part 3 of this series) and little robot arms. There were a bunch of awesome projects, people built neural models capable of performing a simplified version of the Wisconsin card sorting task that they then got running on the SpiNNaker boards, a neural controller built to move a robot leech, a spiking neurons reinforcement-learning system implemented on SpiNNaker with spiking cameras to control the lego robot that learned to move towards green LEDs and avoid red LEDs, and a bunch of others. If you’re interested in participating in these kinds of projects and learning more about this I highly suggest you apply to the summer school for next year!

I took the summer school as a chance to break from other projects and hack together a project. The idea was to take the robot arm, implement an operational space controller (i.e. find the Jacobian), and then used spiking cameras to track an LED and have the robot arm learn how to move to the target, no matter where the cameras were placed, by learning the relationship between where the target is in camera-centric coordinates and arm-centric coordinates. This broke down into several steps: 1) Get access to the arm through Python, 2) derive the Jacobian to implement operational space control, 3) sample state space to get an approximation of the camera-centric to arm-centric function, 4) implement the control system to track the target LED.

Here’s a picture of the set-up during step 3, training.

So in the center is the 6DOF robot arm with a little LED attached to the head, and highlighted in orange circles are the two spiking cameras, expertly taped to the wall with office-grade scotch tape. You can also see the SpiNNaker board in the top left as a bonus, but I didn’t have enough time to involve it in this project.

I was originally going to write this all up as one post, because the first two parts are re-implementing previous posts, but even skimming through those steps it was getting long and I’m sure no one minds having a few different examples to look through for generating Cython code or deriving a Jacobian. So I’m going to break this into a few parts. Here in this post we’ll work through the first step (Cython) of our journey.

The arm we had was the VE026A Denso training robot, on loan from Dr. Bryan Tripp of the neuromorphic robotics lab at UW. Previously an interface had been built up by one of Dr. Tripp’s summer students, written in C. C is great and all but Python is much easier to work with, and the rest of the software I wanted to work you know what I’m done justifying it Python is just great so Python is what I wanted to use. The end.

So to get access to the arm in Python I used the great ol’ C++ wrapper hack described in a previous post. Looking at Murphy-the-summer-student’s C code there were basically three functions I needed access to:

// initialize threads, connect to robot
void start_robot(int *semid, int32_t *ctrlid, int32_t *robotid, float **angles)
// send a set of joint angles to the robot servos
void Robot_Execute_slvMove(int32_t robotid, float j1, float j2, float j3, float j4, float j5, float j6, float j7, float j8)
// kill threads, disconnect from robot
void stop_robot(int semid, int32_t ctrlid, int32_t robotid)


So I changed the extension of the file to ‘.cpp’ (I know, I know, I said this was a hack!), fixed some compiler errors that popped up, and then appended the following to the end of the file:

/* A class to contain all the information that needs to
be passed around between these functions, and can
encapsulate it and hide it from the Python interface.

Written by Travis DeWolf (June, 2015)
*/

class Robot {
/* Very simple class, just stores the variables
* needed for talking to the robot, and a gives access
* to the functions for moving it around */

int semid;
int32_t ctrlid;
int32_t robotid;
float* angles;

public:
Robot();
~Robot();
void connect();
void move(float* angles_d);
void disconnect();
};

Robot::Robot() { }

Robot::~Robot() { free(angles); }

/* Connect to the robot, get the ids and current joint angles */
/* char* usb_port: the name of the port the robot is connected to */
void Robot::connect()
{
start_robot(&amp;semid, &amp;ctrlid, &amp;robotid, &amp;angles);
printf("%i %i %i", robotid, ctrlid, semid);
}

/* Move the robot to the specified angles */
/* float* angles: the target joint angles */
void Robot::move(float* angles)
{
// convert from radians to degrees
Robot_Execute_slvMove(robotid,
angles[0] * 180.0 / 3.14,
angles[1] * 180.0 / 3.14,
angles[2] * 180.0 / 3.14,
angles[3] * 180.0 / 3.14,
angles[4] * 180.0 / 3.14,
angles[5] * 180.0 / 3.14,
angles[6] * 180.0 / 3.14,
angles[7] * 180.0 / 3.14);
}

/* Disconnect from the robot */
void Robot::disconnect()
{
stop_robot(semid, ctrlid, robotid);
}

int main()
{
Robot robot = Robot();
// connect to robot
robot.connect();

// move robot to some random target angles
float target_angles[7] = {0, np.pi / 2.0, 0.0, 0, 0, 0, 0};
robot.move(target_angles);

sleep(1);

// disconnect
robot.disconnect();

return 0;
}


So very simple class. Basically just wanted to create a set of functions to hide some of the unnecessary parameters from Python, do the conversion from radians to degrees (who works in degrees? honestly), and have a short little main function to test the creation of the class, and connection, movement, and disconnection of the robot. Like I said, there were a few compiler errors when switching from C to C++, but really that was the only thing that took any time on this part. A few casts and everything was gravy.

The next part was writing the Cython pyRobot.pyx file (I describe the steps involved in this in more detail in this post):

import numpy as np
cimport numpy as np

cdef extern from "bcap.cpp":
cdef cppclass Robot:
Robot()
void connect()
void move(float* angles)
void disconnect()

cdef class pyRobot:
cdef Robot* thisptr

def __cinit__(self):
self.thisptr = new Robot()

def __dealloc__(self):
del self.thisptr

def connect(self):
"""
Set up the connection to the robot, pass in a vector,
get back the current joint angles of the arm.
param np.ndarray angles: a vector to store current joint angles
"""
self.thisptr.connect()

def move(self, np.ndarray[float, mode="c"] angles):
"""
Step the simulation forward one timestep. Pass in target angles,
get back the current arm joint angles.
param np.ndarray angles: 7D target angle vector
"""
self.thisptr.move(&amp;angles[0])

def disconnect(self):
"""
Disconnect from the robot.
"""
self.thisptr.disconnect()


the setup.py file:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
name = 'Demos',
ext_modules=[
Extension("test",
sources=["pyRobot.pyx"],
language="c++"),
],
cmdclass = {'build_ext': build_ext},

)


and then compiling!

run setup.py build_ext -i


With all of this working, a nice test.so file was created, and it was now possible to connect to the robot in Python with

import test
rob = test.pyRobot()
rob.connect()
target_angles = np.array([0, np.pi/2, 0, np.pi/4, 0, 0, 0, 0], dtype='float32')
rob.move(target_angles)
import time
time.sleep(1)
rob.disconnect()


In the above code we’re instantiating the pyRobot class, connecting to the robot, defining a set of target angles and telling the robot to move there, waiting for 1 second to give the robot time to actually move, and then disconnecting from the robot. Upon connection we have to pass in a set of joint angles for the servos, and so you see the robot arm jerk into position, and then move to the target set of joint angles, it looks something exactly like this:

Neat, phase 1 complete.

At the end of phase 1 we are able to connect to the robot arm through Python and send commands in terms of joint angles. But we don’t want to send commands in terms of joint angles, we want to just specify the end-effector position and have the robot work out the angles! I’ve implemented an inverse kinematics solver using constrained optimization before for a 3-link planar arm, but we’re not going to go that route. Find out what we’ll do by joining us next time! or by remembering what I said we’d do at the beginning of this post.

Tagged , , ,

## Reinforcement Learning part 4: Combining egocentric and allocentric

Alright, in my last post on RL we looked at egocentric learning and what that is. We also saw that both ego and allocentric learning have their strong points, and they compliment each other. So, some combination of these two approaches seems like a good idea. In this post we’re going to look at combining the output from these two kinds of learners and see the strengths of different weightings. Per usual, all the code from this post is up on my Github.

Combining allocentric and egocentric approaches

There are a few different ways that come to mind that we could combine these two learning styles. In each of these cases the overall setup is that we’re going to run two learning systems, one that is allocentric and one that is egocentric, combine their action value mappings for the current state in some way, and then decide which action to take based on their combined output. So here are several possible ways to combine the outputs:

• Only use allocentric
• Only use egocentric
• Average the mappings
• Dynamically change weights

There are, of course, many more ways to weight these systems, but these are some basic ones and they’re what we’re going to look at here.

For testing each of these these different setups we are going to run the mouse on a map for a set amount of time, measure the wins, accidental deaths, and intentional deaths, and then gather statistics (mean value and confidence intervals) across 10 trials.

A side note: The rate of exploration (epsilon value) starts out at .1 and is decreased by half every 100,000 simulation time step, so there are a number of jumps where you see the number of wins go up every 100,000, and that’s an artifact of the rate of exploration dropping (i.e. increasing your exploitation of what you already know about the world).

OK, we’re going to do this across 3 different maps, where the size and complexity increases with each map.

Map 1: the basic cliff map (which should look familiar):

Map 2: a middlesized map with several areas that are identical for the egocentric state:

and Map 3: a pretty large map with even more ambiguous areas:

Expectations

Now, what we expect from our learning systems is that the egocentric learner with have very few ‘suicide’ deaths, where it’s slow to learn that jumping off a cliff is a terrible idea. But on the flip side of that, since we’re going to set the gamma value of the egocentric learner to 0, (which means that it doesn’t incorporate anything other than the immediate reward into it’s learning, as discussed in the previous post), we expect that it will only rarely find the goal. Also, as the map gets larger, we expect that the goal will be found less and less, as it will have to randomly stumble around avoiding cliffs until it ends up in sight of the cheese.

From the allocentric learner we expect the death rate to be much higher than the egocentric learner, but we also expect the success rate to be significantly higher, as it will be able to identify how to move based on its current (x,y) position, rather than based on its immediate surroundings. As the map gets larger though, we expect the number of deaths to be larger, since there is no transfer of knowledge that jumping off a cliff in this position is as bad as jumping off a cliff in that position.

Alright! Let’s get to it!

Results

Only use allocentric

Not much to explain about this, the mouse is going to be learning Q-values where each state is its (x,y) position on the map. The bigger the map, the more deaths we expect this learner to have. So, here are the results:

Alright, so one of the most salient things in these plots is the stepping up of the number of wins at each of the 100,000 marks. Again, this is due to a transfer from exploration to exploitation of what’s already been learned as the epsilon value is decreased. And, also, as the map gets bigger, the number of wins takes longer to max out, but it’s pretty interesting how consistent the learning is (as seen by the very small confidence intervals on the runs) across trials given the random exploration.
Another point of note is that the number of deaths goes way up with the size of the map, as expected.

Only use egocentric

Again, this one is another straightforward case, the mouse is going to be learning Q-values where each state is its set of immediate neighbors (e.g. cliff to the right, cheese in front, empty cells left and behind). Here we expect that it will have a minimal number of deaths (not much more than a death for each of the situations it can encounter a cliff or cliffs in), and that the number of successes should be pretty low and very inconsistent, since it will be entirely based on randomly stumbling into a spot near the cheese while avoiding cliffs. Additionally, we expect the number of results to drop with increases in the size of the map. The results are below:

The results are as expected, lots of variation in the wins number, low suicide numbers, and the number of wins decreases with map sizes. Something that’s kind of interesting here is that you don’t see any real effects of the epsilon value as it decreases as you do in the egocentric case. I believe that this is because it achieves a balance of exploration that throws you off a cliff and exploration that moves you into the vicinity of the cheese. So it’s dying less when the exploration rate drops, but it also has less chance of reaching the cheese now. That’s my guess, at least. Alright! Next.

Average the allo and ego Q-values

Here we’re going to do the most basic combination of the allo and ego-centric learning possible. Both learners will run simultaneously, and generate a set of Q-values over the set of possible actions at each state, and we’ll average the output of both then choose the max Q-value to choose which action is taken. The hope here is that the ego-centric learner will very quickly learn not to jump off cliffs, and the allocentric learner will more slowly learn how to move in the map to get cheese. So we expect few deaths and more wins. Let’s look at the results:

Damn! So now we have the same (or at least very similar) win rate as the allocentric learner (awesome), and the same death rate as the egocentric learner (also awesome)! We have achieved some sort of super-human…or, super-mouse mouse. A super mouse. This is really cool, it tells us that these two systems really do compliment each other, and that it can be incredibly straightforward to implement a combination of the two.

Dynamic weighting

Here we’ll look at the results from a variation of an algorithm I’ve implemented based on this really interesting paper by Dr. Sakya Dasgupta et al where they got a lot of really neat results implemented on a little robot critter. We can’t quite use the algorithm as presented here for a couple of reasons.

In the paper, the authors design the algorithm for weighting the output from classical conditioning and operant conditioning systems. I’m taking these to be pretty close to analogous to an ego-centric learner (with no lookahead) and an allo-centric learner, respectively. The idea being that classical conditioning is based on immediate associations developed from rewards, and operant conditioning allows the longer-term associations to be formed. Specifically, in operant you learn to associate the conditioned stimulus with the unconditioned stimulus, which leads to a a conditioned response. The classic example being the dog salivating when hearing the bell, because it knows food is coming. Here the allocentric learner is doing a similar thing, using a look-ahead to start associating stimuli (in this case different states) with a reward or punishment. So, this could be a wildly inappropriate application of this algorithm, and the comparison definitely warrants further discussion.

The more immediate reason for varying the algorithm is because the output of their system is a continuous ‘left/right’ decision, and the output from the mouse here is a discrete ‘left/right/up/down’ decision. So the systems in the paper output a number from -1 to 1 that indicates which way to turn, and this is used in the update. So the algorithm needs to be changed up a little.

First, here’s the original algorithm:

weight_allo += eta * reward * [output_allo - filtered_output_allo] * output_ego
weight_ego += eta * reward * [output_ego - filtered_output_ego] * output_allo

where eta is the learning rate, reward is the immediate reward of the state you move into, output_ is the -1 to 1 value from the learners, and filtered_output_ is a low pass filtered version of the output. So the first part is standard, learning rate multiplied by the reward, and then the part inside the brackets is calculating a derivative of the output of the learners. This is again standard, did your learner change? If it did change and a reward was received, then increase the weight of that learner. The last term, which multiplies it all by the output from the other learner is for scaling.

And then normalize the weights relative to each other, and it isn’t explicitly stated in the paper, but also lower bound the weights at 0. The reason you need to do this lower bounding is that if you don’t, then when you use the weight as a gain on the learner output all of the Q-values are inverted! Not easy to recover from that.

The dynamic weighting equations that I’m using are:

weight_allo += eta * reward * dq_allo * output_ego
weight_ego += eta * reward * dq_ego * output_allo

and then lower bound at 0 and normalize.

There is still a lot in these equations that I’m unsure about (for example scaling by the output of the other learner instead of by the other weight, and how the signs of the reward, derivative, and other learner output all interact), but I’m going to leave delving into that for a future post, since this one is getting long already and I don’t have a full understanding of it yet. But! I have enough of an understanding to implement the above and run simulations! Here are the results:

I should note that the learning rate here for the weight equations is 5e-4. When you decrease this weight you get improved performance, but the best I’ve gotten is just reducing it to a point that it’s pretty much constantly using .5 for each. And that’s boring.

Alright, comparing to basic averaging here we notice that we have about the same performance for the small map, and then on the bigger maps significantly worse performance and also a lot more variability between runs. Basically what seems to happen is that the system quickly identifies that the allocentric learner should not have control at the beginning, because it keeps killing the mouse, and then it takes a lot longer to start handing control back over from the egocentric learner to the allocentric learner.

Conclusions

Alright, so that’s all! Bottom line is that allocentric and egocentric learning really work well together, and basic averaging of these gives great results. Whether this is the best means of combining them is unclear, however. This was just a really basic test of different combinations, and there is some really neat work looking at different ways to do it. It’s super fun to mess around with and there’s a ton of room for experimentation. In the code up on my Github, the code for weight learning can be found by searching for weight_learning. If anyone plays around and figures out a kick-ass dynamic weighting algorithm please pass it along!

## Arm visualization with PyGame

So, in my neverending quest for better arm visualizations so I can make prettier pictures / videos and improve my submission acceptance rates I’ve started looking at PyGame. I started out hoping to find something that was quick and easy in Python for animating using SVG images, and PyGame is as close as I got. All in all, I’m decently happy with the result. It runs slower than the Matplotlib animation, and you can’t zoom in like you can on a Matplotlib graph, but it’s prettier. So, tradeoffs!

There were a few things that I ran into that tripped me up a bit when I was doing this implementation, so I thought that I would share, and that’s what this post is going to be about. Below is an animation showing what the arm visualization looks like, and as always the code for everything can be found on my Github. That links you to the control folder, and you can find all of the code developed / used for this post in this folder up on my GitHub.

Alright, there were a few parts in doing this that were a bit tricky. Let’s go through them one at a time.

Rotations in PyGame

Turns out rotating images is a pain right off the bat, but once you get over the initial hurdles it looks pretty slick.

To start we’re just going to attempt to rotate an image in place, here’s a first pass:

import pygame
import pygame.locals

white = (255, 255, 255)

pygame.init()

display = pygame.display.set_mode((300, 300))
fpsClock = pygame.time.Clock()

while 1:

display.fill(white)

image = pygame.transform.rotate(image, 1)
rect = image.get_rect()

display.blit(image, rect)

# check for quit
for event in pygame.event.get():
if event.type == pygame.locals.QUIT:
pygame.quit()
sys.exit()

pygame.display.update()
fpsClock.tick(30)


First thing you notice upon running this is that the image flies off to the side very quickly, as shown below:

This is because we need to recenter the image after each rotation. To do that we can add in this after line 21:

    rect.center = (150, 150)


and so now our animation looks like:

At which point a very egregious problem becomes clear: the image is destroying itself as it rotates.

Transforming from a base image

Basically what’s happening is that every transformation the image gets distorted a little bit, and continues moving farther and farther away from the original. To prevent this we’re going save the original and also track the total degrees to rotate the image. Then we’ll perform one rotation (with the minimal distortion caused by one transformation) and plot that every time step. To do this we’ll introduce a degrees variable to track the total rotations. The changes to the code start on line 12:

radians = 0

while 1:

display.fill(white)

rect = rotated_image.get_rect()
rect.center = (150, 150)


And the resulting animation looks like:

Which is significantly better! Pretty good, in fact. Looking close, however, you can notice that it gets a little choppy. And this is because the pygame.transform.rotate method doesn’t use anti-aliasing to smooth out the image. pygame.transformation.rotozoom does, however! So we’ll use rotozoom and set the zoom level to 1, changing line 20:

    rotated = pygame.transform.rotozoom(image, degrees, 1)


And now we have a nice smooth animation!

At this point, we’re going to create a class to take care of this rotation business for us, storing the original image and providing a function that rotates smoothly and recenters the image.

import numpy as np
import pygame

class ArmPart:
"""
A class for storing relevant arm segment information.
"""
def __init__(self, pic, scale=1.0):
# some handy constants
self.length = self.base.get_rect()[2]
self.scale = self.length * scale
self.offset = self.scale / 2.0

self.rotation = 0.0 # in radians

def rotate(self, rotation):
"""
Rotates and re-centers the arm segment.
"""
self.rotation += rotation
# rotate our image
image = pygame.transform.rotozoom(self.base, np.degrees(self.rotation), 1)
# reset the center
rect = image.get_rect()
rect.center = (0, 0)

return image, rect


Everything is just what we’ve seen so far, except the offset, which is going to be very useful for the trig we’re about to get into.

Trig

With rotations working and going smoothly one of the major hurdles is now behind us. At this point we can get our arm images and use some trig to make sure that they rotate around the joint ends rather than in the center. Using the above class now we’ll write a script that loads in a picture of an upper arm, and then rotates it around the shoulder.

import numpy as np
import pygame
import pygame.locals

from armpart import ArmPart

black = (0, 0, 0)
white = (255, 255, 255)

pygame.init()

width = 500
height = 500
display = pygame.display.set_mode((width, height))
fpsClock = pygame.time.Clock()

upperarm = ArmPart('upperarm.png', scale=.7)

base = (width / 2, height / 2)

while 1:

display.fill(white)

ua_image, ua_rect = upperarm.rotate(.01)
ua_rect.center += np.asarray(base)
ua_rect.center -= np.array([np.cos(upperarm.rotation) * upperarm.offset,
-np.sin(upperarm.rotation) * upperarm.offset])

display.blit(ua_image, ua_rect)
pygame.draw.circle(display, black, base, 30)

# check for quit
for event in pygame.event.get():
if event.type == pygame.locals.QUIT:
pygame.quit()
sys.exit()

pygame.display.update()
fpsClock.tick(30)


So far the only trig we need is simply calculating the offset from the center point. Here it’s calculated as half of the length of the image multiplied by a scaling term. The scaling term is because we don’t want the rotation point to be the absolute edge of the image but rather we want it to be somewhere inside the arm. We calculate the x and y offset values using the cos and sin functions, respectively, with a negative sign in from of the sin function because the y axis is inverted, using the top of the screen as 0. The black circle is just for ease of viewing while we’re figuring all the trig out, to make it easier to verify it’s rotating around the point we want. The above code results in the following

import numpy as np
import pygame
import pygame.locals

from armpart import ArmPart

black = (0, 0, 0)
white = (255, 255, 255)

pygame.init()

width = 750
height = 750
display = pygame.display.set_mode((width, height))
fpsClock = pygame.time.Clock()

upperarm = ArmPart('upperarm.png', scale=.7)
forearm = ArmPart('forearm.png', scale=.8)
hand = ArmPart('hand.png', scale=1.0)

origin = (width / 2, height / 2)

while 1:

display.fill(white)

ua_image, ua_rect = upperarm.rotate(.03)
fa_image, fa_rect = forearm.rotate(-.02)
h_image, h_rect = hand.rotate(.03)

# generate (x,y) positions of each of the joints
joints_x = np.cumsum([0,
upperarm.scale * np.cos(upperarm.rotation),
forearm.scale * np.cos(forearm.rotation),
hand.length * np.cos(hand.rotation)]) + origin[0]
joints_y = np.cumsum([0,
upperarm.scale * np.sin(upperarm.rotation),
forearm.scale * np.sin(forearm.rotation),
hand.length * np.sin(hand.rotation)]) * -1 + origin[1]
joints = [(int(x), int(y)) for x,y in zip(joints_x, joints_y)]

def transform(rect, origin, arm_part):
rect.center += np.asarray(origin)
rect.center += np.array([np.cos(arm_part.rotation) * arm_part.offset,
-np.sin(arm_part.rotation) * arm_part.offset])

transform(ua_rect, joints[0], upperarm)
transform(fa_rect, joints[1], forearm)
transform(h_rect, joints[2], hand)
# transform the hand a bit more because it's weird
h_rect.center += np.array([np.cos(hand.rotation),
-np.sin(hand.rotation)]) * -10

display.blit(ua_image, ua_rect)
display.blit(fa_image, fa_rect)
display.blit(h_image, h_rect)

# check for quit
for event in pygame.event.get():
if event.type == pygame.locals.QUIT:
pygame.quit()
sys.exit()

pygame.display.update()
fpsClock.tick(30)


So basically the only thing we’ve done here that was a little more complicated was setting up the centers of the forearm and hand images to be at the end of the upper arm and forearm, respectively. We do that in the joints_x and joints_y variables, and the trig for that is straight from basic robotics.
The above code results in the following animation (which is a little choppy because I dropped some frames to keep the file size small):

Plotting transparent lines

The other kind of of tricky thing that was done in the visualization code was the plotting of transparent lines of the ‘skeleton’ of the arm. The reason that this was kind of tricky was because you can’t just use the pygame.draw.lines method, because it doesn’t allow for transparency. So what you end up having to do instead is generate a set of Surfaces of the shape that you want for each of the lines, and then transform and plot them appropriately, which is thankfully pretty straightforward after working things out for the images above.

To generate a transparent surface you use the pygame.SRCALPHA variable, so it looks like

surface = pygame.Surface((width, height), pygame.SRCALPHA, 32)
surface.fill((100, 100, 100, 200))


where the variable passed in to fill is a 4-tuple, with the first 3 parameters the standard RGB levels, and the fourth parameter being the transparency level.

Then there’s one more snag, in that when you blit a surface it goes by the top left position of the rectangle that contains it. So doing the same transformations for the images we then have to transform the surface further because with the images we were specifying the desired center point. This is easy enough; just get a handle to the rect for the surface and subtract out half of the width and height (post-rotation).

The code with the transparent lines (and some circles at the joints added in for prettiness) then is

import numpy as np
import pygame
import pygame.locals

from armpart import ArmPart

black = (0, 0, 0)
white = (255, 255, 255)
arm_color = (50, 50, 50, 200) # fourth value specifies transparency

pygame.init()

width = 750
height = 750
display = pygame.display.set_mode((width, height))
fpsClock = pygame.time.Clock()

upperarm = ArmPart('upperarm.png', scale=.7)
forearm = ArmPart('forearm.png', scale=.8)
hand = ArmPart('hand.png', scale=1.0)

line_width = 15
line_upperarm = pygame.Surface((upperarm.scale, line_width), pygame.SRCALPHA, 32)
line_forearm = pygame.Surface((forearm.scale, line_width), pygame.SRCALPHA, 32)
line_hand = pygame.Surface((hand.scale, line_width), pygame.SRCALPHA, 32)

line_upperarm.fill(arm_color)
line_forearm.fill(arm_color)
line_hand.fill(arm_color)

origin = (width / 2, height / 2)

while 1:

display.fill(white)

# rotate our joints
ua_image, ua_rect = upperarm.rotate(.03)
fa_image, fa_rect = forearm.rotate(-.02)
h_image, h_rect = hand.rotate(.03)

# generate (x,y) positions of each of the joints
joints_x = np.cumsum([0,
upperarm.scale * np.cos(upperarm.rotation),
forearm.scale * np.cos(forearm.rotation),
hand.length * np.cos(hand.rotation)]) + origin[0]
joints_y = np.cumsum([0,
upperarm.scale * np.sin(upperarm.rotation),
forearm.scale * np.sin(forearm.rotation),
hand.length * np.sin(hand.rotation)]) * -1 + origin[1]
joints = [(int(x), int(y)) for x,y in zip(joints_x, joints_y)]

def transform(rect, base, arm_part):
rect.center += np.asarray(base)
rect.center += np.array([np.cos(arm_part.rotation) * arm_part.offset,
-np.sin(arm_part.rotation) * arm_part.offset])

transform(ua_rect, joints[0], upperarm)
transform(fa_rect, joints[1], forearm)
transform(h_rect, joints[2], hand)
# transform the hand a bit more because it's weird
h_rect.center += np.array([np.cos(hand.rotation),
-np.sin(hand.rotation)]) * -10

display.blit(ua_image, ua_rect)
display.blit(fa_image, fa_rect)
display.blit(h_image, h_rect)

# rotate arm lines
line_ua = pygame.transform.rotozoom(line_upperarm,
np.degrees(upperarm.rotation), 1)
line_fa = pygame.transform.rotozoom(line_forearm,
np.degrees(forearm.rotation), 1)
line_h = pygame.transform.rotozoom(line_hand,
np.degrees(hand.rotation), 1)
# translate arm lines
lua_rect = line_ua.get_rect()
transform(lua_rect, joints[0], upperarm)
lua_rect.center += np.array([-lua_rect.width / 2.0, -lua_rect.height / 2.0])

lfa_rect = line_fa.get_rect()
transform(lfa_rect, joints[1], forearm)
lfa_rect.center += np.array([-lfa_rect.width / 2.0, -lfa_rect.height / 2.0])

lh_rect = line_h.get_rect()
transform(lh_rect, joints[2], hand)
lh_rect.center += np.array([-lh_rect.width / 2.0, -lh_rect.height / 2.0])

display.blit(line_ua, lua_rect)
display.blit(line_fa, lfa_rect)
display.blit(line_h, lh_rect)

# draw circles at joints for pretty
pygame.draw.circle(display, black, joints[0], 30)
pygame.draw.circle(display, arm_color, joints[0], 12)
pygame.draw.circle(display, black, joints[1], 20)
pygame.draw.circle(display, arm_color, joints[1], 7)
pygame.draw.circle(display, black, joints[2], 15)
pygame.draw.circle(display, arm_color, joints[2], 5)

# check for quit
for event in pygame.event.get():
if event.type == pygame.locals.QUIT:
pygame.quit()
sys.exit()

pygame.display.update()
fpsClock.tick(30)


And the resulting arm visualization now looks like this!

From here you can then blit on an image in the background, and the little zoom in I have in the top left is just another surface, it’s all pretty straight forward. And that’s pretty much everything! Hopefully this helps someone trying to get some fancier PyGame things going, and if you have any means of generating fancier PyGame animations yourself please drop any tips below in the comments!

Again, the code from this post is all up on my GitHub.

## Robot control part 7: OSC of a 3-link arm

So we’ve done control for the 2-link arm, and control of the one link arm is trivial (where we control joint angle, or x or y coordinate of the pendulum), so here I’ll just show an implementation of operation space control for a more interesting arm model, the 3-link arm model. The code can all be found up on my Github.

In theory there’s nothing different or more difficult about controlling a 3-link arm versus a 2-link arm. For the inertia matrix, what I ended up doing here is just jacking up all the values of the matrix to about 100, which causes the controller to way over control the arm, and you can see the torques are much larger than they would need to be if we had an accurate inertia matrix. But the result is the same super straight trajectories that we’ve come to expect from operational space control:

It’s a little choppy because I cut out a bunch of frames to keep the gif size down. But you get the point, it works. And quite well!

Because this is also a 3-link arm now and our task space force signal is 2D, we have redundant space of solutions, meaning that the task space control signal can be carried out in a number of ways. In other words, a null space exists for this controller. This means that we can throw another controller in our system to operate inside the null space of the first controller. We’ve already worked through all the math for this, so it’s straightforward to implement.

What kind of null space controller should we put in? Well, you may have noticed in the above animation the arm goes through itself, here’s another example:

Often it’s desirable to avoid this (because of physics or whatever), so what we can do is add a secondary controller that works to keep the arm’s elbow and wrist near some comfortable default angles. When we do this we get the following:

And there you have it! Operational space control of a three link arm with a secondary controller in the null space to try to keep the angles near their default / resting positions.

I also added mouse based control to the arm so it will try to follow your mouse when you move over the figure, which makes it pretty fun to explore the power of the controller. It’s interesting to see where the singularities become an issue, and how having a null space controller that’s operating in joint space can actually come to help the system move through those problem points more smoothly. Check it out! It’s all up on my Github.

## Arm simulation visualization with Matplotlib

One of the downsides of switching to Python from Matlab is that it can be a pain to plot some kinds of things, and I’ve found animations to be one those things. In previous posts I’ve done the visualization of my arm simulations through Pyglet, but I recently started playing around with Matplotlib’s animation function, and the results are pretty smooth. The process is also relatively painless and quick to get up and running, so I thought I would throw up some Matplotlib code for visualizing my previously described 2 link arm MapleSim simulation.

So, let’s look at the code:

#Written by Travis DeWolf (Sept, 2013)
#Based on code by Jake Vanderplas - http://jakevdp.github.com

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

"""
:param list u: the torque applied to each joints
"""
def __init__(self, u = [.1, 0]):
self.u = np.asarray(u, dtype='float') # control signal
self.state = np.zeros(3) # vector for current state
self.L1=0.37 # length of arm link 1 in m
self.L2=0.27 # length of arm link 2 in m
self.time_elapsed = 0

self.sim.reset(self.state)

def position(self):
"""Compute x,y position of the hand"""

x = np.cumsum([0,
self.L1 * np.cos(self.state[1]),
self.L2 * np.cos(self.state[2])])
y = np.cumsum([0,
self.L1 * np.sin(self.state[1]),
self.L2 * np.sin(self.state[2])])
return (x, y)

def step(self, dt):
"""Simulate the system and update the state"""
for i in range(1500):
self.sim.step(self.state, self.u)
self.time_elapsed = self.state[0]

#------------------------------------------------------------
# set up initial state and global variables
dt = 1./30 # 30 fps

#------------------------------------------------------------
# set up figure and animation
fig = plt.figure(figsize=(4,4))
xlim=(-1, 1), ylim=(-1, 1))
ax.grid()

line, = ax.plot([], [], 'o-', lw=4, mew=5)
time_text = ax.text(0.02, 0.95, '', transform=ax.transAxes)

def init():
"""initialize animation"""
line.set_data([], [])
time_text.set_text('')
return line, time_text

def animate(i):
"""perform animation step"""
global arm, dt
arm.step(dt)

line.set_data(*arm.position())
time_text.set_text('time = %.2f' % arm.time_elapsed)
return line, time_text

# frames=None for matplotlib 1.3
ani = animation.FuncAnimation(fig, animate, frames=None,
interval=25, blit=True,
init_func=init)

# uncomment the following line to save the video in mp4 format.
# requires either mencoder or ffmpeg to be installed
#         extra_args=['-vcodec', 'libx264'])

plt.show()


There’s not all too much to it, which is nice. I’ve created a class TwoLinkArm that wraps the actual arm simulator, stores the current state of the arm, and has a step function that gets called to move the system forward in time. Then, I created a line and stored a reference to it; this is what is going to be updated in the simulation. I then need functions that specify how the line will be initialized and updated. The init function doesn’t do anything except set the line and text data to nothing, and then the animate function calls the arm simulation’s step function and updates the line and text data.

For more details about it there’s this blog post which steps through the process a bit more. For simple arm simulations the above is all I need though, so I’ll leave it there for now!

Here’s an animation of the resulting sim visualization, I’ve removed a lot of the frames to keep the size down. It’s smoother when running the actual code, which can be found up on my github.

Hurray better visualization!

Tagged , , ,

## Reinforcement Learning part 2: SARSA vs Q-learning

In my previous post about reinforcement learning I talked about Q-learning, and how that works in the context of a cat vs mouse game. I mentioned in this post that there are a number of other methods of reinforcement learning aside from Q-learning, and today I’ll talk about another one of them: SARSA. All the code used is from Terry Stewart’s RL code repository, and can be found both there and in a minimalist version on my own github: SARSA vs Qlearn cliff. To run the code, simply execute the cliff_Q or the cliff_S files.

SARSA stands for State-Action-Reward-State-Action. In SARSA, the agent starts in state 1, performs action 1, and gets a reward (reward 1). Now, it’s in state 2 and performs another action (action 2) and gets the reward from this state (reward 2) before it goes back and updates the value of action 1 performed in state 1. In contrast, in Q-learning the agent starts in state 1, performs action 1 and gets a reward (reward 1), and then looks and sees what the maximum possible reward for an action is in state 2, and uses that to update the action value of performing action 1 in state 1. So the difference is in the way the future reward is found. In Q-learning it’s simply the highest possible action that can be taken from state 2, and in SARSA it’s the value of the actual action that was taken.

This means that SARSA takes into account the control policy by which the agent is moving, and incorporates that into its update of action values, where Q-learning simply assumes that an optimal policy is being followed. This difference can be a little difficult conceptually to tease out at first but with an example will hopefully become clear.

Mouse vs cliff

Let’s look at a simple scenario, a mouse is trying to get to a piece of cheese. Additionally, there is a cliff in the map that must be avoided, or the mouse falls, gets a negative reward, and has to start back at the beginning. The simulation looks something like exactly like this:

where the black is the edge of the map (walls), the red is the cliff area, the blue is the mouse and the green is the cheese. As mentioned and linked to above, the code for all of these examples can be found on my github (as a side note: when using the github code remember that you can press the page-up and page-down buttons to speed up and slow down the rate of simulation!)

Now, as we all remember, in the basic Q-learning control policy the action to take is chosen by having the highest action value. However, there is also a chance that some random action will be chosen; this is the built-in exploration mechanism of the agent. This means that even if we see this scenario:

There is a chance that that mouse is going to say ‘yes I see the best move, but…the hell with it’ and jump over the edge! All in the name of exploration. This becomes a problem, because if the mouse was following an optimal control strategy, it would simply run right along the edge of the cliff all the way over to the cheese and grab it. Q-learning assumes that the mouse is following the optimal control strategy, so the action values will converge such that the best path is along the cliff. Here’s an animation of the result of running the Q-learning code for a long time:

The solution that the mouse ends up with is running along the edge of the cliff and occasionally jumping off and plummeting to its death.

However, if the agent’s actual control strategy is taken into account when learning, something very different happens. Here is the result of the mouse learning to find its way to the cheese using SARSA:

Why, that’s much better! The mouse has learned that from time to time it does really foolish things, so the best path is not to run along the edge of the cliff straight to the cheese but to get far away from the cliff and then work its way over safely. As you can see, even if a random action is chosen there is little chance of it resulting in death.

Learning action values with SARSA

So now we know how SARSA determines it’s updates to the action values. It’s a very minor difference between the SARSA and Q-learning implementations, but it causes a profound effect.

Here is the Q-learning learn method:

def learn(self, state1, action1, reward, state2):
maxqnew = max([self.getQ(state2, a) for a in self.actions])
self.learnQ(state1, action1,
reward, reward + self.gamma*maxqnew)


And here is the SARSA learn method

def learn(self, state1, action1, reward, state2, action2):
qnext = self.getQ(state2, action2)
self.learnQ(state1, action1,
reward, reward + self.gamma * qnext)


As we can see, the SARSA method takes another parameter, action2, which is the action that was taken by the agent from the second state. This allows the agent to explicitly find the future reward value, qnext, that followed, rather than assuming that the optimal action will be taken and that the largest reward, maxqnew, resulted.

Written out, the Q-learning update policy is Q(s, a) = reward(s) + alpha * max(Q(s')), and the SARSA update policy is Q(s, a) = reward(s) + alpha * Q(s', a'). This is how SARSA is able to take into account the control policy of the agent during learning. It means that information needs to be stored longer before the action values can be updated, but also means that our mouse is going to jump off a cliff much less frequently, which we can probably all agree is a good thing.

## Cython Journey Part 3: Wrapping C code and passing arrays

Basic C interfacing example

OK, so this code is taken straight from an example here, I’m just going to explicitly work through all the steps so that when I try to do this again later I don’t have to look everything up again.

Let’s say we have some c code that we want to work with from Python, in a file called ‘square.c’:

void square(double* array, int n) {
int i;
for (i=0; i<n; ++i) {
array[i] = array[i] * array[i];
}
}


The first thing we’re going to need to do is compile this to a shared object file, so we punch the following into our terminal:

gcc -c square.c # compile the code to an object file
gcc square.o -shared -o square.so # compile to shared object file


And so now we have our square.so file. Super.

Here is the code that we can then use to interact with this shared object file from Python:

import numpy as np
import ctypes

n = 5
a = np.arange(n, dtype=np.double)

aptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
square.square(aptr, n)

print a


OK, so there are a couple of lines with a lot of jargon but aside from that this looks pretty straightforward. The first noticeable thing is that we’re importing the ctypes library. So, make sure that you have that installed before you try to run this code. Right after importing ctypes then we create a handle into our shared object file, using the ctypes.cdll.LoadLibrary() command.

Then it’s some standard python code to create an array, and then there’s this funky command that lets us get a handle to the array that we just created as a c pointer that we can pass in to our square() function. You can see a list of all the different kinds of data types here.

Once we have that handle to our array, we simply pass it through to the square.square() function, and let it go and do whatever it’s going to do with it. Then we print out our array a, and to our great joy, all of the values have been squared.

So that’s that. Instead of returning anything like double* from the C library I would recommend just passing in the array as a parameter and modifying it instead. This is all nice and straightforward enough, and it seems like the returning business can lead to a lot of memory allocation and ownership headaches.

The code for this example can be found at my github, here: Square.

Tagged , , , ,

## Inverse kinematics of 3-link arm with constrained minimization in Python

Inverse kinematics is a common topic in robotics control; one that most anyone working with a robotic arm needs to address at some point. The use case I’ll be looking at here is the situation where we want to specify an $(x,y)$ coordinate for the end-effector, and solve for the appropriate joint angle configuration that best achieves this. It’s surprisingly straightforward to handle with the right approach, so it’s a nice little exercise in Python optimization (and we’ll even work in some graphical display!) And per usual, the code is all up on my github.

Most of the time the servos that anyone is working with are position controlled, meaning that you provide them with a set of desired angles for them to move to, rather than force controlled, where you provide them with a torque signal that directly moves them. In this case, inverse kinematics is all you need. You simply work out the set of joint angles you want the servos to move to and send those out directly. In biological systems this clearly isn’t the end of the work, because even if you know where you want to go you still have to provide a force (muscle activation) signal that will move your system there. But! as mentioned in the motor babbling paper review post I wrote a while back, having the target specified in the most basic space (i.e. joint angles instead of $(x,y)$ coordinates for a servo based arm, or muscle lengths instead of joint angles for a muscle based arm) can make calculating, or learning, the appropriate force signal much easier; so something that can do the inverse kinematics calculation is worth having.

Inverse kinematics

Now then, how do we go about finding the joint angle configuration that best achieves a desired $(x,y)$ coordinate? Constrained minimization! Our constraint is that the tip of the arm must be at the specified $(x,y)$ position, and we need to specify some cost function for the system to minimize to choose among all the possible configurations that accomplish this. The cost function that we’re going to use here is a very intuitive one, that I first encountered in the paper ‘Learning operational space control’ by Jan Peters. What we’re going to do is specify a default, or resting, joint state configuration, and minimize our distance from that. Which makes a lot of sense.

I’m going to post the code for calculating the inverse kinematics of a 3-link arm, and then we’ll work through it.

import math
import numpy as np
import scipy.optimize

def __init__(self, q=None, q0=None, L=None):
"""Set up the basic parameters of the arm.
All lists are in order [shoulder, elbow, wrist].

:param list q: the initial joint angles of the arm
:param list q0: the default (resting state) joint configuration
:param list L: the arm segment lengths
"""
# initial joint angles
if q is None: q = [.3, .3, 0]
self.q = q
# some default arm positions
if q0 is None: q0 = np.array([math.pi/4, math.pi/4, math.pi/4])
self.q0 = q0
# arm segment lengths
if L is None: L = np.array([1, 1, 1])
self.L = L

self.max_angles = [math.pi, math.pi, math.pi/4]
self.min_angles = [0, 0, -math.pi/4]

def get_xy(self, q=None):
if q is None: q = self.q

x = self.L[0]*np.cos(q[0]) + \
self.L[1]*np.cos(q[0]+q[1]) + \
self.L[2]*np.cos(np.sum(q))

y = self.L[0]*np.sin(q[0]) + \
self.L[1]*np.sin(q[0]+q[1]) + \
self.L[2]*np.sin(np.sum(q))

return [x, y]

def inv_kin(self, xy):

def distance_to_default(q, *args):
# weights found with trial and error, get some wrist bend, but not much
weight = [1, 1, 1.3]
return np.sqrt(np.sum([(qi - q0i)**2 * wi
for qi,q0i,wi in zip(q, self.q0, weight)]))

def x_constraint(q, xy):
x = ( self.L[0]*np.cos(q[0]) + self.L[1]*np.cos(q[0]+q[1]) +
self.L[2]*np.cos(np.sum(q)) ) - xy[0]
return x

def y_constraint(q, xy):
y = ( self.L[0]*np.sin(q[0]) + self.L[1]*np.sin(q[0]+q[1]) +
self.L[2]*np.sin(np.sum(q)) ) - xy[1]
return y

return scipy.optimize.fmin_slsqp( func=distance_to_default,
x0=self.q, eqcons=[x_constraint, y_constraint],
args=[xy], iprint=0) # iprint=0 suppresses output



I’ve taken out most of the comments for compactness, but there are plenty in the code on github, don’t you worry. Alright, let’s take a look!

First, we go through and set up the parameters of our arm. If nothing was passed in during construction, then we’re going to make an arm that is initially at it’s resting position, with shoulder and elbow joint angles = $\frac{\pi}{4}$, and wrist angle = 0. We also set up the arm segment lengths to all be 1, and the minimum and maximum joint angles.

Next, there’s the get_xy function, which just uses some simple trig to calculate the current $(x,y)$ position of the end-effector. This is going to be a handy thing to have when we want to see how well the joint configurations we calculate achieve our desired hand position.

And then after this, we have the actual inverse kinematics function. inv_kin takes in a desired $(x,y)$ position, and returns to us a list of the joint angles that will best achieve this. We define three methods inside inv_kin that define our constrained optimization problem, distance_to_default, which is the function we are minimizing, and x_constraint and y_constraint, which are the constraints that must be met for a solution to be valid. These are all pretty straight forward functions, the distance_to_default function calculates the Euclidean distance in joint space to the resting state, and the constraint functions calculate the difference between the actual and desired end-effector positions for a given set of joint angles.

There is one point of interest in the distance_to_default method, and that is the use of the weights therein. What is the point of applying a gain to these values, you ask? Why, simply that it lets us weight the relative importance of each of these joints remaining close to their default configuration! If you think about moving your arm on a horizontal plane, the wrist actually bends very little. To mimic this we can set the weight applied to the distance of the wrist from its resting state higher than those of the other joints, so that it becomes a higher priority to keep the wrist near its resting state. I suggest playing around with once the graphical display (below) is built up, it’s fun to see the effects different weightings can give on joint configurations.

With these three functions defined, we can now call up the scipy.optimize.fmin_slsqp function, which performs the sequential least squares method to arrive at a solution.

Really, that’s all there is to it!

To test this now, we have the following method:

def test():
############Test it!##################

# set of desired (x,y) hand positions
x = np.arange(-.75, .75, .05)
y = np.arange(0, .75, .05)

# threshold for printing out information, to find trouble spots
thresh = .025

count = 0
total_error = 0
# test it across the range of specified x and y values
for xi in range(len(x)):
for yi in range(len(y)):
# test the inv_kin function on a range of different targets
xy = [x[xi], y[yi]]
# run the inv_kin function, get the optimal joint angles
q = arm.inv_kin(xy=xy)
# find the (x,y) position of the hand given these angles
actual_xy = arm.get_xy(q)
# calculate the root squared error
error = np.sqrt((np.array(xy) - np.array(actual_xy))**2)
# total the error
total_error += error

if np.sum(error) > thresh:
print '-------------------------'
print 'Initial joint angles', arm.q
print 'Final joint angles: ', q
print 'Desired hand position: ', xy
print 'Actual hand position: ', actual_xy
print 'Error: ', error
print '-------------------------'

count += 1

print '\n---------Results---------'
print 'Total number of trials: ', count
print 'Total error: ', total_error
print '-------------------------'


Which is simply working through a set of target $(x,y)$ points, and calculating the total error across all of them. If any particular point gives an error above the threshold level, print out the information for that point so we can take a closer look if so desired. Fortunately, the threshold error isn’t even approached in this test, and for output we get:

---------Results---------
Total number of trials:  450
Total error:  [  3.33831421e-05   2.89667496e-05]
-------------------------


Visualization

So that’s all great and dandy, but it’s always nice to be able to really visualize these things, so I wrote another method that uses Pyglet to help visualize. This is a really easy to use graphics program that I tested out as a means here, and I was very happy with it. I ended up writing a method that pops up a window with a 3-link arm drawn on it, and the arm uses the above inverse kinematics function in the Arm class written above to calculate the appropriate joint angles for the arm to be at to follow the mouse. Once the joint angles are calculated, then their $(x,y)$ locations are also calculated, and the arm is drawn. Pleasingly, the inv_kin above is fast enough to work in real time, so it’s a kind of fun toy example. Again, I’ll show the code and then we’ll work through it below.

import numpy as np
import pyglet
import time

import Arm

def plot():
"""A function for plotting an arm, and having it calculate the
inverse kinematics such that given the mouse (x, y) position it
finds the appropriate joint angles to reach that point."""

# create an instance of the arm

# make our window for drawin'
window = pyglet.window.Window()

label = pyglet.text.Label('Mouse (x,y)', font_name='Times New Roman',
font_size=36, x=window.width//2, y=window.height//2,
anchor_x='center', anchor_y='center')

def get_joint_positions():
"""This method finds the (x,y) coordinates of each joint"""

x = np.array([ 0,
arm.L[0]*np.cos(arm.q[0]),
arm.L[0]*np.cos(arm.q[0]) + arm.L[1]*np.cos(arm.q[0]+arm.q[1]),
arm.L[0]*np.cos(arm.q[0]) + arm.L[1]*np.cos(arm.q[0]+arm.q[1]) +
arm.L[2]*np.cos(np.sum(arm.q)) ]) + window.width/2

y = np.array([ 0,
arm.L[0]*np.sin(arm.q[0]),
arm.L[0]*np.sin(arm.q[0]) + arm.L[1]*np.sin(arm.q[0]+arm.q[1]),
arm.L[0]*np.sin(arm.q[0]) + arm.L[1]*np.sin(arm.q[0]+arm.q[1]) +
arm.L[2]*np.sin(np.sum(arm.q)) ])

return np.array([x, y]).astype('int')

window.jps = get_joint_positions()

@window.event
def on_draw():
window.clear()
label.draw()
for i in range(3):
pyglet.graphics.draw(2, pyglet.gl.GL_LINES, ('v2i',
(window.jps[0][i], window.jps[1][i],
window.jps[0][i+1], window.jps[1][i+1])))

@window.event
def on_mouse_motion(x, y, dx, dy):
# call the inverse kinematics function of the arm
# to find the joint angles optimal for pointing at
# this position of the mouse
label.text = '(x,y) = (%.3f, %.3f)'%(x,y)
arm.q = arm.inv_kin([x - window.width/2, y]) # get new arm angles
window.jps = get_joint_positions() # get new joint (x,y) positions

pyglet.app.run()

plot()


There are really only a few things that we’re doing in this method: We create an arm with somewhat normal segment lengths and a window for plotting; we have a function for calculating the $(x,y)$ coordinates of each of the joints for plotting; we have a function that tracks the position of the mouse and updates our arm joint angles by calling the arm.inv_kin function we wrote above; and lastly we plot it! Easy. The functions aren’t quite in that order, and that’s because I chose alphabetical ordering over conceptual. Here’s a picture!

I think the only kind of weird thing going on in here is that my variables that I use in both this method and in the overwritten window methods are defined as belonging to window. Aside from that, it’s straightforward trig to calculate the joint angle Cartesian coordinates, and then there’s also a label that displays the mouse $(x,y)$ position for the hell of it. Honestly, the label was in a Pyglet tutorial example and I just left it in.

It’s not going to win any awards for graphics by any means, but it’s simple and it works! And, assuming you have Pyglet installed (which is very easy to do), you should straight up be able to run it and be amazed. Or a little impressed. Or not, whatever. But in any case now you have your very own inverse kinematics example on a 3-link arm! Moving to any other number of joints is exactly the same, you just have to add the corresponding number of default joint configurations and everything should work out as is.

From here, the plan is to look at bringing in an actual force driven arm model to the mix, and then building up a system that can track the mouse, but with a control signal that specifies forces to apply to the arm, rather than positions. Which I think will be pretty neat. And in case you missed it at the top, here’s a link to all the code on my github.