ABR Control repo public release!

https://github.com/abr/abr_control

Last August I started working on a new version of my old control repo with a resolve to make something less hacky, as part of the work for Applied Brain Research, Inc., a startup that I co-founded with others from my lab after most of my cohort graduated. Together with Pawel Jaworski, who comprises the other half of ABR’s motor team, we’ve been building up a library of useful functions for modeling, simulating, interfacing with, and controlling robot arms.

Today we’re making the repository public, under the same free-for-non-commercial-use license that we’ve released our Nengo neural modelling software under. You can find it here: ABR_Control

It’s all Python 3, and here’s an overview of some of the features:

  • Automates generation of functions for computing the Jacobians, joint space and task space inertia matrices, centrifugal and Coriolis effects, and the Jacobian derivative, given each link’s inertia matrix and the transformation matrices
  • Option to compile these functions to speedy Cython implementations
  • Operational space, joint space, floating, and sliding controllers provided with PyGame and VREP example scripts
  • Obstacle avoidance and dynamics adaptation add-ons with PyGame example scripts
  • Interfaces with VREP
  • Configuration files for one, two, and three link arms, as well as the UR5 and Jaco2 arms in VREP
  • Provides Python simulations of two and three link arms, with PyGame visualization
  • Path planning using first- and second-order filtering of the target, with example scripts.

Structure

The ABR Control library is divided into three sections:

  1. Arm descriptions (and simulations)
  2. Robotic system interfaces
  3. Controllers

The big goal was to make all of these interchangeable, so that to run any permutation of them you just have to change which arm / interface / controller you’re importing.

To support a new arm, the user only needs to create a configuration file specifying the transforms and inertia matrices. Code for calculating the necessary functions of the arm will be symbolically derived using SymPy, and compiled to C using Cython for efficient run-time execution.

Interfaces provide send_forces and send_target_angles functions, to apply torques and put the arm in a target state, as well as a get_feedback function, which returns a dictionary of information about the current state of the arm (joint angles and velocities at a minimum).

Controllers provide a generate function, which takes in the current system state information and a target, and returns a set of joint torques to apply to the robot arm.

VREP example

The easiest way to show it is with some code examples. So, once you’ve cloned and installed the repo, you can open up VREP and the jaco2.ttt model in the abr_control/arms/jaco2 folder, and to control it using an operational space controller you would run the following:

import numpy as np
from abr_control.arms import jaco2 as arm
from abr_control.controllers import OSC
from abr_control.interfaces import VREP

# initialize our robot config for the jaco2
robot_config = arm.Config(use_cython=True, hand_attached=True)

# instantiate controller
ctrlr = OSC(robot_config, kp=200, vmax=0.5)

# create our VREP interface
interface = VREP(robot_config, dt=.001)
interface.connect()

target_xyz = np.array([0.2, 0.2, 0.2])
# set the target object's position in VREP
interface.set_xyz(name='target', xyz=target_xyz)

count = 0.0
while count < 1500:  # run for 1.5 simulated seconds
    # get joint angle and velocity feedback
    feedback = interface.get_feedback()
    # calculate the control signal
    u = ctrlr.generate(
        q=feedback['q'],
        dq=feedback['dq'],
        target_pos=target_xyz)
    # send forces into VREP, step the sim forward
    interface.send_forces(u)

    count += 1
interface.disconnect()

This is a minimal version of the examples/VREP/reaching.py code. To run it with a different arm, you can just change the from abr_control.arms import jaco2 as arm line. The repo comes with the configuration files for the UR5 and a onelink VREP arm model as well.

PyGame example

I’ve also found the PyGame simulations of the 2 and 3 link arms very helpful for quickly testing new controllers and code, as an easy, low-overhead, proof-of-concept sandbox. To run the threelink arm (which runs fine in Linux and Windows, but I’ve heard has issues on Mac OS) with the operational space controller, you can run this script:

import numpy as np
from abr_control.arms import threelink as arm
from abr_control.interfaces import PyGame
from abr_control.controllers import OSC

# initialize our robot config
robot_config = arm.Config(use_cython=True)
# create our arm simulation
arm_sim = arm.ArmSim(robot_config)

# create an operational space controller
ctrlr = OSC(robot_config, kp=300, vmax=100,
            use_dJ=False, use_C=True)

def on_click(self, mouse_x, mouse_y):
    self.target[0] = mouse_x
    self.target[1] = mouse_y

# create our interface
interface = PyGame(robot_config, arm_sim, dt=.001, 
                   on_click=on_click)
interface.connect()

# create a target
feedback = interface.get_feedback()
target_xyz = robot_config.Tx('EE', feedback['q'])
interface.set_target(target_xyz)

try:
    while True:
        # get arm feedback
        feedback = interface.get_feedback()
        hand_xyz = robot_config.Tx('EE', feedback['q'])

        # generate an operational space control signal
        u = ctrlr.generate(
            q=feedback['q'],
            dq=feedback['dq'],
            target_pos=target_xyz)

        new_target = interface.get_mousexy()
        if new_target is not None:
            target_xyz[0:2] = new_target
        interface.set_target(target_xyz)

        # apply the control signal, step the sim forward
        interface.send_forces(u)

finally:
    # stop and reset the simulation
    interface.disconnect()

The extra bits of code just set up a hook so that when you click on the PyGame display somewhere the target moves to that point.

So! Hopefully some people find this useful for their research! It should be as easy to set up as cloning the repo, installing the requirements and running the setup file, and then trying out the examples.

If you find a bug please file an issue! If you find a way to improve it please do so and make a PR! And if you’d like to use anything in the repo commercially, please contact me.


Deriving a robot’s transform matrices

While doing some soul searching recently, I realised that in previous posts I’ve glossed over actually deriving transform matrices, and haven’t discussed a methodical way of going about it. If you’ve ever tried working out transforms you know they can be a pain if you don’t have a set process to follow. Although I’ve given a bunch of examples in previous posts, this post is intended to clear up any confusion that crops up when you’re trying to work out the transforms for your own robot. This is far from an absolute guide covering all the cases, but hopefully it gives a firm enough footing that you can handle the rest.

Note: I’m only going to be dealing with revolute joints in this post. Linear joints are easy (just put the joint offset in the translation part of the transform matrix) and spherical joints are horrible death beyond the scope of this post.

First things first: a quick recap of what transform matrices actually are. A transform matrix changes the coordinate system (reference frame) a point is defined in. We’re using the notation \textbf{T}_0^1 to denote a transformation matrix that transforms a point from reference frame 1 to reference frame 0. To calculate this matrix, we describe the transformation from reference frame 0 to reference frame 1.

To make things easier, we’re going to break up each transform matrix into two parts. Unfortunately I haven’t found a great way to denote these two matrices, so we’re going to have to use additional subscripts:

  1. \textbf{T}^{i+1}_{ia}: accounting for the joint rotation, and
  2. \textbf{T}^{i+1}_{ib}: accounting for static translations and rotations.

So the \textbf{T}_a matrix accounts for transformations that involve joint angles, and the \textbf{T}_b matrix accounts for all the static transformations between reference frames.

Step 1: Account for the joint rotation

When calculating the transform matrix from a joint to a link’s centre-of-mass (COM), note that the reference frame of the link’s COM rotates along with the angle of the joint. Conversely, the joint’s reference frame does not rotate with the joint angle. Look at this picture!

01

In these pictures, we’re looking at the transformation from joint i’s reference frame to the COM for link i. q_i denotes the joint angle of joint i. In the first image (on the left) the joint angle is almost 90 degrees, and on the right it’s closer to 45 degrees. Keeping in mind that the reference frame for COM i changes with the joint angle, and the reference frame for joint i does not, helps make deriving the transform matrices much easier.

So, to actually account for joint rotation, all you have to do is determine which axis the joint is rotating around and create the appropriate rotation matrix inside the transform.

For rotations around the x axis:

\left[\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_i) & -\textrm{sin}(q_i) & 0 \\ 0 & \textrm{sin}(q_i) & \textrm{cos}(q_i) & 0 \\ 0 & 0 & 0 & 1\end{array}\right]

For rotations around the y axis:

\left[\begin{array}{cccc} \textrm{cos}(q_i) & 0 & \textrm{sin}(q_i) & 0 \\ 0 & 1 & 0 & 0 \\ -\textrm{sin}(q_i) & 0 & \textrm{cos}(q_i) & 0 \\ 0 & 0 & 0 & 1\end{array}\right]

For rotations around the z axis:

\left[\begin{array}{cccc} \textrm{cos}(q_i) & -\textrm{sin}(q_i) & 0 & 0 \\ \textrm{sin}(q_i) & \textrm{cos}(q_i) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1\end{array}\right]

And there you go! Easy. The \textbf{T}_a should be one of the above matrices unless something fairly weird is going on. For the arm in the diagram above, the joint is rotating around the z axis, so

\textbf{T}_a = \left[\begin{array}{cccc} \textrm{cos}(q_i) & -\textrm{sin}(q_i) & 0 & 0 \\ \textrm{sin}(q_i) & \textrm{cos}(q_i) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1\end{array}\right]
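If you want to poke at these numerically, here’s a quick helper (my own sketch, not from any library) that builds the corresponding 4×4 homogeneous transform with NumPy, using the standard rotation matrices above:

import numpy as np

def rotation_transform(axis, q):
    # build the 4x4 homogeneous transform for a rotation of
    # q radians about the given axis ('x', 'y', or 'z')
    c, s = np.cos(q), np.sin(q)
    T = np.eye(4)
    if axis == 'x':
        T[:3, :3] = [[1, 0, 0], [0, c, -s], [0, s, c]]
    elif axis == 'y':
        T[:3, :3] = [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    elif axis == 'z':
        T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    return T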

Step 2: Account for translations and any static axes rotations

Once the joint rotation is accounted for, translations become much simpler to deal with: Just put joint i at 0 degrees and the offset from the origin of reference frame i to i+1 is your translation.

So for the above example, this is what the system looks like when joint i is at 0 degrees:

02

where I’ve also added labels to the x and y axes for clarity. Say that the COM is at (1, 0), then that is what you set the x, y translation values to in the \textbf{T}_b.

Also we need to note that there is a rotation in the axes between reference frames (i.e. x and y do not point in the same direction for both axes sets). Here the rotation is 90 degrees. So the rotation part of the transformation matrix should be a 90 degree rotation.

Thus, the transformation matrix for our static translations and rotations is

\textbf{T}_b = \left[\begin{array}{cccc} 0 & -1 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1\end{array}\right]

Step 3: Putting them together

This is pretty easy, to compute the full transform from joint i to the COM i you just string together the two matrices derived above:

\textbf{T}^{COM_i}_{joint_i} = \textbf{T}_a \textbf{T}_b

where I’m using simplified notation on the left-hand side because we don’t need the full notation for clarity in this simple example.

So when you’re transforming a point, you left multiply by \textbf{T}:

\textbf{x}_{joint_i} = \textbf{T}^{COM_i}_{joint_i} \textbf{x}_{COM_i} = \textbf{T}_a \textbf{T}_b \textbf{x}_{COM_i},

which is not great notation, but hopefully conveys the idea. To get our point \textbf{x} into the reference frame of joint i, we left multiply the point as defined in the reference frame of COM i by \textbf{T}^{COM_i}_{joint_i}.
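To make this concrete, here’s a quick numeric check of the example above (using the rotation_transform helper from earlier; with q_i = 0, \textbf{T}_a is the identity, so the COM at (1, 0) should come out unchanged):

import numpy as np

Ta = rotation_transform('z', 0.0)  # joint at 0 degrees
Tb = np.array([[0., -1., 0., 1.],
               [1., 0., 0., 0.],
               [0., 0., 1., 0.],
               [0., 0., 0., 1.]])
# the origin of the COM's reference frame, in homogeneous coordinates
x_com = np.array([0., 0., 0., 1.])
print(np.dot(np.dot(Ta, Tb), x_com))  # -> [1. 0. 0. 1.]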

So there you go! That’s the process for a single joint to COM transform. For transforms from COM to joints it’s even easier because you only need to account for the static offsets and axes rotations.

Examples

It’s always nice to have examples, so let’s work through deriving the transform matrices for the Jaco2 arm as it’s set up in VREP. The process is pretty straight-forward, with really the only tricky bit being converting from the Euler angles that VREP gives you to a rotation matrix that we can use in our transform.

But first things first: load up VREP and drop in a Jaco2 model.

NOTE: I’ve renamed the joints and links to start at 0, to make indexing in / interfacing with VREP easier.

We’re going to generate the transforms for the whole arm piece by piece, sequentially: origin -> link 0 COM, link 0 COM -> joint 0, joint 0 -> link 1 COM, etc. I’m going to use the notation l_i and j_i to denote link i COM and joint i, respectively, in the transform sub and superscripts.

The first transform we want then is for origin -> link 0 COM. There’s no joint rotation that we need to account for in this transform, so we only have to look for static translation and rotation. First click on link 0, and then

  • ‘Object/item shift’ button from the toolbar,
  • the ‘Position’ tab,
  • select the ‘Parent frame’ radio button,

and it will provide you with the translation of link 0’s COM relative to its parent frame, as shown below:


So the translation part of our transformation is [0, 0, .0784]^T.

The rotation part we can see by

  • ‘Object/item rotate’ button from the toolbar,
  • the ‘Orientation’ tab,
  • select the ‘Parent frame’ radio button,


Here, it’s not quite as straight forward to use this info to fill in our transform matrix. So, if we pull up the VREP description for their Euler angles we see that to generate the rotation matrix from them you perform:

\textbf{R} = \textbf{R}_x(\alpha) \textbf{R}_y(\beta) \textbf{R}_z(\gamma)

where the \textbf{R}_i matrix is a rotation matrix around the i axis (as described above). For (\alpha=0, \beta=0, \gamma=0) this works out to no rotation, as you may have guessed. So then our first transform is

\textbf{T}^{l0}_{\textrm{origin}} = \left[ \begin{array}{cccc}1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & .0784 \\ 0 & 0 & 0 & 1\end{array} \right]

For the second transform matrix, from link 0 to joint 0, again we only have to account for static translations and rotations. Pulling up the translation information:

vrep03
we have the translation [0, 0, 0.0784]^T.

And this time, for our rotation matrix, there’s a flip in the axes so we have something more interesting than no rotation:

vrep04
we have Euler angles (-180, 0, 0). This is also a good time to note that the angles are provided in degrees, so let’s convert those over, giving us (-\pi, 0, 0) (and note that rotations of \pi and -\pi about an axis give the same matrix). Now, calculating our rotation matrix:

\textbf{R} = \textbf{R}_x(\alpha) \textbf{R}_y(\beta) \textbf{R}_z(\gamma)

\textbf{R}^{j0}_{l0} = \left[ \begin{array}{ccc}1 & 0 & 0 \\ 0 & \textrm{cos}(\pi) & -\textrm{sin}(\pi) \\ 0 & \textrm{sin}(\pi) & \textrm{cos}(\pi) \end{array} \right] \left[ \begin{array}{ccc}\textrm{cos}(0) & 0 & \textrm{sin}(0) \\ 0 & 1 & 0 \\ -\textrm{sin}(0) & 0 & \textrm{cos}(0) \end{array} \right] \left[ \begin{array}{ccc} \textrm{cos}(0) & -\textrm{sin}(0) & 0 \\ \textrm{sin}(0) & \textrm{cos}(0) & 0 \\ 0 & 0 & 1 \end{array} \right]

\textbf{R}^{j0}_{l0} = \left[ \begin{array}{ccc}1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{array} \right]

which then gives us the transform matrix

\textbf{T}^{j0}_{l0} = \left[ \begin{array}{cccc}1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0.0784 \\ 0 & 0 & 0 & 1\end{array} \right].
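As a sanity check, we can compute this numerically (a sketch assuming the standard rotation matrices given earlier, and VREP’s \textbf{R} = \textbf{R}_x(\alpha)\textbf{R}_y(\beta)\textbf{R}_z(\gamma) ordering):

import numpy as np

def euler_to_R(alpha, beta, gamma):
    # VREP applies its Euler angles in the order R = Rx(alpha) Ry(beta) Rz(gamma)
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return np.dot(Rx, np.dot(Ry, Rz))

T = np.eye(4)
T[:3, :3] = euler_to_R(-np.pi, 0, 0)
T[:3, 3] = [0, 0, 0.0784]
print(np.round(T, 4))  # matches the transform matrix above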

And the last thing I’ll show here is accounting for the joint rotation. In the transform from joint 0 to link 1 we have to account for the joint rotation, so we’ll break it down into two matrices as described above. To generate the first one, all we need to know is which axis the joint rotates around. In VREP, this is almost always the z axis, but it’s good to double check in case someone has built a weird model. To check, one easy way is to make the joint visible, and the surrounding arm parts invisible:

vrep05
You can do this by

  • double clicking on the joint in the scene hierarchy to get to the Scene Object Properties window,
  • selecting the ‘Common’ tab,
  • selecting or deselecting check boxes from the ‘Visibility’ box.

As you can see in the image, this joint is indeed rotating around the z axis, so the first part of our transformation from joint 0 to link 1 is

\textbf{T}^{l1}_{j0a} = \left[ \begin{array}{cccc}\textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 & 0 \\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1\end{array} \right]

The second matrix can be generated as previously described, so I’ll leave that, and the rest of the arm, to you!

If you’d like to see a full arm example, you can check out the transforms for the UR5 model in VREP up on my GitHub.


Improving neural models by compensating for discrete rather than continuous filter dynamics when simulating on digital systems

This is going to be a pretty niche post, but there is some great work by Aaron Voelker from my old lab that has inspired me to do a post. The work is from an upcoming paper, which is all up on Aaron’s GitHub. It applies to building neural models using the Neural Engineering Framework (NEF). There’s a bunch of material on the NEF out there already (e.g. the book How to Build a Brain by Dr. Chris Eliasmith, an online intro, and you can also check out Nengo, which is neural model development software with some good tutorials on the NEF), so I’m going to assume you already know the basics of the NEF for this post.

Additionally, this is applicable to simulating these models on digital systems, which is probably what most of you are doing. If you’re not, however! Then use standard NEF methods.

And one last note before starting: these methods are most relevant for systems with fast dynamics (relative to the simulation time step). If your system dynamics are pretty slow, you can likely get away with the continuous time solution if you resist change and learning. And we’ll see this in the example point attractor system at the end of the post! But even for slowly evolving systems, I would still recommend at least skipping to the end and seeing how to use the library shortcuts when coding your own models. The example code is also all up on my GitHub.

NEF modeling with continuous lowpass filter dynamics

Basic state space equations for linear time-invariant (LTI) systems (i.e. dynamics can be captured with a matrix and the matrices don’t change over time) are:

\dot{\textbf{x}}(t) = \textbf{A}\textbf{x}(t) + \textbf{B}\textbf{u}(t)

\textbf{y}(t) = \textbf{C}\textbf{x}(t) + \textbf{D}\textbf{u}(t)

where

  • \textbf{x} is the system state,
  • \textbf{y} is the system output,
  • \textbf{u} is the system input,
  • \textbf{A} is called the state matrix,
  • \textbf{B} is called the input matrix,
  • \textbf{C} is called the output matrix, and
  • \textbf{D} is called the feedthrough matrix,

and the system diagram looks like this:

Typical_State_Space_model

and the transfer function, which is written in Laplace space and captures the system output over system input, for the system is

\textbf{F}(s) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(s\textbf{I} - \textbf{A})^{-1} \textbf{B} + \textbf{D}

where s is the Laplace variable.

Now, because it’s a neural system we don’t have a perfect integrator in the middle, we instead have a synaptic filter, H(s), giving:

Neural_State_Space_model

So our goal is: given some synaptic filter H(s), we want to generate some modified transfer function, \textbf{F}', such that \textbf{F}'(H(s)) has the same dynamics as our desired system, \textbf{F}(s). In other words, find an \textbf{F}' such that

\textbf{F}'\left(\frac{1}{H(s)}\right) = \textbf{F}(s).

Alrighty. Let’s do that.

The transfer function for our neural system is

\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(H(s)^{-1}\textbf{I} - \textbf{A})^{-1} \textbf{B} + \textbf{D}.

The effect of the synapse is well captured by a lowpass filter, H(s) = \frac{1}{\tau s + 1}, making our equation

\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}((\tau s + 1)\textbf{I} - \textbf{A})^{-1} \textbf{B} + \textbf{D},

\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(\tau s \textbf{I} + \textbf{I} - \textbf{A})^{-1} \textbf{B} + \textbf{D}.

To get this into a form where we can start to modify the system state matrices to compensate for the filter effects, we have to isolate s\textbf{I}. To do that, we can do the following basic math jujitsu

\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}\left(\tau \left(s \textbf{I} + \frac{1}{\tau}(\textbf{I} - \textbf{A})\right)\right)^{-1} \textbf{B} + \textbf{D},

\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}\left(s \textbf{I} + \frac{1}{\tau}(\textbf{I} - \textbf{A})\right)^{-1} \frac{1}{\tau}\textbf{B} + \textbf{D}.

OK. Now, we can make \textbf{F}' by substituting for our \textbf{A} and \textbf{B} matrices with

\textbf{A}' = \tau\textbf{A} + \textbf{I}

\textbf{B}' = \tau\textbf{B}

then we get

\textbf{F}'(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}\left(s \textbf{I} + \frac{1}{\tau}(\textbf{I} - \textbf{A}')\right)^{-1} \frac{1}{\tau}\textbf{B}' + \textbf{D}

= \textbf{C}\left(s \textbf{I} + \frac{1}{\tau}(\textbf{I} - (\tau\textbf{A} + \textbf{I}))\right)^{-1} \frac{1}{\tau}(\tau\textbf{B}) + \textbf{D}

= \textbf{C}\left(s \textbf{I} - \frac{1}{\tau}(\tau\textbf{A})\right)^{-1}\textbf{B} + \textbf{D}

= \textbf{C}(s \textbf{I} - \textbf{A})^{-1}\textbf{B} + \textbf{D}.

and voila! We have created an \textbf{F}' such that \textbf{F}'(H(s)) = \textbf{F}(s). Said another way, we have created a system (\textbf{A}', \textbf{B}', \textbf{C}'=\textbf{C}, \textbf{D}'=\textbf{D}) that compensates for the synaptic filtering effects to achieve our desired system dynamics!

So, to compensate for the continuous lowpass filter, we use \textbf{A}' = \tau \textbf{A} + \textbf{I} and \textbf{B}' = \tau \textbf{B} when implementing our model and we’re golden.
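In code, with a toy system for illustration (my own example values, not from the original), that compensation is just:

import numpy as np

tau = 0.005  # example synaptic time constant

# toy LTI system: a double integrator driven by the input
A = np.array([[0., 1.],
              [0., 0.]])
B = np.array([[0.],
              [1.]])

# compensate for the continuous lowpass synapse
A_prime = tau * A + np.eye(2)
B_prime = tau * B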

And so that’s what we’ve been doing for a long time when building our models. Assuming a continuous lowpass filter and going along our merry way. Aaron, however, shrewdly noticed that computers are digital, and thusly that the standard NEF methods are not a fully accurate way of compensating for the filter that is actually being applied in simulation.

To convert our continuous system state equations to discrete state equations we need to make two changes: 1) the first is a variable change to denote that we’re in discrete time, and we’ll use z instead of s, and 2) we need to calculate the discrete version of our system, i.e. transform (\textbf{A}, \textbf{B}, \textbf{C}, \textbf{D}) \rightarrow (\textbf{A}_d, \textbf{B}_d, \textbf{C}_d, \textbf{D}_d).

The first step is easy, the second step more complicated. To discretize the system we’ll use the zero-order hold (ZOH) method (also referred to as discretization assuming zero-order hold).

Zero-order hold discretization

Zero-order hold (ZOH) systems simply hold their input over a specified amount of time. The point of using ZOH here is that during discretization we assume the input control signal stays constant until the next sample time.

There are good write ups on the derivation of the discretization both on wikipedia and in these course notes from Purdue. I mostly followed the wikipedia derivation, but there were a few steps that get glossed over, so I thought I’d just write it out fully here and hopefully save someone some pain. Also for just a general intro I found these slides from Paul Oh at Drexel University really helpful.

OK. First we’ll solve an LTI system, and then we’ll discretize it.

So, you’ve got yourself a continuous LTI system

\dot{\textbf{x}}(t) = \textbf{A}\textbf{x}(t) + \textbf{B}\textbf{u}(t)

and you want to solve for \textbf{x}(t). Rearranging things to put all the \textbf{x} on one side gives

\dot{\textbf{x}}(t) - \textbf{A}\textbf{x}(t) = \textbf{B}\textbf{u}(t).

Looking through our identity library to find something that might help us here (after a long and grueling search) we come across:

\frac{\partial}{\partial t} \textrm{e}^{\textbf{A}t} = \textbf{A} \textrm{e}^{\textbf{A}t} = \textrm{e}^{\textbf{A}t} \textbf{A}.

We now left multiply our system by \textrm{e}^{-\textbf{A}t} (note the negative in the exponent)

\textrm{e}^{-\textbf{A}t}\dot{\textbf{x}}(t) - \textrm{e}^{-\textbf{A}t}\textbf{A}\textbf{x}(t) = \textrm{e}^{-\textbf{A}t}\textbf{B}\textbf{u}(t).

Looking at this carefully, we identify the left-hand side as the result of a product rule, so we can rewrite it as

\frac{\partial}{\partial t} (\textrm{e}^{-\textbf{A}t}\textbf{x}(t)) = \textrm{e}^{-\textbf{A}t}\textbf{B}\textbf{u}(t).

From here we integrate both sides, giving

\textrm{e}^{-\textbf{A}t}\textbf{x}(t) - \textrm{e}^0\textbf{x}(0) = \int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau,

\textrm{e}^{-\textbf{A}t}\textbf{x}(t) = \int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau + \textbf{x}(0).

To isolate the \textbf{x}(t) term on the left-hand side now multiply by \textrm{e}^{\textbf{A}t}:

\textrm{e}^{\textbf{A}t}\textrm{e}^{-\textbf{A}t}\textbf{x}(t) = \textrm{e}^{\textbf{A}t}\int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau + \textrm{e}^{\textbf{A}t}\textbf{x}(0),

\textrm{e}^{\textbf{A}t-\textbf{A}t}\textbf{x}(t) = \textrm{e}^{\textbf{A}t}\int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau + \textrm{e}^{\textbf{A}t}\textbf{x}(0),

\textbf{x}(t) = \textrm{e}^{\textbf{A}t}\int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau + \textrm{e}^{\textbf{A}t}\textbf{x}(0).

OK! We solved for \textbf{x}(t).

To discretize our solution we’re going to assume that we’re sampling the system at even intervals, i.e. each sample is at kT for some time step T, and that the input \textbf{u}(t) is constant between samples (this is where the ZOH comes in). To simplify our notation as we go, we also define

\textbf{x}[k] = \textbf{x}(kT).

So using our new notation, we have

\textbf{x}[k] = \textrm{e}^{\textbf{A}kT}\textbf{x}(0) + \textrm{e}^{\textbf{A}kT}\int_0^{kT} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau.

Now we want to get things back into the form:

\textbf{x}[k+1] = \textbf{A}_d\textbf{x}[k] + \textbf{B}_d\textbf{u}[k].

To start, let’s write out the equation for \textbf{x}[k + 1]

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}(k+1)T}\textbf{x}(0) + \textrm{e}^{\textbf{A}(k+1)T}\int_0^{(k+1)T} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau.

We want to relate \textbf{x}[k+1] to \textbf{x}[k]. Being incredibly clever, we see that we can left multiply \textbf{x}[k] by \textrm{e}^{\textbf{A}T}, to get

\textrm{e}^{\textbf{A}T}\textbf{x}[k] = \textrm{e}^{\textbf{A}(k+1)T}\textbf{x}(0) + \textrm{e}^{\textbf{A}(k+1)T}\int_0^{kT} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau,

and can rearrange to solve for a term we saw in \textbf{x}[k+1]:

\textrm{e}^{\textbf{A}(k+1)T}\textbf{x}(0) = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \textrm{e}^{\textbf{A}(k+1)T}\int_0^{kT} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau.

Plugging this in, we get

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \textrm{e}^{\textbf{A}(k+1)T}\left(\int_0^{kT} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau - \int_0^{(k+1)T} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau\right),

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \textrm{e}^{\textbf{A}(k+1)T}\int_{kT}^{(k+1)T} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau.

OK, we’re getting close.

At this point we’ve got things in the right form, but we can still clean up that second term on the right-hand side quite a bit. First, note that using our starting assumption (that \textbf{u}(t) is constant for t \in [kT, (k+1)T)), we can take \textbf{Bu}(t) = \textbf{Bu}[k] outside the integral:

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \textrm{e}^{\textbf{A}(k+1)T}\int_{kT}^{(k+1)T} \textrm{e}^{-\textbf{A}\tau}d\tau \textbf{Bu}[k].

Next, bring that \textrm{e}^{\textbf{A}(k+1)T} term inside the integral:

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \int_{kT}^{(k+1)T} \textrm{e}^{\textbf{A}((k+1)T - \tau)}d\tau \textbf{Bu}[k].

And now we’re going to simplify the integral using variable substitution. Let v = (k+1)T - \tau, which means also that \frac{dv}{d\tau} = -1 \rightarrow d\tau = -dv. Evaluating the upper and lower bounds of the integral, when \tau = (k+1)T then v = 0 and when \tau = kT then v = T. With this, we can rewrite our equation:

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \int_T^0 \textrm{e}^{\textbf{A}v}dv \textbf{Bu}[k].

The astute will notice that our integral runs from T to 0 instead of 0 to T. Fortunately for us, we know \int_a^b f = -\int_b^a f, so we can just swap the bounds and multiply by -1, giving:

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \int_0^T \textrm{e}^{\textbf{A}v}dv \textbf{Bu}[k].

And finally, we can evaluate our integral by recalling that \frac{d}{dt}\textrm{e}^{\textbf{A}t} = \textbf{A}\textrm{e}^{\textbf{A}t} and assuming that \textbf{A} is invertible:

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \textbf{A}^{-1} \textrm{e}^{\textbf{A}v}|^T_{v=0} \textbf{Bu}[k].

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \textbf{A}^{-1} (\textrm{e}^{\textbf{A}T} - \textrm{e}^0) \textbf{Bu}[k].

\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \textbf{A}^{-1} (\textrm{e}^{\textbf{A}T} - \textbf{I}) \textbf{Bu}[k].

We did it! The state and input matrices for our digital system are:

\textbf{A}_d = \textrm{e}^{\textbf{A}T}

\textbf{B}_d = \textbf{A}^{-1} (\textrm{e}^{\textbf{A}T} - \textbf{I}) \textbf{B}

And that’s the hard part of discretization; the rest of the system is easy, because fortunately for us

\textbf{C}_d = \textbf{C},

\textbf{D}_d = \textbf{D}.

This then gives us our discrete system transfer function:

\textbf{F}(z) = \frac{\textbf{Y}_d(z)}{\textbf{U}_d(z)} = \textbf{C}_d(z\textbf{I} - \textbf{A}_d)^{-1} \textbf{B}_d + \textbf{D}_d.
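If you’d rather not implement the discretization by hand, the same ZOH discretization is available in SciPy. Here’s a quick sketch (with made-up example matrices) checking the equations above against scipy.signal.cont2discrete:

import numpy as np
from scipy.linalg import expm
from scipy.signal import cont2discrete

# example system and time step
A = np.array([[0., 1.], [-100., -10.]])
B = np.array([[0.], [100.]])
T = 0.001

Ad = expm(A * T)
Bd = np.dot(np.linalg.inv(A), np.dot(Ad - np.eye(2), B))

# scipy's ZOH discretization gives the same matrices
Ad2, Bd2, Cd2, Dd2, _ = cont2discrete(
    (A, B, np.eye(2), np.zeros((2, 1))), T, method='zoh')
assert np.allclose(Ad, Ad2) and np.allclose(Bd, Bd2)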

NEF modeling with discrete lowpass filter dynamics

Now that we know how to discretize our system, we can look at compensating for the lowpass filter dynamics in discrete time. The equation for the discrete time lowpass filter is

H(z) = \frac{1-a}{z-a},

where a = \textrm{e}^{-\frac{dt}{\tau}}.

Plugging that in to the discrete transfer function gets us

\textbf{F}(H(z)) = \textbf{C}_d(H(z)^{-1}\textbf{I} - \textbf{A}_d)^{-1} \textbf{B}_d + \textbf{D}_d.

\textbf{F}(H(z)) = \textbf{C}_d(\frac{z-a}{1-a}\textbf{I} - \textbf{A}_d)^{-1} \textbf{B}_d + \textbf{D}_d,

\textbf{F}(H(z)) = \textbf{C}_d\left(\frac{z\textbf{I}}{1-a} - \frac{a\textbf{I} + (1-a)\textbf{A}_d}{1-a}\right)^{-1} \textbf{B}_d + \textbf{D}_d,

\textbf{F}(H(z)) = \textbf{C}_d(z\textbf{I} - a\textbf{I} - (1-a)\textbf{A}_d)^{-1} (1-a)\textbf{B}_d + \textbf{D}_d.

and we see that if we choose

\textbf{A}'_d = \frac{1}{1-a}(\textbf{A}_d - a\textbf{I}),

\textbf{B}'_d = \frac{1}{1-a}\textbf{B}_d,

then we get

\textbf{F}'(H(z)) = \textbf{C}_d(z\textbf{I} - a\textbf{I} - (1-a)\textbf{A}_d')^{-1} (1-a)\textbf{B}_d' + \textbf{D}_d,

= \textbf{C}_d\left(z\textbf{I} - a\textbf{I} - (1-a)\left(\frac{1}{1-a}(\textbf{A}_d - a\textbf{I})\right)\right)^{-1} (1-a)\left(\frac{1}{1-a}\textbf{B}_d\right) + \textbf{D}_d,

= \textbf{C}_d(z\textbf{I} - a\textbf{I} - \textbf{A}_d + a\textbf{I})^{-1} \textbf{B}_d + \textbf{D}_d,

= \textbf{C}_d(z\textbf{I} - \textbf{A}_d)^{-1} \textbf{B}_d + \textbf{D}_d = \textbf{F}(z).

And now congratulations are in order. Proper compensation for the discrete lowpass filter dynamics has finally been achieved!

Point attractor example

What difference does this actually make in modelling? Well, everyone likes examples, so let’s have one.

Here are the dynamics for a second-order point attractor system:

\ddot{x} = \alpha(\beta(x^* - x) - \dot{x})

with x, \dot{x}, and \ddot{x} being the system position, velocity, and acceleration, respectively, x^* the target position, and \alpha and \beta gain values. So the acceleration is just going to be set such that it drives the system towards the target position, while compensating for velocity.

Converting this from a second order system to a first order system we have

\left [ \begin{array}{c} \dot{x} \\ \ddot{x} \end{array} \right ] = \left [ \begin{array}{cc}0 & 1 \\ -\alpha\beta & -\alpha \end{array} \right] \left [ \begin{array}{c} x \\ \dot{x} \end{array} \right ] + \left [ \begin{array}{c}0 \\ \alpha\beta \end{array} \right] x^*

which we’ll rewrite compactly as

\dot{\textbf{x}} = \textbf{A} \textbf{x} + \textbf{B} \textbf{u}

OK, we’ve got our state space equation of the dynamical system we want to implement.

Given a simulation time step dt, we’ll first calculate the discrete state matrices:

\textbf{A}_d = \textrm{e}^{\textbf{A}dt},

\textbf{B}_d = \textbf{A}^{-1} (\textrm{e}^{\textbf{A}dt} - \textbf{I})\textbf{B}.

Great! Easy. Now we can calculate the state matrices that will compensate for the discrete lowpass filter:

\textbf{A}_d' = \frac{1}{1-a}(\textbf{A}_d - a\textbf{I}),

\textbf{B}_d' = \frac{1}{1-a}\textbf{B}_d,

where a = \textrm{e}^{-\frac{dt}{\tau}}.

Alright! So that’s our system now, a basic point attractor implementation in Nengo 2.3 looks like this:

import nengo
import numpy as np
from scipy.linalg import expm

# example parameter values (pick to suit your system)
alpha = 10.0  # attractor gain
beta = alpha / 4.0  # attractor gain
dt = 0.001  # simulation time step
analog = False  # False: compensate for the discrete lowpass filter
n_neurons = 500  # example number of neurons in the ensemble

tau = 0.1  # synaptic time constant

# the A matrix for our point attractor
A = np.array([[0.0, 1.0],
              [-alpha*beta, -alpha]])

# the B matrix for our point attractor
B = np.array([[0.0], [alpha*beta]])

# account for discrete lowpass filter
a = np.exp(-dt/tau)
if analog:
    A = tau * A + np.eye(2)
    B = tau * B
else:
    # discretize
    Ad = expm(A*dt)
    Bd = np.dot(np.linalg.inv(A), np.dot((Ad - np.eye(2)), B))
    A = 1.0 / (1.0 - a) * (Ad - a * np.eye(2))
    B = 1.0 / (1.0 - a) * Bd

net = nengo.Network(label='Point Attractor')
net.config[nengo.Connection].synapse = nengo.Lowpass(tau)

with net:
    net.ydy = nengo.Ensemble(n_neurons=n_neurons, dimensions=2,
        # set it up so neurons are tuned to one dimension only
        encoders=nengo.dists.Choice([[1, 0], [-1, 0], [0, 1], [0, -1]]))
    # set up Ax part of point attractor
    nengo.Connection(net.ydy, net.ydy, transform=A)

    # hook up input
    net.input = nengo.Node(size_in=1, size_out=1)
    # set up Bu part of point attractor
    nengo.Connection(net.input, net.ydy, transform=B)

    # hook up output
    net.output = nengo.Node(size_in=1, size_out=1)
    # read out the position dimension of the attractor state
    nengo.Connection(net.ydy[0], net.output, synapse=None)

Note that for calculating Ad we’re using expm, which is the matrix exponential function from the scipy.linalg package. numpy.exp does an elementwise exp, which is definitely not what we want here, and you will get some confusing bugs if you’re not careful.
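Here’s a tiny demonstration of the difference (my own example matrix):

import numpy as np
from scipy.linalg import expm

M = np.array([[0., 1.],
              [0., 0.]])
print(np.exp(M))  # elementwise exp: [[1., 2.718...], [1., 1.]]
print(expm(M))  # true matrix exponential: [[1., 1.], [0., 1.]]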

Code for implementing and also testing under some different gains is up on my GitHub, and generates the following plots for dt=0.001:

figure_1-24dt=1e-3

In the above results you can see that when the gains are low, and thus the system dynamics are slower, you can’t really tell a difference between the continuous and discrete filter compensation. But! As you get larger gains and faster dynamics, the resulting effects become much more visible.

If you’re building your own system, then I also recommend using the ss2sim function from Aaron’s nengolib library. It handles compensation for any synapse, and generates the matrices that account for discrete or continuous implementations automatically. Using the library looks like:

import nengo
import numpy as np
from nengolib.synapses import ss2sim

# example parameter values, as in the script above
alpha = 10.0
beta = alpha / 4.0
dt = 0.001
analog = False

tau = 0.1  # synaptic time constant
synapse = nengo.Lowpass(tau)

# the A matrix for our point attractor
A = np.array([[0.0, 1.0],
              [-alpha*beta, -alpha]])

# the B matrix for our point attractor
B = np.array([[0.0], [alpha*beta]])

C = np.eye(2)
D = np.zeros((2, 1))  # one input, so D has shape (2, 1)
linsys = ss2sim((A, B, C, D),
                synapse=synapse,
                dt=None if analog else dt)
A = linsys.A
B = linsys.B

Conclusions

So there you are! Go forward and model free of error introduced by improperly accounting for discrete simulation! If, like me, you’re doing anything with neural modelling and motor control (i.e. systems with very quickly evolving dynamics), then hopefully you’ve found all this work particularly interesting, as I did.

There’s a ton of extensions and different directions that this work can be and has already been taken, with a bunch of really neat systems developed using this more accurate accounting for synaptic filtering as a base. You can read up on this and applications to modelling time delays and time cells and lots lots more up on Aaron’s GitHub, and his recent papers, which are listed on his lab webpage.


Quick calculations with SymPy and Cython

Alrighty! Last time I posted about SymPy we weren’t worrying too much about computation speed. It was worth the time saved by automating the Jacobian, inertia matrix, etc. calculations, and it didn’t really affect simulation speed all that much. But! When working with a real arm it suddenly becomes critical to have highly efficient calculations.

Turns out that I still don’t want to code those equations by hand. There are probably very nice options for generating optimized code using Mathematica or MapleSim or other paid software, but SymPy is free and already Python compatible, so my goal is to stick with that.

Options

I did some internet scouring, and looked at installing various packages to help speed things up, including

  1. trying out the SymPy simplify function,
  2. trying out SymEngine,
  3. trying out the SymPy compile-to-Theano function,
  4. trying out the SymPy autowrap function, and
  5. various combinations of the above.

The SymEngine backend and Theano functions really didn’t give any improvements for the kind of low-dimensional vector calculations performed for control here. Disclaimer, it could be the case that I gave up on these implementations too quickly, but from some basic testing it didn’t seem like they were the way to go for this kind of problem.

So down to the simplify function and compiling to the Cython backend options. The first thing you’ll notice when using simplify is that the generation time for the function can take orders of magnitude longer (up to 12ish hours for inertia matrices for the Kinova Jaco2 arm with simplify, vs about 1-2 minutes without simplify, for example). And then, as a nice kick in the teeth after that, there’s almost no difference between the straight Cython compilation of the non-simplified functions and the simplified functions. Here are some graphs:

So you can see how the addition of the simplify drastically increases the compile time in the top half of the graphs there. In the lower half, you can see that the simplify does save some execution time relative to the baseline straight lambdify function, but once you start compiling to Cython with the autowrap the difference is on the order of hundredths to thousandths of milliseconds.

Results

So! My recommendation is

  • Don’t use simplify,
  • do use autowrap with the Cython backend.

To compile using the Cython backend:

from sympy.utilities.autowrap import autowrap
function = autowrap(sympy_expression, backend="cython",
                    args=parameters)
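And here’s a self-contained toy example, with a made-up expression standing in for a generated Jacobian entry (assuming Cython and a C compiler are installed):

import sympy as sp
from sympy.utilities.autowrap import autowrap

q0, q1 = sp.symbols('q0 q1')
# stand-in expression; in practice this would be a generated
# Jacobian or inertia matrix expression
expr = sp.cos(q0) + sp.sin(q0 + q1)
fast_func = autowrap(expr, backend='cython', args=(q0, q1))
print(fast_func(0.5, 1.0))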

In the last SymPy post I compared a hard-coded calculation against the SymPy-generated function, using the timeit function, which runs a given chunk of code 1,000,000 times and tells you how long it took. To save tracking down the results from that post, here are the timeit results of a SymPy function generated using just the lambdify function vs hard-coding the function in straight NumPy:

simtime

And now using the autowrap function to compile to Cython code, compared to hard-coding the function in Numpy:

cython_vs_hardcoded

This is a pretty crazy improvement, and means that we can have the best of both worlds. We can declare our transforms in SymPy and automate the calculation of our Jacobians and inertia matrices, and still have the code execute fast enough that we can use it to control real robots.

That said, this is all from my basic experimenting with some different optimisations, which is far from a thorough exploration of the space. If you know of any other ways to speed up execution, please comment below!

You can find the code I used to generate the above plots up on my GitHub.


Full body obstacle collision avoidance

Previously I’ve discussed how to avoid obstacles using DMPs in the end-effector trajectory. This is good when you’re controlling a single disconnected point-mass, like a mobile robot navigating around an environment. But if you want to use this system to control a robotic manipulator, then pretty quickly you run into the problem that your end-effector is not a disconnected point-mass moving around the environment. It’s attached to the rest of the arm, and moving such that the arm segments and joints also avoid the obstacle is a whole other deal.

I was doing a quick lit scan on control methods for avoiding joint collisions with obstacles, and was kind of surprised that there wasn’t really much in the realm of recent discussion about it. There is, however, a 1986 paper from Dr. Oussama Khatib titled Real-time obstacle avoidance for manipulators and mobile robots that pretty much solves this problem, short of getting into explicit path planning methods. Which could be why there aren’t papers since then about it. All the same, I couldn’t find any implementations of it online, and there are a few tricky parts, so in this post I’m going to work through the implementation.

Note: The implementation that I’ve worked out here uses spheres to represent the obstacles. This works pretty well in general by just making the sphere large enough to cover whatever obstacle you’re avoiding. But if you want more precise control around other shapes, you can check out the appendix of Dr. Khatib’s paper, where he provides the math for cubes and cones as well.

Note: You can find all of the code I use here and everything you need for the VREP implementation up on my GitHub. I’ve precalculated the functions that are required, because generating them is quite computationally intensive. Hopefully the file saving doesn’t get weird between different operating systems, but if it does, try deleting all of the files in the ur5_config folder and running the code again. This will regenerate those files (on my laptop this takes ~4 hours, though, so beware).

The general idea

Since it’s from Dr. Khatib, you might expect that this approach involves task space. And indeed, your possible suspicions are correct! The algorithm is going to work by identifying forces in Cartesian coordinates that will move any point of the arm that’s too close to an obstacle away from it. The algorithm follows the form:

Setup

  • Specify obstacle location and size
  • Specify a threshold distance to the obstacle

While running

  • Find the closest point of each arm segment to obstacles
  • If within threshold of obstacle, generate force to apply to that point
  • Transform this force into joint torques
  • Add directly to the outgoing control signal

Something that you might notice about this is that it’s similar to the addition of the null space controller that we’ve seen before in operational space control. There’s a distinct difference here though, in that the control signal for avoiding obstacles is added directly to the outgoing control signal, and it’s not filtered (like the null space control signal is), so there’s no guarantee that it won’t affect the movement of the end-effector. In fact, it’s very likely to affect the movement of the end-effector, but that’s also desirable, as not ramming the arm through an obstacle is as important as getting to the target.

OK, let’s walk through these steps one at a time.

Setup

I mentioned that we’re going to treat all of our obstacles as spheres. It’s actually not much harder to do these calculations for cubes too, but this is already going to be long enough, so I’m only addressing spheres here. This algorithm assumes we have a list of every obstacle’s centre-point and radius.

We want the avoidance response of the system to be a function of the distance to the obstacle, such that the closer the arm is to the obstacle the stronger the response. The function that Dr. Khatib provides is of the following form:

\textbf{F}_{psp} = \left\{ \begin{array}{cc}\eta \left(\frac{1}{\rho} - \frac{1}{\rho_0}\right) \frac{1}{\rho^2} \frac{\partial \rho}{\partial \textbf{x}} & \rho \leq \rho_0 \\ \textbf{0} & \rho > \rho_0 \end{array} \right. ,

where \rho is the distance to the obstacle, \rho_0 is the threshold distance from the obstacle at which point the avoidance function activates, \frac{\partial \rho}{\partial \textbf{x}} is the partial derivative of the distance to the obstacle with respect to a given point on the arm, and \eta is a gain term.

This function looks complicated, but it’s actually pretty intuitive. The partial derivative term in the function works simply to point in the opposite direction of the obstacle, in Cartesian coordinates, i.e. it tells the system how to get away from the obstacle. The rest of the term is just a gain that starts out at zero when \rho = \rho_0, and gets huge as the arm nears the obstacle (as \rho \to 0 \Rightarrow \frac{1}{\rho} \to \infty). Using \eta = .2 and \rho_0 = .2 gives us a function that looks like

gain

So you can see that very quickly a very, very strong push away from this obstacle is going to be generated once we enter the threshold distance. But how do we know exactly when we’ve entered the threshold distance?

Find the closest point

We want to avoid the obstacle with our whole body, but it turns out we can reduce the problem to only worrying about the closest point of each arm segment to the obstacle, and move that one point away from the obstacle if threshold distance is hit.

To find the closest point on a given arm segment to the obstacle we’ll do some pretty neat trig. I’ll post the code for it and then discuss it below. In this snippet, p1 and p2 are the beginning and ending (x, y, z) locations of the arm segment (which we are assuming is a straight line), and v is the center of the obstacle.

# the vector of our line
vec_line = p2 - p1
# the vector from the obstacle to the first line point
vec_ob_line = v - p1
# calculate the projection normalized by length of arm segment
projection = (np.dot(vec_ob_line, vec_line) /
              np.dot(vec_line, vec_line))
if projection < 0:               
    # then closest point is the start of the segment
    closest = p1  
elif projection > 1:
    # then closest point is the end of the segment
    closest = p2
else:
    closest = p1 + projection * vec_line

The first thing we do is find the arm segment line, and then the line from the obstacle center to the start point of the arm segment. Once we have these, we calculate:

\frac{\textbf{v}_\textrm{ob\_line} \; \cdot \; \textbf{v}_\textrm{line}}{\textbf{v}_\textrm{line} \; \cdot \; \textbf{v}_\textrm{line}},

using the geometric definition of the dot product of two vectors, we can rewrite the above as

\frac{||\textbf{v}_\textrm{ob\_line}|| \; || \textbf{v}_\textrm{line} || \; \textrm{cos}(\theta)}{||\textbf{v}_\textrm{line}||^2} = \frac{||\textbf{v}_\textrm{ob\_line}||} {||\textbf{v}_\textrm{line}||} \textrm{cos}(\theta)

which reads as the magnitude of vec_ob_line divided by the magnitude of vec_line (I know, these are terrible names, sorry) multiplied by the cosine of the angle between the two vectors. If the cosine of the angle is < 0 (in which case projection will also be < 0), then right off the bat we know that the start of the arm segment, p1, is the closest point. If the projection value is > 1, then we know that 1) the length from the start of the arm segment to the obstacle is longer than the length of the arm segment, and 2) the angle is such that the end of the arm segment, p2, is the closest point to the obstacle.

Finally, in the last case we know that the closest point is somewhere along the arm segment. To find where exactly, we do the following

\textbf{p}_1 + \textrm{projection} \; \textbf{v}_\textrm{line},

which can be rewritten

\textbf{p}_1 + \frac{||\textbf{v}_\textrm{ob\_line}||} {||\textbf{v}_\textrm{line}||} \textrm{cos}(\theta) \; \textbf{v}_\textrm{line},

I find it more intuitive if we rearrange the second term to be

\textbf{p}_1 + \frac{\textbf{v}_\textrm{line}} {||\textbf{v}_\textrm{line}||} \; ||\textbf{v}_\textrm{ob\_line} || \; \textrm{cos}(\theta).

So then what this is all doing is starting us off at the beginning of the arm segment, p1, and to that we add this other fun term. The first part of this fun term provides direction, normalized to magnitude 1. The second part of this term provides magnitude, specifically the exact distance along vec_line we need to traverse to reach a right angle (giving us the shortest distance) pointing at the obstacle. This handy figure from the Wikipedia page helps illustrate exactly what’s going on with the second part, where B is vec_line and A is vec_ob_line:

dot_product
Armed with this information, we understand how to find the closest point of each arm segment to the obstacle, and we are now ready to generate a force to move the arm in the opposite direction!
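Before moving on, here’s the same projection logic from above wrapped up as a function, with a quick sanity check on made-up points (my own sketch):

import numpy as np

def closest_point_on_segment(p1, p2, v):
    # closest point on the segment p1 -> p2 to the point v
    vec_line = p2 - p1
    vec_ob_line = v - p1
    projection = np.dot(vec_ob_line, vec_line) / np.dot(vec_line, vec_line)
    # clamp to the ends of the segment
    projection = min(max(projection, 0.0), 1.0)
    return p1 + projection * vec_line

# segment along x from (0,0,0) to (1,0,0), obstacle centre at (0.5, 0.5, 0)
print(closest_point_on_segment(np.array([0., 0., 0.]),
                               np.array([1., 0., 0.]),
                               np.array([0.5, 0.5, 0.])))  # -> [0.5 0.  0. ]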

Check distance, generate force

To calculate the distance, all we have to do is calculate the Euclidean distance from the closest point of the arm segment to the center of the sphere, and then subtract out the radius of the sphere:

# calculate distance from obstacle vertex to the closest point
dist = np.sqrt(np.sum((v - closest)**2))
# account for size of obstacle
rho = dist - obstacle[3]

Once we have this, we can check it and generate F_{psp} using the equation we defined above. The one part of that equation that wasn’t specified was exactly what \frac{\partial \rho}{\partial \textbf{x}} is. Since it’s just the partial derivative of the distance to the obstacle with respect to the closest point, we can calculate it as the normalized difference between the two points:

drhodx = (v - closest) / rho

Alright! Now we’ve found the closest point and know the force we want to apply; from here it’s standard operational space procedure.

Transform the force into torques

As we all remember, the equation for transforming a control signal from operational space to joint space involves two terms aside from the desired force. Namely, the Jacobian and the operational space inertia matrix:

\textbf{u}_\textrm{psp} = \textbf{J}^T_{psp} \textbf{M}_{psp} \textbf{F}_{psp},

where \textbf{J}_{psp} is the Jacobian for the point of interest, \textbf{M}_{psp} is the operational space inertia matrix for the point of interest, and \textbf{F}_{psp} is the force we defined above.

Calculating the Jacobian for an unspecified point

So the first thing we need to calculate is the Jacobian for this point on the arm. There are a bunch of ways you could go about this, but the way I’m going to do it here is by building on the post where I used SymPy to automate the Jacobian derivation. The way we did that was by defining the transforms from the origin reference frame to the first link, from the first link to the second, etc, until we reached the end-effector. Then, whenever we needed a Jacobian we could string together the transforms to get the transform from the origin to that point on the arm, and take the partial derivative with respect to the joints (using SymPy’s derivative method).

As an example, say we wanted to calculate the Jacobian for the third joint, we would first calculate:

^3_{\textrm{org}}\textbf{T} = ^0_{\textrm{org}}\textbf{T} \; ^1_0\textbf{T} \; ^2_1\textbf{T} \; ^3_2\textbf{T},

where ^m_n\textbf{T} reads the transform from reference frame m to reference frame n.

Once we have this transformation matrix, ^3_\textrm{org}\textbf{T}, we multiply it by the point of interest in reference frame 3, which, previously, has usually been \textbf{x} = [0, 0, 0]. In other words, usually we’re just interested in the origin of reference frame 3. So the Jacobian is just

\frac{\partial \; ^3_\textrm{org}\textbf{T} \textbf{x}}{\partial \textbf{q}}.

What if we’re interested in some non-specified point along link 3, though? Well, using SymPy we make \textbf{x} = [x_0, x_1, x_2, 1] instead of \textbf{x} = [0, 0, 0, 1] (recall the 1 at the end in these vectors is just to make the math work), and make the Jacobian function SymPy generates for us dependent on both \textbf{q} and \textbf{x}, rather than just \textbf{q}. In code this looks like:

Torg3 = self._calc_T(name="3")
# transform x into world coordinates
Torg3x = sp.simplify(Torg3 * sp.Matrix(x))
# differentiate the (x, y, z) position wrt the joint angles
J3 = Torg3x[:3, :].jacobian(sp.Matrix(q))
J3_func = sp.lambdify(q + x, J3)

Now it’s possible to calculate the Jacobian for any point along link 3 just by changing the parameters that we pass into J3_func! Most excellent.

We are getting closer.

NOTE: This parameterization can significantly increase the build time of the function, it took my laptop about 4 hours. To decrease build time you can try commenting out the simplify calls from the code, which might slow down run-time a bit but significantly drops the generation time.

Where is the closest point in that link’s reference frame?

A sneaky problem comes up when calculating the closest point of each arm segment to the object: we’ve calculated the closest point of each arm segment in the origin’s frame of reference, and we need them relative to each link’s own frame of reference. Fortunately, all we have to do is calculate the inverse transform for the link of interest. For example, the inverse transform of ^3_\textrm{org}\textbf{T} transforms a point from the origin’s frame of reference to the reference frame of the 3rd joint.

I go over how to calculate the inverse transform at the end of my post on forward transformation matrices, but to save you from having to go back and look through that, here’s the code to do it:

Torg3 = self._calc_T(name="3")
rotation_inv = Torg3[:3, :3].T
translation_inv = -rotation_inv * Torg3[:3, 3]
Torg3_inv = rotation_inv.row_join(translation_inv).col_join(
    sp.Matrix([[0, 0, 0, 1]]))

And now to find the closest point in the coordinates of reference frame 3 we simply

x = np.dot(Torg3_inv, np.hstack([closest, [1]]))

This x value (its first three elements) is what we’re going to plug in as parameters to our J3_func above to find the Jacobian for the closest point on link 3.
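So a hypothetical call, with q_current standing in for the current joint angles (the lambdified function takes the joint angles followed by the (x, y, z) offset in link 3’s frame), might look like:

# hypothetical usage of the lambdified Jacobian function
J3 = np.array(J3_func(*np.hstack([q_current, x[:3]])))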

Calculate the operational space inertia matrix for the closest point

OK. With the Jacobian for the point of interest we are now able to calculate the operational space inertia matrix. This code I’ve explicitly worked through before, and I’ll show it in the full code below, so I won’t go over it again here.

The whole implementation

You can run an example of all this code controlling the UR5 arm to avoid obstacles in VREP using this code up on my GitHub. The specific code added to implement obstacle avoidance looks like this:

# find the closest point of each link to the obstacle
for ii in range(robot_config.num_joints):
    # get the start and end-points of the arm segment
    p1 = robot_config.Tx('joint%i' % ii, q=q)
    if ii == robot_config.num_joints - 1:
        p2 = robot_config.Tx('EE', q=q)
    else:
        p2 = robot_config.Tx('joint%i' % (ii + 1), q=q)

    # calculate minimum distance from arm segment to obstacle
    # the vector of our line
    vec_line = p2 - p1
    # the vector from the obstacle to the first line point
    vec_ob_line = v - p1
    # calculate the projection normalized by length of arm segment
    projection = (np.dot(vec_ob_line, vec_line) /
                  np.sum((vec_line)**2))
    if projection < 0:         
        # then closest point is the start of the segment
        closest = p1
    elif projection > 1:
        # then closest point is the end of the segment
        closest = p2
    else:
        closest = p1 + projection * vec_line
    # calculate distance from obstacle vertex to the closest point
    dist = np.sqrt(np.sum((v - closest)**2))
    # account for size of obstacle
    rho = dist - obstacle_radius

    if rho < threshold:
        eta = .02
        drhodx = (v - closest) / rho
        Fpsp = (eta * (1.0/rho - 1.0/threshold) *
                1.0/rho**2 * drhodx)

        # get offset of closest point from link's reference frame
        T_inv = robot_config.T_inv('link%i' % ii, q=q)
        m = np.dot(T_inv, np.hstack([closest, [1]]))[:-1]
        # calculate the Jacobian for this point
        Jpsp = robot_config.J('link%i' % ii, x=m, q=q)[:3]

        # calculate the inertia matrix for the
        # point subjected to the potential space
        Mxpsp_inv = np.dot(Jpsp,
                        np.dot(np.linalg.pinv(Mq), Jpsp.T))
        svd_u, svd_s, svd_v = np.linalg.svd(Mxpsp_inv)
        # cut off singular values that could cause problems
        singularity_thresh = .00025
        # (using jj here to avoid shadowing the outer loop's ii)
        for jj in range(len(svd_s)):
            svd_s[jj] = 0 if svd_s[jj] < singularity_thresh else \
                1./float(svd_s[jj])
        # numpy returns U,S,V.T, so have to transpose both here
        Mxpsp = np.dot(svd_v.T, np.dot(np.diag(svd_s), svd_u.T))

        u_psp = -np.dot(Jpsp.T, np.dot(Mxpsp, Fpsp))
        if rho < .01:
            u = u_psp
        else:
            u += u_psp

The one thing in this code I didn’t talk about is that if rho < .01 then I set u = u_psp instead of just adding u_psp to u. This acts as a fail-safe that takes over the robot’s control, essentially saying “if we are about to hit the obstacle, forget about everything else and get out of the way!”.

Results

And that’s it! I really enjoy how this looks when it’s running; it’s a really effective algorithm. Let’s look at some samples of it in action.

First, in a 2D environment, where it’s really easy to move the obstacle around and see how the arm responds to the new obstacle position. The red circle is the target and the blue circle is the obstacle:
[gif: avoid2d]

And in 3D in VREP, running the code example that I’ve put up on my GitHub implementing this. The example of it running without obstacle avoidance code is on the left, and running with obstacle avoidance is on the right. It’s kind of hard to see but on the left the robot moves through the far side of the obstacle (the gold sphere) on its way to the target (the red sphere):

And one last example, the arm dodging a moving obstacle on its way to the target.

[gif: movingavoid3d]

The implementation is a ton of fun to play around with. It’s a really nifty algorithm that works quite well, and I haven’t found many alternatives in papers that don’t go into path planning (if you know of some and can share, that’d be great!). This post was a bit of a journey, but hopefully you found it useful! I continue to find it impressive how many neat features like this fall out once you have the operational space control framework in place.


Velocity limiting in operational space control

Recently, I was reading through an older paper on effective operational space control, talking specifically about point to point control in operational space. The paper mentioned that even if you have a perfect model of the system, you’re going to run into trouble if you use just a basic PD formula to define your control signal in operational space:

u_x = k_p (\textbf{x}^* - \textbf{x}) - k_v \dot{\textbf{x}},

where \textbf{x} and \dot{\textbf{x}} are the system position and velocity in operational space, \textbf{x}^* is the target position, and k_p and k_v are gains.

If you define your operational space control signal like this, and then translate this signal into joint torques (using, for example, methods discussed in other posts), you’re going to see a very non-straight trajectory emerge in larger movements as a result of “actuator saturation, and bandwidth and velocity limitations”. In the example of a standard robot, you might run into issues with your motors not being able to actually generate the torques that have been specified, the frequency of control and feedback might not be sufficient, and you could hit hard constraints on system velocity. The solution to this problem was presented in this 1987 paper by Dr. Oussama Khatib, and it’s pretty slick and very useful, so I thought I’d write it up here for any other unfortunate souls wandering around in ignorance. First, though, here’s what it looks like to move large point to point distances without velocity limiting:

As you can see, the system isn’t moving in a straight line, which can be very aggravating when you’ve worked and reworked the equations and double checked all your parameters. A few notes. First, when working with simulations it’s easy to forget how ridiculously fast this would actually be in real time: even though it takes a minute to simulate the above movement, in real time it happens over the course of 200ms. Taking that into account, this is pretty good. Also, there’s an obvious solution here: slow down your movement. The source of this problem is largely that the motors are not all able to apply the specified torques and move at the required speed. Some of the motors have a lot less mass to throw around and will be able to move at the specified torques, but not all of them. Hence the non-straight trajectory.

You can of course drop the gains on your PD signal, but that’s not really a methodical solution. So, what can we do?

Well, if we rearrange the PD control signal specified above into

u_x = k_v (\dot{\textbf{x}}^* - \dot{\textbf{x}}),

where \dot{\textbf{x}}^* is the desired velocity, we see that this signal can be interpreted as a pure velocity servo-control signal, with velocity gain k_v and a desired velocity

\dot{\textbf{x}}^* = \frac{k_p}{k_v}(\textbf{x}^* - \textbf{x}).

When things are in this form, it becomes a bit more clear what we have to do: limit the desired velocity to be at most some specified maximum velocity of the end-effector, V_\textrm{max}. This value should be low enough that the transformation into joint torques doesn’t result in anything larger than the actuators can generate.

Taking V_\textrm{max}, what we want is to clip the magnitude of the control signal to be V_\textrm{max} if it’s ever larger (in positive or negative directions), and to be equal to \frac{k_p}{k_v}(\textbf{x}^* - \textbf{x}) otherwise. The math for this works out such that we can accomplish it with a control signal of the form:

\textbf{u}_\textbf{x} = -k_v (\dot{\textbf{x}} + \textrm{sat}\left(\frac{V_\textrm{max}}{\lambda |\tilde{\textbf{x}}|} \right) \lambda \tilde{\textbf{x}}),

where \lambda = \frac{k_p}{k_v} , \tilde{\textbf{x}} = \textbf{x} - \textbf{x}^*, and \textrm{sat} is the saturation function, such that

\textrm{sat}(y) = \left\{ \begin{array}{cc} |y| \leq 1 & \Rightarrow y \\ |y| > 1 & \Rightarrow 1 \end{array} \right.

where |y| is the absolute value of y, and is applied element wise to the vector \tilde{\textbf{x}} in the control signal.
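
In NumPy this saturation function is a one-liner; here’s a minimal sketch (note that in the control law above its argument is always positive, so the sign of the saturated branch never comes into play):

import numpy as np

def sat(y):
    # pass y through unchanged where |y| <= 1, otherwise saturate at 1
    return np.where(np.abs(y) <= 1, y, 1.0)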

As a result of using this saturation function, the control signal behaves differently depending on whether or not \dot{\textbf{x}}^* > V_\textrm{max}:

\textbf{u}_\textbf{x} = \left\{ \begin{array}{cc} \dot{\textbf{x}}^* \geq V_\textrm{max} & \Rightarrow -k_v (\dot{\textbf{x}} + V_\textrm{max} \textrm{sgn}(\tilde{\textbf{x}})) \\ \dot{\textbf{x}}^* < V_\textrm{max} & \Rightarrow -k_v \dot{\textbf{x}} - k_p \tilde{\textbf{x}} \end{array} \right.

where \textrm{sgn}(y) is a function that returns -1 if y < 0 and 1 if y \geq 0, and is again applied element-wise to vectors. Note that the control signal in the second condition is equivalent to our original PD control signal k_p(\textbf{x}^* - \textbf{x}) - k_v \dot{\textbf{x}}. If you’re wondering about negative signs make sure you note that \tilde{\textbf{x}} = \textbf{x} - \textbf{x}^* and not \textbf{x}^* - \textbf{x}, as you might assume.

So now the control signal behaves exactly as desired! It moves the system towards the target, but no faster than the specified maximum velocity. Now our trajectories look like this:

So those are starting to look a lot better! The first thing you’ll notice is that this is a fair bit slower of a movement. Well, actually, it’s waaaayyyy slower, because the playback speed here is 4x faster than in that first animation, and this is a movement over 2 seconds. Which has pros and cons. Con: it’s slower. Pro: it’s straighter, and you’re less likely to be murdered by it. When you move from simulations to real systems, that latter point moves way up the priority list.

The second thing to notice is that the system seems to minimise the error along one dimension, then along the next, and then arrives at the target. What’s going on? Because the error along each of the (x,y,z) dimensions isn’t the same, when the speed gets clipped along one of the dimensions you’re no longer moving in a straight line directly to the target. To address this, we’re going to add a scaling term whenever clipping happens, such that the speed along each dimension is reduced by the same ratio and the system keeps moving in a straight line.

It’s a liiiiittle bit more complicated than that, but not much. First, we’ll calculate the values being passed in to the saturation function for each (x,y,z) dimension. We’ll then check to see if any of them are going to get clipped, and if more than one saturates we’ll find the one that is affected the most. Once we’ve identified which dimension it is, we calculate what its control signal would have been without velocity limiting and what it will be with velocity limiting. The ratio between the two tells us how much the control signal was reduced, and we can then use it to scale down the control signals of the other dimensions by the same amount. These other dimensions might still saturate, though, so we have to recalculate the saturation function for them once they’ve been scaled. Here’s what this all looks like in code:

# implement velocity limiting
lamb = kp / kv
x_tilde = xyz - target_xyz
sat = vmax / (lamb * np.abs(x_tilde))
scale = np.ones(3)
if np.any(sat < 1):
    index = np.argmin(sat)
    unclipped = kp * x_tilde[index]
    clipped = kv * vmax * np.sign(x_tilde[index])
    scale = np.ones(3) * clipped / unclipped
    scale[index] = 1
u_xyz = -kv * (dx + np.clip(sat / scale, 0, 1) *
               scale * lamb * x_tilde)
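
From here, u_xyz gets mapped down into joint torques the same way as any other operational space control signal. A sketch, assuming JEE and Mx (the end-effector Jacobian and the operational space inertia matrix) have been calculated as in my operational space control posts:

# map the velocity limited operational space signal into joint torques
u = np.dot(JEE.T, np.dot(Mx, u_xyz))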
 

And now, finally, we start getting the trajectories that we’ve been wanting the whole time:

And finally we can rest easy, knowing that our robot is moving at a reasonable speed along a direct path to its goals. Wherever you’d like to use this neato ‘ish, you should be able to just paste in the above code, define your vmax, kp, and kv values, and be good to go!


Using SymPy’s lambdify for generating transform matrices and Jacobians

I’ve been working in VREP with some of their different robot models and testing out force control, and one of the things that becomes pretty important for efficient workflow is to have a streamlined method for setting up the transform matrices and calculating the Jacobians for different robots. You do not want to be working these all out by hand and then writing them in yourself.

A solution that’s been working well for me (and is fully implemented in Python) is to use SymPy to set up the basic transform matrices, from each joint to the next, and then use its derivative function to calculate the Jacobian. Once this is calculated you can then use SymPy’s lambdify function to parameterize it, and off you go! In this post I’m going to work through an example for controlling VREP’s UR5 arm using force control. And if you’re just looking for code examples, you can find it all up on my GitHub.

Edit: Updated the code to be much nicer, added saving of calculated functions, and a null space controller.
Edit 2: Removed the simplify calls from the code; they weren’t worth the generation time increase (if execution speed is a big concern you can generate the functions using Cython).

Setting up the transform matrices

This is the part of this process that is unique to each arm. The goal is to set up a system so that you can specify your transforms for each joint (and to each centre-of-mass (COM) too, of course) and then be on your way. So here’s the UR5, cute thing that it is:

[image: ur5.png]

For the UR5, there are 6 joints, and we’re going to be specifying 9 transform matrices: 6 for the joints and 3 for the COMs of the arm segments (the remaining arm segment COMs are centered at their respective joints’ reference frames). The joints are all rotary, with joints 0 and 4 rotating around the z-axis and the rest all rotating around the x-axis.

So first, we’re going to create a class called robot_config. Then, to create our transform matrices in SymPy first we need to set up the variables we’re going to be using:

# set up our joint angle symbols (6th angle doesn't affect any kinematics)
self.q = [sp.Symbol('q%i'%ii) for ii in range(6)]
# segment lengths associated with each joint
self.L = np.array([0.0935, .13453, .4251, .12, .3921, .0935, .0935, .0935])

where self.q is an array storing all our joint angle symbols, and self.L is an array of all of the offsets for each joint and arm segment lengths.

Using these to create the transform matrices for the joints, we get a set up that looks like this:

# transform matrix from origin to joint 0 reference frame
self.T0org = sp.Matrix([[sp.cos(self.q[0]), -sp.sin(self.q[0]), 0, 0],
                        [sp.sin(self.q[0]), sp.cos(self.q[0]), 0, 0],
                        [0, 0, 1, self.L[0]],
                        [0, 0, 0, 1]])

# transform matrix from joint 0 to joint 1 reference frame
self.T10 = sp.Matrix([[1, 0, 0, -self.L[1]],
                      [0, sp.cos(-self.q[1] + sp.pi/2),
                       -sp.sin(-self.q[1] + sp.pi/2), 0],
                      [0, sp.sin(-self.q[1] + sp.pi/2),
                       sp.cos(-self.q[1] + sp.pi/2), 0],
                      [0, 0, 0, 1]])

# transform matrix from joint 1 to joint 2 reference frame
self.T21 = sp.Matrix([[1, 0, 0, 0],
                      [0, sp.cos(-self.q[2]),
                       -sp.sin(-self.q[2]), self.L[2]],
                      [0, sp.sin(-self.q[2]),
                       sp.cos(-self.q[2]), 0],
                      [0, 0, 0, 1]])

# transform matrix from joint 2 to joint 3
self.T32 = sp.Matrix([[1, 0, 0, self.L[3]],
                      [0, sp.cos(-self.q[3] - sp.pi/2),
                       -sp.sin(-self.q[3] - sp.pi/2), self.L[4]],
                      [0, sp.sin(-self.q[3] - sp.pi/2),
                       sp.cos(-self.q[3] - sp.pi/2), 0],
                      [0, 0, 0, 1]])

# transform matrix from joint 3 to joint 4
self.T43 = sp.Matrix([[sp.sin(-self.q[4] - sp.pi/2),
                       sp.cos(-self.q[4] - sp.pi/2), 0, -self.L[5]],
                      [sp.cos(-self.q[4] - sp.pi/2),
                       -sp.sin(-self.q[4] - sp.pi/2), 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]])

# transform matrix from joint 4 to joint 5
self.T54 = sp.Matrix([[1, 0, 0, 0],
                      [0, sp.cos(self.q[5]), -sp.sin(self.q[5]), 0],
                      [0, sp.sin(self.q[5]), sp.cos(self.q[5]), self.L[6]],
                      [0, 0, 0, 1]])

# transform matrix from joint 5 to end-effector
self.TEE5 = sp.Matrix([[1, 0, 0, self.L[7]],
                       [0, 1, 0, 0],
                       [0, 0, 1, 0],
                       [0, 0, 0, 1]])

You can see a bunch of offsetting of the joint angles by -sp.pi/2, which accounts for the expected 0 angle (straight along the reference frame’s x-axis) at those joints being different from the 0 angle defined in the VREP simulation (at 90 degrees from the x-axis). You can find these by either looking up the joints’ 0 positions in VREP or by trial-and-error empirical analysis.
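
One quick way to sanity check these matrices (a sketch, run inside the class once they’re all defined) is to chain them together and substitute in q = 0; the resulting translation column should match the end-effector’s resting (x, y, z) position as measured in VREP:

# chain all the transforms from origin to end-effector
T = (self.T0org * self.T10 * self.T21 * self.T32 *
     self.T43 * self.T54 * self.TEE5)
# evaluate at q = 0 and print the translation column
print(sp.simplify(T.subs([(qi, 0) for qi in self.q]))[:3, 3])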

Once you have your transforms, then you have to specify how to move from the origin to each reference frame of interest (the joints and COMs). For that, I’ve just set up a simple function with a switch statement:

# point of interest in the reference frame (right at the origin)
self.x = sp.Matrix([0,0,0,1])

def _calc_T(self, name, lambdify=True):
    """ Uses Sympy to generate the transform for a joint or link

    name string: name of the joint or link, or end-effector
    lambdify boolean: if True returns a function to calculate
                      the transform. If False returns the Sympy
                      matrix
    """

    # check to see if we have our transformation saved in file
    if os.path.isfile('%s/%s.T' % (self.config_folder, name)):
        Tx = cloudpickle.load(open('%s/%s.T' % (self.config_folder, name),
                                   'rb'))
    else:
        if name == 'joint0' or name == 'link0':
            T = self.T0org
        elif name == 'joint1' or name == 'link1':
            T = self.T0org * self.T10
        elif name == 'joint2':
            T = self.T0org * self.T10 * self.T21
        elif name == 'link2':
            T = self.T0org * self.T10 * self.Tl21
        elif name == 'joint3':
            T = self.T0org * self.T10 * self.T21 * self.T32
        elif name == 'link3':
            T = self.T0org * self.T10 * self.T21 * self.Tl32
        elif name == 'joint4' or name == 'link4':
            T = self.T0org * self.T10 * self.T21 * self.T32 * self.T43
        elif name == 'joint5' or name == 'link5':
            T = self.T0org * self.T10 * self.T21 * self.T32 * self.T43 * \
                self.T54
        elif name == 'link6' or name == 'EE':
            T = self.T0org * self.T10 * self.T21 * self.T32 * self.T43 * \
                self.T54 * self.TEE5
        Tx = T * self.x  # to convert from transform matrix to (x,y,z)

        # save to file
        cloudpickle.dump(Tx, open('%s/%s.T' % (self.config_folder, name),
                                  'wb'))

    if lambdify is False:
        return Tx
    return sp.lambdify(self.q, Tx)

So the first part is pretty straightforward: create the transform matrix, and then at the end, to get the (x,y,z) position, multiply by a vector we created that represents a point at the origin of the last reference frame. Some of the transform matrices (the ones to the arm segments’ COMs) I didn’t specify above, just to cut down on space.

The second part is where we use this awesome lambdify function, which lets us turn the matrix we’ve defined into a function, so that we can pass in joint angles and get back the resulting (x,y,z) position. There’s also the option to get back the straight-up SymPy matrix, in case you need the symbolic form (which we will!).

NOTE: You can also see that there’s a check built in to look for saved files, and to load those instead of recalculating things if they’re available. This is because calculating some of these matrices and their derivatives takes a long, long time. I used the cloudpickle module for this because it easily handles saving the whole bunch of weird things that make normal pickle sour.

Calculating the Jacobian

So now that we’re able to quickly generate the transform matrix for each point of interest on the UR5, we simply take the derivative of the equation for each (x,y,z) coordinate with respect to each joint angle to generate our Jacobian.

def _calc_J(self, name, lambdify=True):
    """ Uses Sympy to generate the Jacobian for a joint or link

    name string: name of the joint or link, or end-effector
    lambdify boolean: if True returns a function to calculate
                      the Jacobian. If False returns the Sympy
                      matrix
    """

    # check to see if we have our Jacobian saved in file
    if os.path.isfile('%s/%s.J' % (self.config_folder, name)):
        J = cloudpickle.load(open('%s/%s.J' %
                             (self.config_folder, name), 'rb'))
    else:
        Tx = self._calc_T(name, lambdify=False)
        J = []
        # calculate derivative of (x,y,z) wrt to each joint
        for ii in range(self.num_joints):
            J.append([])
            J[ii].append(Tx[0].diff(self.q[ii]))  # dx/dq[ii]
            J[ii].append(Tx[1].diff(self.q[ii]))  # dy/dq[ii]
            J[ii].append(Tx[2].diff(self.q[ii]))  # dz/dq[ii]

Here we retrieve the Tx vector from our _calc_T function, and then calculate the derivatives. When calculating the Jacobian for the end-effector, this is all we need! Huzzah!

But to calculate the Jacobian for transforming the inertia matrices of each arm segment into joint space, we’re going to need the orientation information added to our Jacobian as well. This we know ahead of time: for each joint it’s a 3D vector with a 1 on the axis being rotated around. So we can predefine it:

# orientation part of the Jacobian (compensating for orientations)
self.J_orientation = [
    [0, 0, 1], # joint 0 rotates around z axis
    [1, 0, 0], # joint 1 rotates around x axis
    [1, 0, 0], # joint 2 rotates around x axis
    [1, 0, 0], # joint 3 rotates around x axis
    [0, 0, 1], # joint 4 rotates around z axis
    [1, 0, 0]] # joint 5 rotates around x axis

And then we just fill in the Jacobians for each reference frame with self.J_orientation up to the last joint, and fill in the rest of the Jacobian with zeros. So e.g. when we’re calculating the Jacobian for the arm segment just past the second joint we’ll use the first two rows of self.J_orientation and the rest of the rows will be 0.

So this leads us to the second half of the _calc_J function:

        # pull the index off the end of the name (e.g. 'link2' -> '2')
        end_point = name.replace('link', '').replace('joint', '')
        if end_point != 'EE':
            end_point = min(int(end_point) + 1, self.num_joints)
            # add on the orientation information up to the last joint
            for ii in range(end_point):
                J[ii] = J[ii] + self.J_orientation[ii]
            # fill in the rest of the joints orientation info with 0
            for ii in range(end_point, self.num_joints):
                J[ii] = J[ii] + [0, 0, 0]

        # save to file
        cloudpickle.dump(J, open('%s/%s.J' %
                                 (self.config_folder, name), 'wb'))

    J = sp.Matrix(J).T  # correct the orientation of J
    if lambdify is False:
        return J
    return sp.lambdify(self.q, J)

The orientation information is added in, we save the result to file, and a function that takes in the joint angles and outputs the Jacobian is created (unless lambdify == False in which case the SymPy symbolic form is returned.)

Then finally, two wrapper functions are added in to make creating / accessing these functions easier. First, define a couple of dictionaries

# create function dictionaries
self._T = {}  # for transform calculations
self._J = {}  # for Jacobian calculations

and then our wrapper functions look like this

def T(self, name, q):
    """ Calculates the transform for a joint or link
    name string: name of the joint or link, or end-effector
    q np.array: joint angles
    """
    # check for function in dictionary
    if self._T.get(name, None) is None:
        print('Generating transform function for %s'%name)
        self._T[name] = self._calc_T(name)
    return self._T[name](*q)[:-1].flatten()

def J(self, name, q):
    """ Calculates the Jacobian for a joint or link
    name string: name of the joint or link, or end-effector
    q np.array: joint angles
    """
    # check for function in dictionary
    if self._J.get(name, None) is None:
        print('Generating Jacobian function for %s' % name)
        self._J[name] = self._calc_J(name)
    return np.array(self._J[name](*q)).T

So the way you use this class (all of this is in a class) is to call the T and J functions with the current joint angles. They’ll check to see if the functions have already been created or stored in file; if they haven’t, then the T and / or J functions will be created, then our wrappers do a bit of formatting to get things into the proper form (i.e. transposing or cropping), and return your (x,y,z) position or Jacobian!

NOTE: It’s a bit of a misnomer to have the function be called T and actually return to you Tx, but hey this is a blog. Lay off.
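
With all of that in place, usage looks something like this (a sketch, assuming robot_config is an instance of the class described here):

import numpy as np

q = np.zeros(6)  # the joint angles to evaluate at
xyz = robot_config.T('EE', q)  # (x, y, z) position of the end-effector
JEE = robot_config.J('EE', q)  # Jacobian of the end-effector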

Calculating the inertia matrix in joint-space and force of gravity

Now, since we’re here we might as well also calculate the functions for our inertia matrix in joint space and the effect of gravity. So, define a couple more placeholders in our robot_config class to help us:

self._M = []  # placeholder for (x,y,z) inertia matrices
self._Mq = None  # placeholder for joint space inertia matrix function
self._Mq_g = None  # placeholder for joint space gravity term function

and then add in our inertia matrix information (defined in each link’s centre-of-mass (COM) reference frame)

# create the inertia matrices for each link of the ur5
self._M.append(np.diag([1.0, 1.0, 1.0,
                        0.02, 0.02, 0.02]))  # link0
self._M.append(np.diag([2.5, 2.5, 2.5,
                        0.04, 0.04, 0.04]))  # link1
self._M.append(np.diag([5.7, 5.7, 5.7,
                        0.06, 0.06, 0.04]))  # link2
self._M.append(np.diag([3.9, 3.9, 3.9,
                        0.055, 0.055, 0.04]))  # link3
self._M.append(np.copy(self._M[1]))  # link4
self._M.append(np.copy(self._M[1]))  # link5
self._M.append(np.diag([0.7, 0.7, 0.7,
                        0.01, 0.01, 0.01]))  # link6

and then using our equations for calculating the system’s inertia and gravity we create our _calc_Mq and _calc_Mq_g functions

def _calc_Mq(self, lambdify=True):
    """ Uses Sympy to generate the inertia matrix in
    joint space for the ur5

    lambdify boolean: if True returns a function to calculate
                      the inertia matrix. If False returns the Sympy
                      matrix
    """

    # check to see if we have our inertia matrix saved in file
    if os.path.isfile('%s/Mq' % self.config_folder):
        Mq = cloudpickle.load(open('%s/Mq' % self.config_folder, 'rb'))
    else:
        # get the Jacobians for each link's COM
        J = [self._calc_J('link%s' % ii, lambdify=False)
             for ii in range(self.num_links)]

        # transform each inertia matrix into joint space
        # sum together the effects of arm segments' inertia on each motor
        Mq = sp.zeros(self.num_joints)
        for ii in range(self.num_links):
            Mq += J[ii].T * self._M[ii] * J[ii]

        # save to file
        cloudpickle.dump(Mq, open('%s/Mq' % self.config_folder, 'wb'))

    if lambdify is False:
        return Mq
    return sp.lambdify(self.q, Mq)

def _calc_Mq_g(self, lambdify=True):
    """ Uses Sympy to generate the force of gravity in
    joint space for the ur5

    lambdify boolean: if True returns a function to calculate
                      the gravity term. If False returns the Sympy
                      matrix
    """

    # check to see if we have our gravity term saved in file
    if os.path.isfile('%s/Mq_g' % self.config_folder):
        Mq_g = cloudpickle.load(open('%s/Mq_g' % self.config_folder,
                                     'rb'))
    else:
        # get the Jacobians for each link's COM
        J = [self._calc_J('link%s' % ii, lambdify=False)
             for ii in range(self.num_links)]

        # transform the effect of gravity on each arm segment
        # into joint space and sum the effects on each motor
        Mq_g = sp.zeros(self.num_joints, 1)
        for ii in range(self.num_joints):
            Mq_g += J[ii].T * self._M[ii] * self.gravity

        # save to file
        cloudpickle.dump(Mq_g, open('%s/Mq_g' % self.config_folder,
                                    'wb'))

    if lambdify is False:
        return Mq_g
    return sp.lambdify(self.q, Mq_g)

and wrapper functions

def Mq(self, q):
    """ Calculates the joint space inertia matrix for the ur5

    q np.array: joint angles
    """
    # check for function in dictionary
    if self._Mq is None:
        print('Generating inertia matrix function')
        self._Mq = self._calc_Mq()
    return np.array(self._Mq(*q))

def Mq_g(self, q):
    """ Calculates the force of gravity in joint space for the ur5

    q np.array: joint angles
    """
    # check for function in dictionary
    if self._Mq_g is None:
        print('Generating gravity effects function')
        self._Mq_g = self._calc_Mq_g()
    return np.array(self._Mq_g(*q)).flatten()
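
One thing to flag: _calc_Mq_g references self.gravity, which doesn’t show up in the snippets above. Here’s a minimal sketch of how it could be defined (the direction and magnitude are assumptions for this sketch; it’s a 6D vector so the dimensions line up with the 6-row COM Jacobians):

# gravitational acceleration: (x, y, z) linear terms plus three zero
# angular terms, matching the 6-row COM Jacobians (assumed value)
self.gravity = sp.Matrix([[0, 0, -9.81, 0, 0, 0]]).T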

and we’re all set!

Putting it all together

Now we have nice clean code to generate everything we need for our controller. Using the controller developed in this post as a base, we can replace those calculations with the following nice compact code (which also includes a secondary null-space controller to keep the arm near resting joint angles):

# calculate position of the end-effector
# derived in the ur5 calc_TnJ class
xyz = robot_config.T('EE', q)
# calculate the Jacobian for the end effector
JEE = robot_config.J('EE', q)
# calculate the inertia matrix in joint space
Mq = robot_config.Mq(q)
# calculate the effect of gravity in joint space
Mq_g = robot_config.Mq_g(q)

# convert the mass compensation into end effector space
Mx_inv = np.dot(JEE, np.dot(np.linalg.inv(Mq), JEE.T))
svd_u, svd_s, svd_v = np.linalg.svd(Mx_inv)
# cut off any singular values that could cause control problems
singularity_thresh = .00025
for i in range(len(svd_s)):
    svd_s[i] = 0 if svd_s[i] < singularity_thresh else \
        1./float(svd_s[i])
# numpy returns U,S,V.T, so have to transpose both here
Mx = np.dot(svd_v.T, np.dot(np.diag(svd_s), svd_u.T))

kp = 100
kv = np.sqrt(kp)
# calculate desired force in (x,y,z) space
u_xyz = np.dot(Mx, target_xyz - xyz)
# transform into joint space, add vel and gravity compensation
u = (kp * np.dot(JEE.T, u_xyz) - np.dot(Mq, kv * dq) - Mq_g)

# calculate our secondary control signal
# calculated desired joint angle acceleration
q_des = (((robot_config.rest_angles - q) + np.pi) %
         (np.pi*2) - np.pi)
u_null = np.dot(Mq, (kp * q_des - kv * dq))

# calculate the null space filter
Jdyn_inv = np.dot(Mx, np.dot(JEE, np.linalg.inv(Mq)))
null_filter = (np.eye(robot_config.num_joints) -
               np.dot(JEE.T, Jdyn_inv))
u_null_filtered = np.dot(null_filter, u_null)

u += u_null_filtered

And there you go!

You can see all of this code up on my GitHub, along with a full example controlling a UR5 VREP model through reaching to a series of targets. It looks pretty much like this (slightly better, actually, because this gif was made before adding in the null space controller):

[gif: ur5.gif]

Overhead of using lambdify instead of hard-coding

This was a big question that I had, because when I’m running simulations the time step is on the order of a few milliseconds, with the controller code called at every time step. So I reaaaally can’t afford a crazy overhead for using lambdify.

To test this, I used the handy Python timeit module, which requires a bit of awkward setup but quite nicely calls the function a whole bunch of times (1,000,000 by default) and accounts for various other things going on that could affect the execution time.

I tested two sample functions, one simpler than the other. Here’s the code for setting up and testing the first function:

import timeit
import seaborn

# Test 1 ----------------------------------------------------------------------
print('\nTest function 1: ')
time_sympy1 = timeit.timeit(
        stmt = 'f(np.random.random(), np.random.random())',
        setup = 'import numpy as np;\
                import sympy as sp;\
                q0 = sp.Symbol("q0");\
                l0 = sp.Symbol("l0");\
                a = sp.cos(q0) * l0;\
                f = sp.lambdify((q0, l0), a, "numpy")')
print('Sympy lambdify function 1 time: ', time_sympy1)

time_hardcoded1 = timeit.timeit(
        stmt = 'np.cos(np.random.random())*np.random.random()',
        setup = 'import numpy as np')
print('Hard coded function 1 time: ', time_hardcoded1)

Pretty simple, a bit of a pain in the sympy setup, but other than that not bad at all. The second function I tested was just a random collection of cos and sin calls that resemble what gets computed in a Jacobian:

l1*np.sin(q0 - l0*np.sin(q1)*np.cos(q2) - l2*np.sin(q2) - l0*np.sin(q1) + q0*l0)*np.cos(q0) + l2*np.sin(q0)
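
Timing it follows exactly the same pattern as the first test; here’s a sketch of the setup (the exact code is up on my GitHub, linked below):

# timing sketch for the second test function
time_sympy2 = timeit.timeit(
        stmt = 'f(*np.random.random(6))',
        setup = 'import numpy as np;\
                import sympy as sp;\
                q0, q1, q2, l0, l1, l2 = sp.symbols("q0 q1 q2 l0 l1 l2");\
                a = l1*sp.sin(q0 - l0*sp.sin(q1)*sp.cos(q2) - l2*sp.sin(q2)\
                    - l0*sp.sin(q1) + q0*l0)*sp.cos(q0) + l2*sp.sin(q0);\
                f = sp.lambdify((q0, q1, q2, l0, l1, l2), a, "numpy")')
print('Sympy lambdify function 2 time: ', time_sympy2)

time_hardcoded2 = timeit.timeit(
        stmt = 'q0, q1, q2, l0, l1, l2 = np.random.random(6);\
                l1*np.sin(q0 - l0*np.sin(q1)*np.cos(q2) - l2*np.sin(q2)\
                - l0*np.sin(q1) + q0*l0)*np.cos(q0) + l2*np.sin(q0)',
        setup = 'import numpy as np')
print('Hard coded function 2 time: ', time_hardcoded2)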

And here’s the results:

[plot: simtime]

So it’s slower for sure, but keep in mind that this is the difference after 1,000,000 function calls; unless some serious optimization is needed, the slight loss in execution speed from using SymPy’s lambdify is definitely worth the insane convenience.

The full code for the timing tests is also up on my GitHub.


Dynamic movement primitives part 4: Avoiding obstacles – update

Edit: Previously I posted this blog post on incorporating obstacle avoidance, but after a recent comment on the code I started going through it again and found a whole bunch of issues. Enough so that I’ve recoded things and I’m going to repost it now with working examples and a description of the changes I made to get it going. The edited sections will be separated out with these nice horizontal lines. If you’re just looking for the example code, you can find it up on my pydmps repo, here.


Alright. Previously I’d mentioned in one of these posts that DMPs are awesome for generalization and extension, and one of the ways that they can be extended is by incorporating obstacle avoidance dynamics. Recently I wanted to implement these dynamics, and after a bit of finagling I got it working, and so that’s going to be the subject of this post.

There are a few papers that talk about this, but the one we’re going to use is Biologically-inspired dynamical systems for movement generation: automatic real-time goal adaptation and obstacle avoidance by Hoffmann and others from Stefan Schaal’s lab. This is actually the second paper talking about obstacle avoidance and DMPs, which is a good chance to stress one of the most important rules of implementing algorithms discussed in papers: collect at least 2-3 papers detailing the algorithm (if possible) before attempting to implement it. There are several reasons for this. The first and most important is that there are likely typos in the equations of any one paper; by comparing across a few papers it’s easier to identify the trickier parts, after which thinking through what the correct form should be is usually straightforward. Second, equations are often updated with simplified notation or dynamics in later papers, and you can save yourself a lot of headache just by reading a later iteration. I recklessly disregarded this advice, started my implementation from a single, earlier paper which had a few key typos in the equations, and spent a lot of time tracking down the problem. This is just a peril inherent in any paper that doesn’t provide tested code, which is almost all of them, sadface.

OK, now on to the dynamics. Fortunately, I can just reference the previous posts on DMPs here and don’t have to spend any time discussing how we arrive at the DMP dynamics (for a 2D system):

\ddot{\textbf{y}} = \alpha_y (\beta_y( \textbf{g} - \textbf{y}) - \dot{\textbf{y}}) + \textbf{f},

where \alpha_y and \beta_y are gain terms, \textbf{g} is the goal state, \textbf{y} is the system state, \dot{\textbf{y}} is the system velocity, and \textbf{f} is the forcing function.
As mentioned, DMPs are awesome because now to add obstacle avoidance all we have to do is add another term

\ddot{\textbf{y}} = \alpha_y (\beta_y( \textbf{g} - \textbf{y}) - \dot{\textbf{y}}) + \textbf{f} + \textbf{p}(\textbf{y}, \dot{\textbf{y}}),

where \textbf{p}(\textbf{y}, \dot{\textbf{y}}) implements the obstacle avoidance dynamics, and is a function of the DMP state and velocity. Now then, the question is what are these dynamics exactly?

Obstacle avoidance dynamics

It turns out that there is a paper by Fajen and Warren that details an algorithm that mimics human obstacle avoidance. The idea is that you calculate the angle between your current velocity and the direction to the obstacle, and then turn away from the obstacle. The angle between current velocity and direction to the obstacle is referred to as the steering angle, denoted \varphi, here’s a picture of it:

[image: psi]
So, given some \varphi value, we want to specify how much to change our steering direction, \dot{\varphi}, as in the figure below:
[image: dpsi]
If we’re on track to hit the object (i.e. \varphi is near 0) then we steer away hard, and we make the change in direction smaller and smaller as the angle between our heading (velocity) and the object grows. Formally, define \dot{\varphi} as

\dot{\varphi} = \gamma \;\varphi \;\textrm{exp}(-\beta | \varphi |),

where \gamma and \beta are constants, which are specified as 1000 and \frac{20}{\pi} in the paper, respectively.


Edit: OK, this all sounds great, but you quickly run into issues here. The first is: how do we calculate \varphi? In the paper I was going off of, they used a dot product between the \dot{\textbf{y}} vector and the \textbf{o} - \textbf{y} vector to get the cosine of the angle, and then used np.arccos to get the angle from that. There is a subtle but big problem with this: np.arccos will never give you a negative angle, and a signed angle is exactly what we need to be able to tell whether we should turn left or right to avoid the object!

So we need a different way of calculating the angle, one that tells us whether the object is at a positive or negative angle relative to the way we’re headed. To calculate an angle that will give us + or -, we’re going to use the np.arctan2 function.

We want to center things around the direction we’re headed, so first we calculate the angle of the direction vector, \dot{\textbf{y}}. Then we create a rotation matrix, R_dy, to rotate the vector describing the direction of the object. This shifts things around so that if we’re moving straight towards the object it’s at 0 degrees, if we’re headed directly away from it, it’s at 180 degrees, etc. Once we have that rotated vector, nooowwww we can find the angle of the direction of the object, and that’s what we’re going to use for \varphi. Huzzah!

# get the angle we're heading in
phi_dy = -np.arctan2(dy[1], dy[0]) 
R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)],
                 [np.sin(phi_dy), np.cos(phi_dy)]])
# calculate vector to object relative to body
obj_vec = obstacle - y
# rotate it by the direction we're going 
obj_vec = np.dot(R_dy, obj_vec)
# calculate the angle of obj relative to the direction we're going
phi = np.arctan2(obj_vec[1], obj_vec[0])

This \dot{\varphi} term can be thought of as a weighting, telling us how much we need to rotate based on how close we are to running into the object. To calculate how we should rotate we’re going to calculate the angle orthonormal to our current velocity, then weight it by the distance between the object and our current state on each axis. Formally, this is written:

\textbf{R} \; \dot{\textbf{y}},

where \textbf{R} is the axis (\textbf{o} - \textbf{y}) \times \dot{\textbf{y}} rotated 90 degrees (the \times denoting outer product here). The way I’ve been thinking about this is basically taking your velocity vector, \dot{\textbf{y}}, and rotating it 90 degrees. Then we use this rotated vector as a row vector, and weight the top row by the distance between the object and the system along the x axis, and the bottom row by the difference along the y axis. So in the end we’re calculating the angle that rotates our velocity vector 90 degrees, weighted by distance to the object along each axis.

So that whole thing takes into account the absolute distance to the object along each axis, but that’s not quite enough. We also need to throw in \dot{\varphi}, which looks at the current angle. What this does is basically ask ‘hey, are we going to hit this object?’; if we’re on course then make a big turn, and if not then turn less or not at all. Phew.

OK so all in all this whole term is written out

\textbf{p}(\textbf{y}, \dot{\textbf{y}}) = \textbf{R} \; \dot{\textbf{y}} \; \dot{\varphi},

and that’s what we add in to the system acceleration. And now our DMP can avoid obstacles! How cool is that?

Super compact and straightforward to add; here’s the code:


Edit: OK, so not suuuper compact. I’ve also added in another couple of checks. The big one is “is this obstacle between us and the target or not?”. So I calculate the Euclidean distance to the goal and to the obstacle, and if the obstacle is further away than the goal, I set the avoidance signal to zero (performed at the end of the if block)! This took care of a few weird errors where you would get a big deviation in the trajectory if the system saw an obstacle past the goal: it was planning on avoiding it, but then got pulled in to the goal before reaching the obstacle anyway, so it was a pointless exercise. The other check added in is just to make sure you only add in obstacle avoidance if the system is actually moving. Finally, I also changed \gamma to 100 instead of 1000.


def avoid_obstacles(y, dy, goal):
    p = np.zeros(2)
 
    for obstacle in obstacles:
        # based on (Hoffmann, 2009)

        # if we're moving
        if np.linalg.norm(dy) > 1e-5:

            # get the angle we're heading in
            phi_dy = -np.arctan2(dy[1], dy[0]) 
            R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)],
                             [np.sin(phi_dy), np.cos(phi_dy)]])
            # calculate vector to object relative to body
            obj_vec = obstacle - y
            # rotate it by the direction we're going 
            obj_vec = np.dot(R_dy, obj_vec)
            # calculate the angle of obj relative to the direction we're going
            phi = np.arctan2(obj_vec[1], obj_vec[0])

            dphi = gamma * phi * np.exp(-beta * abs(phi))
            R = np.dot(R_halfpi, np.outer(obstacle - y, dy))
            pval = -np.nan_to_num(np.dot(R, dy) * dphi)

            # check to see if the distance to the obstacle is further than 
            # the distance to the target, if it is, ignore the obstacle
            if np.linalg.norm(obj_vec) > np.linalg.norm(goal - y):
                pval = 0

            p += pval
    return p
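
The function above leans on a few names defined elsewhere in the script: the gain constants, the 90 degree rotation matrix, and the obstacle list. Here’s a minimal sketch of those definitions (the obstacle positions are just example values):

import numpy as np

# gain constants from the discussion above
gamma = 100  # turn strength, reduced from the paper's 1000
beta = 20.0 / np.pi  # decay rate of the turning response
# matrix for rotating a 2D vector by 90 degrees
R_halfpi = np.array([[np.cos(np.pi / 2.0), -np.sin(np.pi / 2.0)],
                     [np.sin(np.pi / 2.0), np.cos(np.pi / 2.0)]])
# example obstacle positions in the workspace
obstacles = np.array([[-0.5, 0.5], [0.5, 0.5]])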

And that’s it! Just add this method in to your DMP system, call avoid_obstacles at every timestep, and add the result in to your DMP acceleration, as sketched below.
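
For reference, hooking it into a DMP rollout looks something like this sketch (assuming pydmps’ DMPs_discrete class, whose step function accepts an external_force term):

import numpy as np
from pydmps.dmp_discrete import DMPs_discrete

# a simple point attractor: a DMP with zeroed forcing function weights
dmp = DMPs_discrete(n_dmps=2, n_bfs=10, w=np.zeros((2, 10)))
dmp.goal = np.array([1.0, 1.0])

y_track = []
for _ in range(dmp.timesteps):
    y, dy, ddy = dmp.step(
        external_force=avoid_obstacles(dmp.y, dmp.dy, dmp.goal))
    y_track.append(np.copy(y))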

You hopefully noticed in the code that this is set up for multiple obstacles, and that all that entailed was summing the p value generated by each individual obstacle. It’s super easy! Here’s a very basic graphic showing how the DMP system can avoid obstacles:
[gif: obj_avoid]
So here there’s just a basic attractor system (DMP without a forcing function) trying to move from the center position to 8 targets around the unit circle (which are highlighted in red), and there are 4 obstacles that I’ve thrown onto the field (black x’s). As you can see, the system successfully steers way clear of the obstacles while moving towards the target!

We must all use this power wisely.


Edit: Making the power yours is now easier than ever! You can check out this code at my pydmps GitHub repo. Clone the repo, run python setup.py develop, change directories into the examples folder, and run the avoid_obstacles.py file. It will randomly generate 5 targets in the environment and perform 20 movements, giving you something looking like this:

[gif: obj_avoid2]



Using VREP for simulation of force-controlled models

I’ve been playing around a bit with different simulators, and one that we’re a big fan of in the lab is VREP. It’s free for academics, and you can talk to them about licences if you’re looking for commercial use. I haven’t actually had much experience with it before myself, so I decided to make a simple force controlled arm model to get experience using it. All in all, there were only a few weird API things that I had to get through, and once you have them down it’s pretty straightforward. This post is going to be about the steps I needed to take to get things all set up. For a more general introduction to VREP, check out their tutorials. All the code in this post and the model I use can be found up on my GitHub.

Getting the right files where you need them

As discussed in the remote API overview, you’ll need three files in whatever folder you want to run your Python script from to be able to hook into VREP remotely:

  • remoteApi.dll, remoteApi.dylib or remoteApi.so (depending on what OS you’re using)
  • vrep.py
  • vrepConstants.py

You can find these files inside your VREP_HOME/programming/remoteApiBindings/python/python and VREP_HOME/programming/remoteApiBindings/lib/lib folders. Make sure that these files are in whatever folder you’re running your Python scripts from.

The model

It’s easy to create a new model to mess around with in VREP, so that’s the route I went, rather than importing one of their pre-made models and having some sneaky parameter setting cause me a bunch of grief. You can just right click->add then go at it. There are a bunch of tutorials so I’m not going to go into detail here. The main things are:

  • Make sure joints are in ‘Torque/force’ mode.
  • Make sure that joint ‘Motor enabled’ property is checked. The motor enabled property is in the ‘Show dynamic properties dialogue’ menu, which you find when you double click on the joint in the Scene Hierarchy.
  • Know the names of the joints as shown in the Scene Hierarchy.

So here’s a picture:

[image: VREP]
where you can see the names of the objects in the model highlighted in red, the Torque/force selection highlighted in blue, and the Motor enabled option highlighted in green. And of course my beautiful arm model in the background.

Setting up the continuous server

The goal is to connect VREP to Python so that we can send torques to the arm from our Python script and get the feedback necessary to calculate those torques. There are a few ways to set up a remote connection in VREP.

The basic approach has you add a child script in VREP, attached to an object in the world, that sets up a port; then you hit go on your simulation and run your Python script to connect in. This gets old real fast. Fortunately, it’s easy to automate everything from Python, so that you can connect in, start the simulation, run it for however long, and then stop the simulation, all without having to click back and forth.

The first step is to make sure that the remoteApiConnections.txt file in your VREP home directory is set up properly. A continuous server should be set up by default, but it doesn’t hurt to double check. And you can take the chance to turn on debugging, which can be pretty helpful. So open up that file and make sure it looks like this:

portIndex1_port                 = 19997
portIndex1_debug                = true
portIndex1_syncSimTrigger       = true

Once that’s set up, when VREP starts we can connect in from Python. In our Python script, first we’ll close any open connections that might be around, and then we’ll set up a new connection in:

import vrep 

# close any open connections
vrep.simxFinish(-1)
# Connect to the V-REP continuous server
clientID = vrep.simxStart('127.0.0.1', 19997, True, True, 500, 5) 

if clientID != -1: # if we connected successfully
    print ('Connected to remote API server')

So once the connection is made, we check the clientID value to make sure that it didn’t fail, and then we carry on with our script.

Synchronizing

By default, VREP runs its simulation in its own thread, with the simulation and the controller using the remote API running simultaneously. This can lead to some weird behaviour as things fall out of synch, so what we want instead is for the VREP sim to run exactly one time step for each time step the controller runs. To do that, we need to set VREP to synchronous mode. So the next few lines of our Python script look like:

    # --------------------- Setup the simulation 

    vrep.simxSynchronous(clientID,True)

and then later, once we’ve calculated our control signal, sent it over, and want the simulation to run one time step forward, we do that by calling

    # move simulation ahead one time step
    vrep.simxSynchronousTrigger(clientID)

Get the handles to objects of interest

OK, the next chunk of code in our script uses the names of our objects (as specified in the Scene Hierarchy in VREP) to get an ID for each, which we’ll use to identify which object we want to send a command to or receive feedback from:

    joint_names = ['shoulder', 'elbow']
    # joint target velocities discussed below
    joint_target_velocities = np.ones(len(joint_names)) * 10000.0

    # get the handles for each joint and set up streaming
    joint_handles = [vrep.simxGetObjectHandle(clientID,
        name, vrep.simx_opmode_blocking)[1] for name in joint_names]

    # get handle for target and set up streaming
    _, target_handle = vrep.simxGetObjectHandle(clientID,
                    'target', vrep.simx_opmode_blocking)

Set dt and start the simulation

And the final thing that we’re going to do in our simulation set up is specify the time step that we want to use, and then start the simulation. I found this in a forum post, and I must say, whatever VREP lacks in intuitive API design, their forum moderators are on the ball. NOTE: To use a custom time step you have to also set the dt option in the VREP simulator to ‘custom’. The drop-down is to the left of the ‘play’ arrow, and if you don’t have it set to ‘custom’ you won’t be able to set the time step through Python.

    dt = .01
    vrep.simxSetFloatingParameter(clientID,
            vrep.sim_floatparam_simulation_time_step,
            dt, # specify a simulation time step
            vrep.simx_opmode_oneshot)

    # --------------------- Start the simulation

    # start our simulation in lockstep with our code
    vrep.simxStartSimulation(clientID,
            vrep.simx_opmode_blocking)

Main loop

For this next chunk I’m going to cut out everything that’s not VREP, since I have a bunch of posts explaining the control signal derivation and forward transformation matrices.

So, once we’ve started the simulation, I’ve set things up for the arm to be controlled for 1 second and then for the simulation to stop and everything shut down and disconnect.

    while count < 1: # run for 1 simulated second

        # get the (x,y,z) position of the target
        _, target_xyz = vrep.simxGetObjectPosition(clientID,
                target_handle,
                -1, # retrieve absolute, not relative, position
                vrep.simx_opmode_blocking)
        if _ != 0: raise Exception()
        track_target.append(np.copy(target_xyz)) # store for plotting
        target_xyz = np.asarray(target_xyz)

        q = np.zeros(len(joint_handles))
        dq = np.zeros(len(joint_handles))
        for ii,joint_handle in enumerate(joint_handles):
            # get the joint angles
            _, q[ii] = vrep.simxGetJointPosition(clientID,
                    joint_handle,
                    vrep.simx_opmode_blocking)
            if _ != 0: raise Exception()
            # get the joint velocity
            _, dq[ii] = vrep.simxGetObjectFloatParameter(clientID,
                    joint_handle,
                    2012, # ID for angular velocity of the joint
                    vrep.simx_opmode_blocking)
            if _ != 0: raise Exception()

In the above chunk of code, the big thing to point out is that I’m using vrep.simx_opmode_blocking in each call, instead of vrep.simx_opmode_buffer. The difference is that with blocking you are guaranteed to get the current values of the simulation, while with buffer you can be a time step behind.
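
(For reference, the buffer approach looks something like this sketch: the first call kicks off a data stream, and subsequent calls read the most recently streamed value. It’s faster, but the value you read can lag the simulation.)

# during setup: the first call starts the data stream
_, _ = vrep.simxGetJointPosition(clientID, joint_handle,
                                 vrep.simx_opmode_streaming)

# in the main loop: read the most recently streamed value
_, q_ii = vrep.simxGetJointPosition(clientID, joint_handle,
                                    vrep.simx_opmode_buffer)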

Aside from that, the other notable things are that I raise an exception if the first returned parameter (the return code) is ever not 0, and that I use simxGetObjectFloatParameter to get the joint velocity instead of simxGetObjectVelocity, which has a rotational velocity component. Zero is the return code for ‘everything worked’, and if you don’t check it while some silly thing is going on you can be veeerrrrryyy mystified when things don’t work. As for simxGetObjectVelocity, it returns the rotational velocity of the joint relative to the world frame, not the angular velocity of the joint in its own coordinates. That was also briefly confusing.

So the next thing I do is calculate u (which we’ll skip here), and then we need to set the forces for the joint. This part of the API is real screwy. You can’t set the force applied to the joint directly. Instead, you have to set the target velocity of the joint to some really high value (hence that array we made before), and then modulate the maximum force that can be applied to the joint. Also important: when you want to apply a force in the other direction, you change the sign on the target velocity, but keep the force sign positive.

        # ... calculate u ... 

        for ii,joint_handle in enumerate(joint_handles):
            # get the current joint torque
            _, torque = \
                vrep.simxGetJointForce(clientID,
                        joint_handle,
                        vrep.simx_opmode_blocking)
            if _ != 0: raise Exception()

            # if the torque has changed sign,
            # we need to change the target velocity sign
            if np.sign(torque) * np.sign(u[ii]) < 0:
                joint_target_velocities[ii] = \
                        joint_target_velocities[ii] * -1
                _ = vrep.simxSetJointTargetVelocity(clientID,
                        joint_handle,
                        joint_target_velocities[ii], # target velocity
                        vrep.simx_opmode_blocking)
                if _ != 0: raise Exception()

            # and now modulate the maximum allowed force
            _ = vrep.simxSetJointForce(clientID,
                    joint_handle,
                    abs(u[ii]), # force to apply
                    vrep.simx_opmode_blocking)
            if _ != 0: raise Exception()

        # move simulation ahead one time step
        vrep.simxSynchronousTrigger(clientID)
        count += dt

So as you can see we check the current torque, see if we need to change the sign on the target velocity, modulate the maximum allowed force, and then finally step the VREP simulation forward.
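
Once the loop finishes, it’s also worth shutting things down cleanly so VREP is in a good state for the next run; a minimal sketch:

# stop the simulation
vrep.simxStopSimulation(clientID, vrep.simx_opmode_blocking)
# make sure the stop command has arrived before closing the connection
vrep.simxGetPingTime(clientID)
# close the connection to VREP
vrep.simxFinish(clientID)
print('Simulation stopped, connection closed')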

Conclusions

And there you go! Here’s an animation of it in action (note this is a super low quality gif and it looks way better / smoother when actually running it yourself):

[gif: reach_to_target]

All in all, VREP has been enjoyable to work with so far. It didn’t take long to get things moving and off the ground, the visualization is great, and I haven’t even scratched the surface of what you can do with it. Best of all (so far) you can fully automate everything from Python. Hopefully this is enough to help some people get their own models going and save a few hours and headaches! Again, the full code and the model are up on my GitHub.

Nits

  • When you’re applying your control signal, make sure you test each joint in isolation, to check that your torques push things in the direction you think they do. I had checked the rotation direction in VREP, but the control signal for both joints still ended up needing to be multiplied by -1.
  • Another nit: when you’re building your model, if you use the rotate button from the VREP toolbar on your model, wherever that joint rotates to is now 0 degrees. If you want the joint to start at 45 degrees, instead double click it and change the Pos[deg] option inside ‘Joint’ in Scene Object Properties.

Deep learning for control using augmented Hessian-free optimization

Traditionally, deep learning is applied to feed-forward tasks, like classification, where the output of the network doesn’t affect its input. It is a decidedly harder problem when the output is recurrently connected, such that the network output affects the input. Because of this, the application of deep learning methods to control was largely unexplored until a few years ago. Recently, however, there’s been a lot of progress and research in this area. In this post I’m going to talk about an implementation of deep learning for control presented by Dr. Ilya Sutskever in his thesis, Training Recurrent Neural Networks.

In his thesis, Dr. Sutskever uses augmented Hessian-free (AHF) optimization for learning. There are a bunch of papers and posts that go into detail about AHF; here’s a good one by Andrew Gibiansky up on his blog that I recommend you check out. I’m not going to talk much here about what AHF is specifically, or how it differs from other methods; if you’re unfamiliar, there are lots of places you can read up on it. Quickly, though: AHF is kind of a bag of tricks paired with a fast method for estimating both the gradient and the curvature of the loss function with respect to the weights of a neural network, which allows it to make larger updates in directions where the loss function doesn’t change quickly. So rather than estimating the gradient and then doing a small update along each dimension, you can make the size of your update large in directions that change slowly and small along dimensions where things change quickly. And now that’s enough about that.

In this post I’m going to walk through using a Hessian-free optimization library (version 0.3.8) written by my compadre Dr. Daniel Rasmussen to train up a neural network to train up a 2-link arm, and talk about the various hellish gauntlets you need run to get something that works. Whooo! The first thing to do is install this Hessian-free library, linked above.

I’ll be working through code edited a bit for readability, to find the code in full you can check out the files up on my GitHub.

Build the network

Dr. Sutskever specified the structure of the network in his thesis to be 4 layers: 1) a linear input layer, 2) 100 Tanh nodes, 3) 100 Tanh nodes, 4) a linear output layer. The network is connected up with the standard feedforward connections from 1 to 2 to 3 to 4, plus recurrent connections on layers 2 and 3 to themselves, plus a 'skip' connection from layer 1 to layer 3. Finally, the input to the network is the target state for the plant and the current state of the plant. So, lots of recursion! Here's a picture:

network structure

The output layer connects into the plant, and, for those unfamiliar with control theory terminology, 'plant' just means the system that you're controlling; in this case, an arm simulation.

Before we can go ahead and set up the network that we want to train, we also need to specify the loss function that we’re going to be using during training. The loss function in Ilya’s thesis is a standard one:

L(\theta) = \sum\limits_{t=0}^{N-1} \ell(\textbf{u}_t) + \ell_f(\textbf{x}_N),
\ell(\textbf{u}_t) = \alpha \frac{||\textbf{u}_t||^2}{2},
\ell_f(\textbf{x}_N) = \frac{||\textbf{x}^* - \textbf{x}_N||^2}{2}

where L(\theta) is the total cost of the trajectory generated with \theta, the set of network parameters, \ell(\textbf{u}_t) is the immediate cost on the control signal, \ell_f(\textbf{x}_N) is the final state cost, \textbf{x} is the state of the arm, \textbf{x}^* is the target state of the arm, \textbf{u} is the control signal (torques) that drives the arm, and \alpha is a gain value.
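Just to ground the notation, here's the same cost written out in NumPy. This is purely a sketch for intuition (u_traj, x_N, x_star, and alpha are stand-in names for a rollout's control signals, final arm state, target state, and gain), not anything the library asks us for:

import numpy as np

def trajectory_cost(u_traj, x_N, x_star, alpha):
    # immediate costs: alpha * ||u_t||^2 / 2, summed over the trajectory
    l_u = alpha * np.sum(u_traj ** 2) / 2.0
    # final state cost: squared distance from the target state
    l_f = np.sum((x_star - x_N) ** 2) / 2.0
    return l_u + l_f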

To code this up using the hessianfree library we do:

from hessianfree import RNNet
from hessianfree.nonlinearities import (Tanh, Linear, Plant)
from hessianfree.loss_funcs import SquaredError, SparseL2

l2gain = 10e-3 * dt  # gain on control signal loss (dt is the sim timestep, defined elsewhere)
rnn = RNNet(
    # specify the number of nodes in each layer
    shape=[num_states * 2, 96, 96, num_states, num_states],
    # specify the function of the nodes in each layer
    layers=[Linear(), Tanh(), Tanh(), Linear(), plant],
    # specify the layers that have recurrent connections
    rec_layers=[1, 2],
    # specify the connections between layers (0 connects to
    # both 1 and 2: that's the 'skip' connection)
    conns={0: [1, 2], 1: [2], 2: [3], 3: [4]},
    # specify the loss function
    loss_type=[
        # squared error between plant output and targets
        SquaredError(),
        # penalize magnitude of control signal (output of layer 3)
        SparseL2(l2gain, layers=[3])],
    load_weights=W,  # W holds previously saved weights, or is None to start fresh
    use_GPU=True)

Note that if you want to run it on your GPU you’ll need PyCuda and sklearn installed. And a GPU.

An important thing to note as well: in Dr. Sutskever's thesis, when we're calculating the squared error of the distance from the arm state to the target, this is measured in joint angles. It's kind of a weird setup to be looking at the movement of the hand but have your cost function in joint space instead of end-effector space, but it definitely simplifies training by making the cost more directly relatable to the control signal. So we need to calculate the joint angles of the arm that will put the hand at different targets around a circle. To do this we'll take advantage of our inverse kinematics solver from way back when, and use the following code:

def gen_targets(arm, n_targets=8, sig_len=100):
    # generate target angles corresponding to target
    # (x, y) coordinates around a circle
    import scipy.optimize

    x_bias = 0
    if arm.DOF == 2:
        y_bias = .35
        dist = .075
    elif arm.DOF == 3:
        y_bias = .5
        dist = .2

    # set up the reaching trajectories around the circle (64 targets)
    targets_x = [dist * np.cos(theta) + x_bias
                    for theta in np.linspace(0, np.pi*2, 65)][:-1]
    targets_y = [dist * np.sin(theta) + y_bias
                    for theta in np.linspace(0, np.pi*2, 65)][:-1]

    joint_targets = []
    for ii in range(len(targets_x)):
        joint_targets.append(arm.inv_kinematics(xy=(targets_x[ii],
                                                    targets_y[ii])))
    targs = np.asarray(joint_targets)

    # repeat the targets over time
    for ii in range(targs.shape[1]-1):
        targets = np.concatenate(
            (np.outer(targs[:, ii], np.ones(sig_len))[:, :, None],
             np.outer(targs[:, ii+1], np.ones(sig_len))[:, :, None]), axis=-1)
    targets = np.concatenate((targets, np.zeros(targets.shape)), axis=-1)
    # only want to penalize the system for not being at the
    # target at the final state, set everything before to np.nan
    targets[:, :-1] = np.nan 

    return targets

And you can see in the last couple of lines that, to implement the distance to target as a final state cost only, we just set all of the targets before the final time step equal to np.nan. If we wanted to penalize distance to target throughout the whole trajectory, we would just comment that line out.

Create the plant

You’ll notice in the code that defines our RNN I set the last layer of the network to be plant, but that that’s not defined anywhere. Let’s talk. There are a couple of things that we’re going to need to incorporate our plant into this network and be able to use any deep learning method to train it. We need to be able to:

  1. Simulate the plant forward; i.e. pass in input and get back the resulting plant state at the next timestep.
  2. Calculate the derivative of the plant state with respect to the input; i.e. how do small changes in the input affect the state.
  3. Calculate the derivative of the plant state with respect to the previous state; i.e. how do small changes in the plant state affect the state at the next timestep.
  4. Calculate the derivative of the plant output with respect to its state; i.e. how do small changes in the current position of the state affect the output of the plant.

So 1 is easy: we already have the arm simulations that we want, up on my GitHub. Number 4 is actually trivial too, because the output of our plant is going to be the state itself, so the derivative of the output with respect to the state is just the identity matrix. A sketch of what that looks like in code is below.
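In code, that identity-matrix derivative can be as simple as tiling np.eye across trials; a tiny sketch (d_output is a hypothetical name, just to illustrate):

import numpy as np

def d_output(self):
    # d(output)/d(state) is the identity, repeated once per trial
    num_trials, num_states = self.state.shape
    return np.tile(np.eye(num_states), (num_trials, 1, 1))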

For 2 and 3, we’re going to need to calculate some derivatives. If you’ve read the last few posts you’ll note that I’m on a finite differences kick. So let’s get that going! Because no one wants to calculate derivatives!
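As a quick refresher before we specialize it to the plant: central differences approximates a derivative as (f(x + \epsilon) - f(x - \epsilon)) / (2 \epsilon), perturbing one input dimension at a time. A generic toy version:

import numpy as np

def finite_differences(f, x, eps=1e-5):
    # approximate the Jacobian df/dx, one column per input dimension
    jacobian = np.zeros((f(x).size, x.size))
    for ii in range(x.size):
        inc = x.copy()
        inc[ii] += eps
        dec = x.copy()
        dec[ii] -= eps
        jacobian[:, ii] = (f(inc) - f(dec)) / (2.0 * eps)
    return jacobian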

Important note: the notation in these next couple pieces of code is going to be a bit different from my normal notation, because it matches the hessianfree library's notation, which comes from a reinforcement learning background instead of a control theory one. So s is the state of the plant, and x is the input to the plant. I know, I know. All the same, make sure to keep that in mind.

# calculate ds0/dx0 with finite differences
d_input_FD = np.zeros((x.shape[0],           # number of trials
                       self.state.shape[1],  # number of states
                       x.shape[1]))          # number of inputs
for ii in range(x.shape[1]):
    # calculate state adding eps to x[ii]
    self.reset_plant(self.prev_state)
    inc_x = x.copy()
    inc_x[:, ii] += self.eps
    self.activation(inc_x)
    state_inc = self.state.copy()
    # calculate state subtracting eps from x[ii]
    self.reset_plant(self.prev_state)
    dec_x = x.copy()
    dec_x[:, ii] -= self.eps
    self.activation(dec_x)
    state_dec = self.state.copy()

    d_input_FD[:, :, ii] = \
        (state_inc - state_dec) / (2 * self.eps)
d_input_FD = d_input_FD[..., None]

Alrighty. First we create a tensor to store the results. Why a tensor? Because we're going to be doing a bunch of trials at once, so our input is actually trials x size_input, and the state is trials x size_state. When we take the partial derivative we end up with trials many size_state x size_input matrices, which is why the tensor above is allocated the way it is. Then we increase each of the parameters of the input slightly, one at a time, and store the results; decrease them all one at a time and store the results; and compute our approximation of the gradient.

Next we’ll do the same for calculating the derivative of the state with respect to the previous state.

# calculate ds1/ds0
d_state_FD = np.zeros((x.shape[0], # number of trials
                       self.state.shape[1], # number of states
                       self.state.shape[1])) # number of states
for ii in range(self.state.shape[1]):
    # calculate state adding eps to self.state[ii]
    state = np.copy(self.prev_state)
    state[:, ii] += self.eps
    self.reset_plant(state)
    self.activation(x)
    state_inc = self.state.copy()
    # calculate state subtracting eps from self.state[ii]
    state = np.copy(self.prev_state)
    state[:, ii] -= self.eps
    self.reset_plant(state)
    self.activation(x)
    state_dec = self.state.copy()

    d_state_FD[:, :, ii] = \
        (state_inc - state_dec) / (2 * self.eps)
d_state_FD = d_state_FD[..., None]

Great! We’re getting closer to having everything we need. Another thing we need is a wrapper for running our arm simulation. It’s going to look like this:

def activation(self, x):
    state = []
    # iterate through and simulate the plant forward
    # for each trial
    for ii in range(x.shape[0]):
        self.arm.reset(q=self.state[ii, :self.arm.DOF],
                       dq=self.state[ii, self.arm.DOF:])
        self.arm.apply_torque(x[ii])  # x is the input torque (see notation note above)
        state.append(np.hstack([self.arm.q, self.arm.dq]))
    state = np.asarray(state)

    self.state = self.squashing(state)

This is definitely not the fastest code to run. Ideally we would put the states and inputs into arrays and do a single set of computations for each call to activation, rather than having that for loop in there. Unfortunately, though, we're not assuming that we have access to the dynamics equations, so we can't just pass in vectors of states and inputs.

Squashing
Looking at the above code, it seems pretty clear what's going on, except you might notice that last line calling self.squashing. What's going on there?

The squashing function looks like this:

def squashing(self, x):
    index_below = np.where(x < -2*np.pi)
    x[index_below] = np.tanh(x[index_below]+2*np.pi) - 2*np.pi
    index_above = np.where(x > 2*np.pi)
    x[index_above] = np.tanh(x[index_above]-2*np.pi) + 2*np.pi
    return x

All that’s happening here is that we’re taking our input, and doing nothing to it as long as it doesn’t start to get too positive or too negative. If it does then we just taper it off and prevent it from going off to infinity. So running a 1D vector through this function we get:

squashed
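If you want to reproduce that curve yourself, here's a quick standalone sketch with the same thresholds inlined (assumes matplotlib, which isn't otherwise used in this post):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4 * np.pi, 4 * np.pi, 500)
y = x.copy()
# linear in the middle, tanh taper outside of +/- 2*pi
index_below = np.where(y < -2 * np.pi)
y[index_below] = np.tanh(y[index_below] + 2 * np.pi) - 2 * np.pi
index_above = np.where(y > 2 * np.pi)
y[index_above] = np.tanh(y[index_above] - 2 * np.pi) + 2 * np.pi

plt.figure()
plt.plot(x, y)
plt.xlabel('input')
plt.ylabel('squashed output')
plt.show()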
This ends up being a pretty important piece of the code. Basically it prevents wild changes to the weights during learning from breaking the system: the state of the plant can't go off to infinity and cause an error to be thrown, stopping our simulation. But because the target state is well within the bounds where the squashing function does nothing, post-training we'll still be able to use the resulting network to control a system that doesn't have this fail-safe built in. Think of this function as training wheels that only catch you if you start falling.

With that, we now have pretty much all of the parts necessary to begin training our network!

Training the network

We’re going to be training this network on the centre-out reaching task, where you start at a centre point and reach to a bunch of target locations around a circle. I’m just going to be re-implementing the task as it was done in Dr. Sutskever’s thesis, so we’ll have 64 targets around the circle, and train using a 2-link arm. Here’s the code that we’ll use to actually run the training:

from hessianfree.optimizers import HessianFree  # optimizer import (v0.3.8 layout)

for ii in range(last_trial+1, num_batches):
    # train the network; the plant itself is passed in as the input,
    # so the network gets fresh rollouts as the weights change
    err = rnn.run_batches(plant, targets=None,
              max_epochs=batch_size,
              optimizer=HessianFree(CG_iter=96, init_damping=100))

    # save the weights to file, track trial and error
    # err = rnn.error(inputs)
    err = rnn.best_error
    name = 'weights/rnn_weights-trial%04i-err%.3f'%(ii, err)
    np.savez_compressed(name, rnn.W)

Training your own network

A quick aside: if you want to run this code yourself, get a real good computer, have an arm simulation ready and the hessianfree Python library installed, and download and run this train_hf.py file. (Note: I used version 0.3.8 of the hessianfree library, which you can install with pip install hessianfree==0.3.8.) This will start training and save the weights into a weights/ folder, so make sure that folder exists in the same directory as train_hf.py. If you want to view the results of the training at any point, run the plot_error.py file, which will load in the most recent version of the weights and plot the error so far. If you want to generate an animated plot like I have below, run gen_animation_plots.py and then the last command from my post on generating animated gifs.

Another means of seeing the results of your trained-up network is to use the controller I've implemented in my controls benchmarking suite, which looks for a set of saved weights in the controllers/weights folder, loads it in, and uses it to generate command signals for the arm. You run it with

python run.py arm2_python ahf reach --dt=1e-2

where you replace arm2_python with whatever arm model you trained your network on. Note the --dt=1e-2 flag: it's important because the model was trained with a .01 second timestep, and things get a bit weird if you suddenly change the dynamics on the controller.

OK let’s look at some results!

Results

OK, full disclosure: these results are not optimizing the cost function we discussed above. They're implementing a simpler cost function that only looks at the final state, i.e. it doesn't penalize the magnitude of the control signal. I did this because Dr. Sutskever says in his thesis he was able to optimize with just the final state cost using much smaller networks. I originally looked at networks with 96 neurons in each layer, and it just took forgoddamnedever to run. So after running for 4 weeks (not joking) and needing to make some more changes, I dropped the number of neurons and simplified the task.

The results below are from a network with 32 neurons in each layer controlling this 2-link arm; it took another 4-5 weeks to train up.
weights32

Hey, that looks good! Not bad, augmented Hessian-free learning, not bad. It had a pretty consistent (if slow) decline in the error rate, with a few crazy bumps from which it quickly recovered. Also take note that each training iteration is actually 32 runs, so it's not 12,500-ish runs, it's closer to 12,500 × 32 ≈ 400,000 training runs that it took to get here.

One biggish thing that was a pain: it turns out I only trained the neural network to reach in the one direction, and when you only train it to reach one way it doesn't generalize to reaching back to the starting point (which, fair enough). But I didn't realize this until I took the trained network and ran it in the benchmarking code, at which point I was not keen to redo all of the training it took to get the network to its level of accuracy under a more complicated training set. The downside of this is that even though I've implemented a controller that takes in the trained network and uses it to control the arm, to do the reaching task I have to do a hard reset after the arm reaches each target, because it can't reach back to the center like all the other controllers can. All the same, here's an animation of the trained-up AHF controller reaching to 8 targets (it was trained on all 64 above, though):

animation

Things don’t always go so smoothly, though. Here’s results from another training run that took around 2-3 weeks, and uses a different 2-link arm model (translated from Matlab code written by Dr. Emo Todorov):

weights32_suts2

What I found frustrating about this was that if you look at the error over time, this arm is doing as well as or better than the previous arm at a lot of points. But the corresponding trajectories look terrible, like something you would see in a horror movie based around getting good machine learning results. This of course comes down to how I specified the cost function: when I looked at the trajectories plotted over time, the velocity of the arm is right at zero at the final time step, which is not quiiiitte the case for the first controller. So this second network has found a workaround that minimizes the cost function I specified in a way I did not intend. To prevent this, something like weighting the distance to target more heavily than non-zero velocity would probably work (see the sketch below). Or possibly just rerunning the training with a different random starting point would get you a better controller; I don't have a great feel for how important the random initialization is, but I'm hoping it's not all that important and its effects go to zero with enough training. Also, it should be noted that I've run the first network for 12,500 iterations and the second for less than 6,000, so I'll keep letting them run and maybe it will come around. The first one looked pretty messy too until about 4,000 iterations in.
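For concreteness, that reweighting would look something like replacing the final state cost with a weighted norm,

\ell_f(\textbf{x}_N) = \frac{(\textbf{x}^* - \textbf{x}_N)^T \textbf{Q} (\textbf{x}^* - \textbf{x}_N)}{2},

where \textbf{Q} is a diagonal weighting matrix (my addition, not something from the thesis) with larger entries on the joint angle dimensions than on the velocity dimensions.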

Training regimes

Frustratingly, the way that you train deep networks is very important. So, very much like the naive deep learning network trainer that I am, I tried the first thing that pretty much anyone would try:

  • run the network,
  • update the weights,
  • repeat.

This is what I’ve done in the results above. And it worked well enough in that case.

If you remember back to the iLQR controller I made a little while ago, I was able to change the cost function to be

L(\theta) = \sum\limits_{t=0}^{N-1} \ell(\textbf{u}_t, \textbf{x}_t) + \ell_f(\textbf{x}_N),
\ell(\textbf{u}_t, \textbf{x}_t) = \alpha \frac{||\textbf{u}_t||^2}{2} + \frac{||\textbf{x}^* - \textbf{x}_t||^2}{2},
\ell_f(\textbf{x}_N) = \frac{||\textbf{x}^* - \textbf{x}_N||^2}{2}

(i.e. to include a penalty for distance to target throughout the trajectory and not just at the final time step), which resulted in straighter trajectories when controlling the 2-link arm. So I thought I would try this here as well. Sadly (incredibly sadly), this was fairly fruitless. The network didn't really learn or improve much at all.

After much consideration and quandary on my part, I talked with Dr. Dan and he suggested that I try another method:

  • run the network,
  • record the input,
  • hold the input constant for a few batches of weight updating,
  • repeat.

This method gave much better results. BUT WHY? I hear you ask! Good question. Let me give explaining it a go.

Essentially, it’s because the cost function is more complex now. In the first training method, the output from the plant is fed back into the network as input at every time step. When the cost function was simpler this was OK, but now we’re getting very different input to train on at every iteration. So the system is being pulled in different directions back and forth at every iteration. In the second training regime, the same input is given several times in a row, which let’s the system follow the same gradient for a few training iterations before things change again. In my head I picture this as giving the algorithm a couple seconds to catch its breath dunking it back underwater.

This is a method that's been used in a bunch of places recently. One of the more high-profile instances is in the results published by DeepMind on deep RL for control and for playing Go. And indeed, it also works well here.

To implement this training regime, we set up the following code:

for ii in range(last_trial+1, num_batches):

    # run the plant forward once
    rnn.forward(input=plant, params=rnn.W)

    # get the input and targets from above rollout
    inputs = plant.get_vecs()[0].astype(np.float32)
    targets = np.asarray(plant.get_vecs()[1], dtype=np.float32)

    # train a bunch of batches using the same input every time
    # to allow the network a chance to minimize things with
    # stable input (speeds up training)
    err = rnn.run_batches(inputs, targets, max_epochs=batch_size,
              optimizer=HessianFree(CG_iter=96, init_damping=100))

    # save the weights to file, track trial and error
    # err = rnn.error(inputs)
    err = rnn.best_error
    name = 'weights/rnn_weights-trial%04i-err%.3f'%(ii, err)
    np.savez_compressed(name, rnn.W)

So you can see that we do one rollout with the current weights, then go in and get the inputs and targets that were used in that rollout, and train the network while holding those constant for batch_size epochs (training sessions). From a little bit of messing around I've found batch_size=32 to be a pretty good number. So it runs 32 training iterations where it's updating the weights, saves those weights (because we want a loooottttt of check-points), and then restarts the loop.

Embarrassingly, I've lost my simulation results from this trial, somehow… so I don't have any nice plots to back up the above, unfortunately. But since this is just a blog post I figured I would at least talk about it a little bit, since people might still find it useful if they're just getting into the field like me, and I'll just update this post whenever I re-run them. If I re-run them.

What I do have, however, are results where this method doesn't work! I tried it with the simpler cost function, which only looks at the final state distance from the target, and it did not go so well. Let's look at that one!

weights32_suts

My guess here is basically that the system has gotten to a point where it's narrowed things down in the parameter space, and now when you run 32 batches on the same input it's overshooting. It needs feedback about its updates after every update at this point. That's my guess, at least. So it could be that for more complex cost functions you'd want to train while holding the input constant for a while, and then, when the error starts to plateau, switch to updating the input after every parameter update.

Conclusions

All in all, AHF for training neural networks in control is pretty awesome. There are of course still some major drawbacks, mostly related to how long it takes to train up a network, and having to guess at effective training regimes and network structures, etc. But! It was able to train up a relatively small neural network to move an arm model from a center point to 64 targets around a circle, with no knowledge of the system under control at all. In Dr. Sutskever's thesis he goes on to use the same setup under more complicated circumstances, such as when there's a feedback delay, or a delay on the outgoing control signal, or unexpected noise, etc., so it is able to learn in a number of different, fairly complex situations. Which is pretty slick.

Related to the insane training time required: I could very easily be missing some basic thing that would help speed things up. If you, reader, get ambitious and run the code on your own machine and find useful methods for speeding up the training, please let me know! Personally, my plan is to next investigate guided policy search, which seems to have found a way around this crazy training time.
