ABR Control repo public release!

https://github.com/abr/abr_control

Last August I started working on a new version of my old control repo with a resolve to make something less hacky, as part of the work for Applied Brain Research, Inc., a startup that I co-founded with others from my lab after most of my cohort graduated. Together with Pawel Jaworski, who comprises the other half of ABR’s motor team, we’ve been building up a library of useful functions for modeling, simulating, interfacing with, and controlling robot arms.

Today we’re making the repository public, under the same free-for-non-commercial-use license that we’ve released our Nengo neural modelling software under. You can find it here: ABR_Control

It’s all Python 3, and here’s an overview of some of the features:

• Automates generation of functions for computing the Jacobians, joint space and task space inertia matrices, centrifugal and Coriolis effects, and Jacobian derivative, provided each link’s inertia matrix and the transformation matrices
• Option to compile these functions to speedy Cython implementations
• Operational space, joint space, floating, and sliding controllers provided with PyGame and VREP example scripts
• Interfaces with VREP
• Configuration files for one, two, and three link arms, as well as the UR5 and Jaco2 arms in VREP
• Provides Python simulations of two and three link arms, with PyGame visualization
• Path planning using first and second order filtering of the target, with example scripts

Structure

The ABR Control library is divided into three sections:

1. Arm descriptions (and simulations)
2. Robotic system interfaces
3. Controllers

The big goal was to make all of these interchangeable, so that running any permutation of them is just a matter of changing which arm / interface / controller you’re importing.

To support a new arm, the user only needs to create a configuration file specifying the transforms and inertia matrices. Code for calculating the necessary functions of the arm will be symbolically derived using SymPy, and compiled to C using Cython for efficient run-time execution.
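As a toy illustration of the symbolic approach (my own sketch, not the library’s actual API), here’s SymPy deriving the Jacobian of a planar two-link arm and compiling it to a fast numeric function; the library goes further and compiles all the way to C with Cython:

```python
import sympy as sp

q0, q1 = sp.symbols('q0 q1')
l0, l1 = 1.0, 1.0  # link lengths (made-up values)

# end-effector position of a planar 2-link arm
x = l0 * sp.cos(q0) + l1 * sp.cos(q0 + q1)
y = l0 * sp.sin(q0) + l1 * sp.sin(q0 + q1)

# symbolically derive the Jacobian d[x, y] / d[q0, q1]
J = sp.Matrix([x, y]).jacobian([q0, q1])

# compile the symbolic expression down to a numeric function
J_func = sp.lambdify((q0, q1), J, 'numpy')
print(J_func(0.0, 0.0))
```

With the arm stretched out along the x axis, the Jacobian comes out as expected: moving either joint can only move the hand in y.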

Interfaces provide send_forces and send_target_angles functions, to apply torques and put the arm in a target state, as well as a get_feedback function, which returns a dictionary of information about the current state of the arm (joint angles and velocities at a minimum).

Controllers provide a generate function, which takes in the current system state and a target, and returns a set of joint torques to apply to the robot arm.
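To show how the pieces plug together, here’s a hypothetical sketch of the shared control loop; the get_feedback / send_forces / generate names mirror the API described above, but the stub classes and the PD gains are mine, standing in for a real arm, interface, and controller:

```python
import numpy as np

class StubInterface:
    """Stand-in for a real interface (e.g. VREP or PyGame)."""
    def __init__(self, dt=0.001):
        self.dt = dt
        self.q = np.zeros(2)   # joint angles
        self.dq = np.zeros(2)  # joint velocities

    def get_feedback(self):
        return {'q': self.q, 'dq': self.dq}

    def send_forces(self, u):
        # a real interface applies torques to a simulated or physical
        # arm; here we just integrate a unit-inertia system forward
        self.dq = self.dq + self.dt * u
        self.q = self.q + self.dt * self.dq

class StubController:
    """Stand-in for OSC / joint space / etc. controllers."""
    def generate(self, q, dq, target_pos):
        # simple PD law standing in for real control calculations
        return -10.0 * (q - target_pos) - 5.0 * dq

# any permutation of arm / interface / controller shares this loop
interface = StubInterface()
ctrlr = StubController()
target = np.array([0.5, -0.5])
for _ in range(5000):
    feedback = interface.get_feedback()
    u = ctrlr.generate(q=feedback['q'], dq=feedback['dq'],
                       target_pos=target)
    interface.send_forces(u)
print(interface.q)  # settles at the target
```

Because every interface and controller exposes the same few methods, swapping in a different arm or controller only changes the import lines, not the loop.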

VREP example

The easiest way to show it is with some code examples. So, once you’ve cloned and installed the repo, you can open up VREP and the jaco2.ttt model in the abr_control/arms/jaco2 folder, and to control it using an operational space controller you would run the following:

import numpy as np
from abr_control.arms import jaco2 as arm
from abr_control.controllers import OSC
from abr_control.interfaces import VREP

# initialize our robot config for the jaco2
robot_config = arm.Config(use_cython=True, hand_attached=True)

# instantiate controller
ctrlr = OSC(robot_config, kp=200, vmax=0.5)

# create our VREP interface
interface = VREP(robot_config, dt=.001)
interface.connect()

target_xyz = np.array([0.2, 0.2, 0.2])
# set the target object's position in VREP
interface.set_xyz(name='target', xyz=target_xyz)

count = 0
while count < 1500:  # run for 1.5 simulated seconds
    # get joint angle and velocity feedback
    feedback = interface.get_feedback()
    # calculate the control signal
    u = ctrlr.generate(
        q=feedback['q'],
        dq=feedback['dq'],
        target_pos=target_xyz)
    # send forces into VREP, step the sim forward
    interface.send_forces(u)

    count += 1
interface.disconnect()


This is a minimal version of the examples/VREP/reaching.py code. To run it with a different arm, you can just change the from abr_control.arms import jaco2 as arm line. The repo comes with configuration files for the UR5 and a onelink VREP arm model as well.

PyGame example

I’ve also found the PyGame simulations of the 2 and 3 link arms very helpful for quickly testing new controllers and code, as an easy, low-overhead proof-of-concept sandbox. To run the threelink arm (which runs fine in Linux and Windows, but I’ve heard has issues on Mac OS) with the operational space controller, you can run this script:

import numpy as np
from abr_control.arms import threelink as arm
from abr_control.interfaces import PyGame
from abr_control.controllers import OSC

# initialize our robot config
robot_config = arm.Config(use_cython=True)
# create our arm simulation
arm_sim = arm.ArmSim(robot_config)

# create an operational space controller
ctrlr = OSC(robot_config, kp=300, vmax=100,
            use_dJ=False, use_C=True)

def on_click(self, mouse_x, mouse_y):
    self.target[0] = self.mouse_x
    self.target[1] = self.mouse_y

# create our interface
interface = PyGame(robot_config, arm_sim, dt=.001,
                   on_click=on_click)
interface.connect()

# create a target
feedback = interface.get_feedback()
target_xyz = robot_config.Tx('EE', feedback['q'])
interface.set_target(target_xyz)

try:
    while 1:
        # get arm feedback
        feedback = interface.get_feedback()
        hand_xyz = robot_config.Tx('EE', feedback['q'])

        # generate an operational space control signal
        u = ctrlr.generate(
            q=feedback['q'],
            dq=feedback['dq'],
            target_pos=target_xyz)

        new_target = interface.get_mousexy()
        if new_target is not None:
            target_xyz[0:2] = new_target
        interface.set_target(target_xyz)

        # apply the control signal, step the sim forward
        interface.send_forces(u)

finally:
    # stop and reset the simulation
    interface.disconnect()


The extra bits of code just set up a hook so that when you click on the PyGame display somewhere the target moves to that point.

So! Hopefully some people find this useful for their research! It should be as easy to set up as cloning the repo, installing the requirements and running the setup file, and then trying out the examples.

If you find a bug please file an issue! If you find a way to improve it please do so and make a PR! And if you’d like to use anything in the repo commercially, please contact me.


Dynamic movement primitives part 4: Avoiding obstacles – update

Edit: Previously I posted this blog post on incorporating obstacle avoidance, but after a recent comment on the code I started going through it again and found a whole bunch of issues. Enough so that I’ve recoded things and I’m going to repost it now with working examples and a description of the changes I made to get it going. The edited sections will be separated out with these nice horizontal lines. If you’re just looking for the example code, you can find it up on my pydmps repo, here.

Alright. Previously I’d mentioned in one of these posts that DMPs are awesome for generalization and extension, and one of the ways that they can be extended is by incorporating obstacle avoidance dynamics. Recently I wanted to implement these dynamics, and after a bit of finagling I got it working, and so that’s going to be the subject of this post.

There are a few papers that talk about this, but the one we’re going to use is Biologically-inspired dynamical systems for movement generation: automatic real-time goal adaptation and obstacle avoidance by Hoffmann and others from Stefan Schaal’s lab. This is actually the second paper talking about obstacle avoidance and DMPs, and this is a good chance to stress one of the most important rules of implementing algorithms discussed in papers: collect at least 2-3 papers detailing the algorithm (if possible) before attempting to implement it. There are several reasons for this. The first and most important is that there are likely some typos in the equations of any one paper; by comparing across a few papers it’s easier to identify the trickier parts, after which thinking through what the correct form should be is usually straightforward. Secondly, equations are often updated with simplified notation or dynamics in later papers, and you can save yourself a lot of headaches in trying to understand them just by reading a later iteration. I recklessly disregarded this advice, started implementation using a single, earlier paper which had a few key typos in its equations, and spent a lot of time tracking down the problem. This is just a peril inherent in any paper that doesn’t provide tested code, which is almost all of them, sadface.

OK, now on to the dynamics. Fortunately, I can just reference the previous posts on DMPs here and don’t have to spend any time discussing how we arrive at the DMP dynamics (for a 2D system):

$\ddot{\textbf{y}} = \alpha_y (\beta_y( \textbf{g} - \textbf{y}) - \dot{\textbf{y}}) + \textbf{f},$

where $\alpha_y$ and $\beta_y$ are gain terms, $\textbf{g}$ is the goal state, $\textbf{y}$ is the system state, $\dot{\textbf{y}}$ is the system velocity, and $\textbf{f}$ is the forcing function.
As mentioned, DMPs are awesome because now to add obstacle avoidance all we have to do is add another term

$\ddot{\textbf{y}} = \alpha_y (\beta_y( \textbf{g} - \textbf{y}) - \dot{\textbf{y}}) + \textbf{f} + \textbf{p}(\textbf{y}, \dot{\textbf{y}}),$

where $\textbf{p}(\textbf{y}, \dot{\textbf{y}})$ implements the obstacle avoidance dynamics, and is a function of the DMP state and velocity. Now then, the question is what are these dynamics exactly?

Obstacle avoidance dynamics

It turns out that there is a paper by Fajen and Warren that details an algorithm that mimics human obstacle avoidance. The idea is that you calculate the angle between your current velocity and the direction to the obstacle, and then turn away from the obstacle. The angle between current velocity and direction to the obstacle is referred to as the steering angle, denoted $\varphi$, here’s a picture of it:

So, given some $\varphi$ value, we want to specify how much to change our steering direction, $\dot{\varphi}$, as in the figure below:

If we’re on track to hit the object (i.e. $\varphi$ is near 0) then we steer away hard, and the change in direction gets smaller and smaller as the angle between our heading (velocity) and the object gets larger. Formally, define $\dot{\varphi}$ as

$\dot{\varphi} = \gamma \;\varphi \;\textrm{exp}(-\beta | \varphi |),$

where $\gamma$ and $\beta$ are constants, which are specified as $1000$ and $\frac{20}{\pi}$ in the paper, respectively.
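The turning-rate curve is easy to get a feel for numerically (this snippet is my own, using the constants given above; note that $\dot{\varphi}$ is exactly zero at $\varphi = 0$ by symmetry, with the peak turn rate at $|\varphi| = 1/\beta$, a small angle just off dead-center):

```python
import numpy as np

gamma = 1000.0
beta = 20.0 / np.pi

def dphi(phi):
    # turning rate as a function of steering angle
    return gamma * phi * np.exp(-beta * np.abs(phi))

print(dphi(0.1), dphi(-0.1))  # equal magnitude, opposite turn directions
print(dphi(np.pi))            # nearly zero: the obstacle is behind us
```

The sign of the steering angle carries through to the sign of the turn, which is exactly why we need a signed angle rather than what np.arccos gives us.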

Edit: OK, this all sounds great, but you quickly run into issues here. The first is: how do we calculate $\varphi$? In the paper I was going off of, they used a dot product between the $\dot{\textbf{y}}$ vector and the $\textbf{o} - \textbf{y}$ vector to get the cosine of the angle, and then used np.arccos to get the angle from that. There is a subtle but big problem with this, however: you will never get a negative angle when you calculate this, which is of course just how np.arccos works, but the sign is exactly what we need to be able to tell if we should be turning left or right to avoid this object!

So we need to use a different way of calculating the angle, one that tells us if the object is in a + or – angle relative to the way we’re headed. To calculate an angle that will give us + or -, we’re going to use the np.arctan2 function.

We want to center things around the way we’re headed, so first we calculate the angle of the direction vector, $\dot{\textbf{y}}$. Then we create a rotation matrix, R_dy, to rotate the vector describing the direction of the object. This shifts things around so that if we’re moving straight towards the object it’s at 0 degrees, if we’re headed directly away from it, it’s at 180 degrees, etc. Once we have that vector, nooowwww we can find the angle of the direction of the object and that’s what we’re going to use for phi. Huzzah!

# get the angle we're heading in
phi_dy = -np.arctan2(dy[1], dy[0])
R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)],
                 [np.sin(phi_dy), np.cos(phi_dy)]])
# calculate vector to object relative to body
obj_vec = obstacle - y
# rotate it by the direction we're going
obj_vec = np.dot(R_dy, obj_vec)
# calculate the angle of obj relative to the direction we're going
phi = np.arctan2(obj_vec[1], obj_vec[0])


This $\dot{\varphi}$ term can be thought of as a weighting, telling us how much we need to rotate based on how close we are to running into the object. To calculate how we should rotate we’re going to calculate the angle orthonormal to our current velocity, then weight it by the distance between the object and our current state on each axis. Formally, this is written:

$\textbf{R} \; \dot{\textbf{y}},$

where $\textbf{R}$ is the axis $(\textbf{o} - \textbf{y}) \times \dot{\textbf{y}}$ rotated 90 degrees (the $\times$ denoting outer product here). The way I’ve been thinking about this is basically taking your velocity vector, $\dot{\textbf{y}}$, and rotating it 90 degrees. Then we use this rotated vector as a row vector, and weight the top row by the distance between the object and the system along the $x$ axis, and the bottom row by the difference along the $\textbf{y}$ axis. So in the end we’re calculating the angle that rotates our velocity vector 90 degrees, weighted by distance to the object along each axis.

So that whole thing takes into account absolute distance to the object along each axis, but that’s not quite enough. We also need to throw in $\dot{\varphi}$, which looks at the current angle. What this does is basically ask ‘hey, are we going to hit this object?’: if you’re on course then make a big turn, and if you’re not then turn less or not at all. Phew.

OK so all in all this whole term is written out

$\textbf{p}(\textbf{y}, \dot{\textbf{y}}) = \textbf{R} \; \dot{\textbf{y}} \; \dot{\varphi},$

and that’s what we add in to the system acceleration. And now our DMP can avoid obstacles! How cool is that?

Super compact, straight-forward to add, here’s the code:

Edit: OK, so not suuuper compact. I’ve also added in another couple of checks. The big one is “is this obstacle between us and the target or not?” So I calculate the Euclidean distance to the goal and to the obstacle, and if the obstacle is further away than the goal, set the avoidance signal to zero (performed at the end of the if statement)! This took care of a few weird errors where you would get a big deviation in the trajectory if the system saw an obstacle past the goal: it was planning on avoiding it, but then was pulled back in to the goal before reaching the obstacle anyway, so it was a pointless exercise. The other check added in just makes sure you only add in obstacle avoidance if the system is actually moving. Finally, I also changed $\gamma$ to $100$ instead of $1000$.

def avoid_obstacles(y, dy, goal):
    # gamma, beta, R_halfpi, and the list of obstacles are
    # all defined at the module level
    p = np.zeros(2)

    for obstacle in obstacles:
        # based on (Hoffmann, 2009)

        # if we're moving
        if np.linalg.norm(dy) > 1e-5:

            # get the angle we're heading in
            phi_dy = -np.arctan2(dy[1], dy[0])
            R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)],
                             [np.sin(phi_dy), np.cos(phi_dy)]])
            # calculate vector to object relative to body
            obj_vec = obstacle - y
            # rotate it by the direction we're going
            obj_vec = np.dot(R_dy, obj_vec)
            # calculate the angle of obj relative to the direction we're going
            phi = np.arctan2(obj_vec[1], obj_vec[0])

            dphi = gamma * phi * np.exp(-beta * abs(phi))
            R = np.dot(R_halfpi, np.outer(obstacle - y, dy))
            pval = -np.nan_to_num(np.dot(R, dy) * dphi)

            # check to see if the distance to the obstacle is further than
            # the distance to the target; if it is, ignore the obstacle
            if np.linalg.norm(obj_vec) > np.linalg.norm(goal - y):
                pval = 0

            p += pval
    return p


And that’s it! Just add this method in to your DMP system and call avoid_obstacles at every timestep, and add it in to your DMP acceleration.
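As a runnable sketch of that (my own toy setup, not code from the repo), here’s the avoidance term folded into a bare point attractor, i.e. a DMP without a forcing function. The function body matches the one above, repeated so the example runs standalone; the attractor gains, obstacle position, and goal are made-up values:

```python
import numpy as np

gamma = 100.0
beta = 20.0 / np.pi
R_halfpi = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90 degree rotation
obstacles = [np.array([0.35, 0.2])]  # one obstacle near the straight path

def avoid_obstacles(y, dy, goal):
    # same structure as the function above
    p = np.zeros(2)
    for obstacle in obstacles:
        if np.linalg.norm(dy) > 1e-5:  # only avoid if we're moving
            phi_dy = -np.arctan2(dy[1], dy[0])
            R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)],
                             [np.sin(phi_dy), np.cos(phi_dy)]])
            obj_vec = np.dot(R_dy, obstacle - y)
            phi = np.arctan2(obj_vec[1], obj_vec[0])
            dphi = gamma * phi * np.exp(-beta * abs(phi))
            R = np.dot(R_halfpi, np.outer(obstacle - y, dy))
            pval = -np.nan_to_num(np.dot(R, dy) * dphi)
            # ignore obstacles further away than the goal
            if np.linalg.norm(obj_vec) > np.linalg.norm(goal - y):
                pval = 0
            p += pval
    return p

# point attractor with the avoidance term added to the acceleration
alpha_y, beta_y, dt = 25.0, 25.0 / 4.0, 0.001
y = np.zeros(2)
dy = np.zeros(2)
goal = np.array([0.7, 0.4])
for _ in range(5000):
    ddy = alpha_y * (beta_y * (goal - y) - dy) + avoid_obstacles(y, dy, goal)
    dy += ddy * dt
    y += dy * dt
print(y)  # ends up at the goal, having steered around the obstacle
```

The avoidance term only perturbs the path mid-flight; near the goal the distance check zeroes it out, so the attractor still converges.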

You hopefully noticed in the code that this is set up for multiple obstacles, and that all that entailed was simply adding up the p value generated by each individual obstacle. It’s super easy! Here’s a very basic graphic showing how the DMP system can avoid obstacles:

So here there’s just a basic attractor system (DMP without a forcing function) trying to move from the center position to 8 targets around the unit circle (which are highlighted in red), and there are 4 obstacles that I’ve thrown onto the field (black x’s). As you can see, the system successfully steers way clear of the obstacles while moving towards the target!

We must all use this power wisely.

Edit: Making the power yours is now easier than ever! You can check out this code at my pydmps GitHub repo. Clone the repo and after you python setup.py develop, change directories into the examples folder and run the avoid_obstacles.py file. It will randomly generate 5 targets in the environment and perform 20 movements, giving you something looking like this:

The iterative Linear Quadratic Regulator algorithm

A few months ago I posted on Linear Quadratic Regulators (LQRs) for control of non-linear systems using finite-differences. The gist of it was: at every time step, linearize the dynamics and quadratize (it could be a word) the cost function around the current point in state space, and compute your feedback gain off of that, as though the dynamics were both linear and consistent (i.e. didn’t change in different states). And that was pretty cool because you didn’t need all the equations of motion and inertia matrices etc. to generate a control signal. You could just use the simulation you had, sample it a bunch to estimate the dynamics and value function, and go off of that.

The LQR, however, operates with maverick disregard for changes in the future. Careless of the consequences, it optimizes assuming the linear dynamics approximated at the current time step hold for all time. It would be really great to have an algorithm that was able to plan out and optimize a sequence, mindful of the changing dynamics of the system.

This is exactly what the iterative Linear Quadratic Regulator method (iLQR) was designed for. iLQR is an extension of LQR control, where the idea is basically to optimize a whole control sequence rather than just the control signal for the current point in time. The basic flow of the algorithm is:

1. Initialize with initial state $x_0$ and initial control sequence $\textbf{U} = [u_{t_0}, u_{t_1}, ..., u_{t_{N-1}}]$.
2. Do a forward pass, i.e. simulate the system using $(x_0, \textbf{U})$ to get the trajectory through state space, $\textbf{X}$, that results from applying the control sequence $\textbf{U}$ starting in $x_0$.
3. Do a backward pass, estimate the value function and dynamics for each $(\textbf{x}, \textbf{u})$ in the state-space and control signal trajectories.
4. Calculate an updated control signal $\hat{\textbf{U}}$ and evaluate cost of trajectory resulting from $(x_0, \hat{\textbf{U}})$.
1. If $|\textrm{cost}(x_0, \hat{\textbf{U}}) - \textrm{cost}(x_0, \textbf{U})| < \textrm{threshold}$ then we've converged and exit.
2. If $\textrm{cost}(x_0, \hat{\textbf{U}}) < \textrm{cost}(x_0, \textbf{U})$, then set $\textbf{U} = \hat{\textbf{U}}$, and change the update size to be more aggressive. Go back to step 2.
3. If $\textrm{cost}(x_0, \hat{\textbf{U}}) \geq$ $\textrm{cost}(x_0, \textbf{U})$ change the update size to be more modest. Go back to step 3.

There are a bunch of descriptions of iLQR, and it also goes by names like ‘the sequential linear quadratic algorithm’. The paper that I’m going to be working off of is by Yuval Tassa out of Emo Todorov’s lab, called Control-limited differential dynamic programming. And the Python implementation of this can be found up on my github in my Control repo. Also, a big thank you to Dr. Emo Todorov who provided Matlab code for the iLQG algorithm, which was super helpful.

Defining things

So let’s dive in. Formally defining things, we have our system $\textbf{x}$, and dynamics described with the function $\textbf{f}$, such that

$\textbf{x}_{t+1} = \textbf{f}(\textbf{x}_t, \textbf{u}_t),$

where $\textbf{u}$ is the input control signal. The trajectory $\{\textbf{X}, \textbf{U}\}$ is the sequence of states $\textbf{X} = \{\textbf{x}_0, \textbf{x}_1, ..., \textbf{x}_N\}$ that result from applying the control sequence $\textbf{U} = \{\textbf{u}_0, \textbf{u}_1, ..., \textbf{u}_{N-1}\}$ starting in the initial state $\textbf{x}_0$.

Now we need to define all of our cost related equations, so we know exactly what we’re dealing with.

Define the total cost function $J$, which is the sum of the immediate cost, $\ell$, from each state in the trajectory plus the final cost, $\ell_f$:

$J(\textbf{x}_0, \textbf{U}) = \sum\limits^{N-1}\limits_{t=0} \ell(\textbf{x}_t, \textbf{u}_t) + \ell_f(\textbf{x}_N).$

Letting $\textbf{U}_t = \{\textbf{u}_t, \textbf{u}_{t+1}, ..., \textbf{u}_{N-1}\}$, we define the cost-to-go as the sum of costs from time $t$ to $N$:

$J_t(\textbf{x}, \textbf{U}_t) = \sum\limits^{N-1}\limits_{i=t} \ell(\textbf{x}_i, \textbf{u}_i) + \ell_f(\textbf{x}_N).$

The value function $V$ at time $t$ is the optimal cost-to-go from a given state:

$V_t(\textbf{x}) = \min\limits_{\textbf{U}_t} J_t(\textbf{x}, \textbf{U}_t),$

where the above equation just says that the optimal cost-to-go is found by using the control sequence $\textbf{U}_t$ that minimizes $J_t$.

At the final time step, $N$, the value function is simply

$V(\textbf{x}_N) = \ell_f(\textbf{x}_N).$

For all preceding time steps, we can write the value function as a function of the immediate cost $\ell(\textbf{x}, \textbf{u})$ and the value function at the next time step:

$V(\textbf{x}) = \min\limits_{\textbf{u}} \left[ \ell(\textbf{x}, \textbf{u}) + V(\textbf{f}(\textbf{x}, \textbf{u})) \right].$

NOTE: In the paper, they use the notation $V'(\textbf{f}(\textbf{x}, \textbf{u}))$ to denote the value function at the next time step, which is redundant since $\textbf{x}_{t+1} = \textbf{f}(\textbf{x}_t, \textbf{u}_t)$, but it comes in handy later when they drop the dependencies to simplify notation. So, heads up: $V' = V(\textbf{f}(\textbf{x}, \textbf{u}))$.
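To make the recursion concrete, here’s a toy dynamic-programming example of my own (not from the paper): a 1D integer system where $x_{t+1} = x + u$, with quadratic costs, and we compute the value function exactly by working backwards from the final cost:

```python
# toy system: x' = x + u on a small integer grid
states = list(range(-5, 6))
controls = [-1, 0, 1]
N = 10
ell = lambda x, u: 0.1 * u**2  # immediate cost: penalize effort
ell_f = lambda x: float(x**2)  # final cost: be at x = 0

# at the final time step the value function is just the final cost
V = {x: ell_f(x) for x in states}

# work backwards: V(x) = min over u of [ ell(x, u) + V(f(x, u)) ]
for t in range(N - 1, -1, -1):
    V = {x: min(ell(x, u) + V[max(-5, min(5, x + u))]
                for u in controls)
         for x in states}

print(V[3])  # three steps of u = -1, then coasting at the goal
```

Here we can brute-force the minimization because the state and control spaces are tiny; iLQR gets the same kind of answer for continuous systems by approximating $V$ quadratically instead.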

Forward rollout

The forward rollout consists of two parts. The first part is simulating the system to generate $(\textbf{X}, \textbf{U})$, from which we can calculate the overall cost of the trajectory and find out the path that the arm will take. To improve things, though, we’ll need a lot of information about the partial derivatives of the system; calculating these is the second part of the forward rollout phase.

To calculate all these partial derivatives we’ll use $(\textbf{X}, \textbf{U})$. For each $(\textbf{x}_t, \textbf{u}_t)$ we’ll calculate the derivatives of $\textbf{f}(\textbf{x}_t, \textbf{u}_t)$ with respect to $\textbf{x}_t$ and $\textbf{u}_t$, which will give us what we need for our linear approximation of the system dynamics.

To get the information we need about the value function, we’ll need the first and second derivatives of $\ell(\textbf{x}_t, \textbf{u}_t)$ with respect to $\textbf{x}_t$ and $\textbf{u}_t$, and of $\ell_f(\textbf{x}_N)$ with respect to $\textbf{x}_N$.

So all in all, we need to calculate $\textbf{f}_\textbf{x}$, $\textbf{f}_\textbf{u}$, $\ell_\textbf{x}$, $\ell_\textbf{u}$, $\ell_\textbf{xx}$, $\ell_\textbf{ux}$, $\ell_\textbf{uu}$, where the subscripts denote a partial derivative, so $\ell_\textbf{x}$ is the partial derivative of $\ell$ with respect to $\textbf{x}$, $\ell_\textbf{xx}$ is the second derivative of $\ell$ with respect to $\textbf{x}$, etc. And to calculate all of these partial derivatives, we’re going to use finite differences! Just like in the LQR with finite differences post. Long story short, load up the simulation for every time step, slightly vary one of the parameters, and measure the resulting change.
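The finite-difference estimates can be sketched with a generic helper (my own code, not the repo’s); the sanity check at the bottom uses a linear system where the true derivatives are just the system matrices:

```python
import numpy as np

def finite_difference(f, x, u, eps=1e-5):
    """Estimate df/dx and df/du of x' = f(x, u) by central differences."""
    n, m = x.size, u.size
    f_x = np.zeros((n, n))
    f_u = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps  # vary one state dimension at a time
        f_x[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for i in range(m):
        du = np.zeros(m)
        du[i] = eps  # vary one control dimension at a time
        f_u[:, i] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return f_x, f_u

# sanity check on a linear system, where the answer is known exactly
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
f = lambda x, u: A @ x + B @ u
f_x, f_u = finite_difference(f, np.array([1.0, 2.0]), np.array([0.5]))
print(np.allclose(f_x, A), np.allclose(f_u, B))  # True True
```

The same central-difference pattern extends to the first and second derivatives of $\ell$, just with scalar outputs instead of vector ones.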

Once we have all of these, we’re ready to move on to the backward pass.

Backward pass

Now, we started out with an initial trajectory, but that was just a guess. We want our algorithm to take it and then converge to a local minimum. To do this, we’re going to add some perturbing values and use them to minimize the value function. Specifically, we’re going to compute a local solution to our value function using a quadratic Taylor expansion. So let’s define $Q(\delta \textbf{x}, \delta \textbf{u})$ to be the change in our value function at $(\textbf{x}, \textbf{u})$ as a result of small perturbations $(\delta \textbf{x}, \delta \textbf{u})$:

$Q(\delta \textbf{x}, \delta \textbf{u}) = \ell (\textbf{x} + \delta \textbf{x}, \textbf{u} + \delta \textbf{u}) + V(\textbf{f}(\textbf{x} + \delta\textbf{x}, \textbf{u} + \delta \textbf{u})).$

The second-order expansion of $Q$ is given by:

$Q_\textbf{x} = \ell_\textbf{x} + \textbf{f}_\textbf{x}^T V'_\textbf{x},$

$Q_\textbf{u} = \ell_\textbf{u} + \textbf{f}_\textbf{u}^T V'_\textbf{x},$

$Q_\textbf{xx} = \ell_\textbf{xx} + \textbf{f}_\textbf{x}^T V'_\textbf{xx} \textbf{f}_\textbf{x} + V'_\textbf{x} \cdot \textbf{f}_\textbf{xx},$

$Q_\textbf{ux} = \ell_\textbf{ux} + \textbf{f}_\textbf{u}^T V'_\textbf{xx} \textbf{f}_\textbf{x}+ V'_\textbf{x} \cdot \textbf{f}_\textbf{ux},$

$Q_\textbf{uu} = \ell_\textbf{uu} + \textbf{f}_\textbf{u}^T V'_\textbf{xx} \textbf{f}_\textbf{u}+ V'_\textbf{x} \cdot \textbf{f}_\textbf{uu}.$

Remember that $V' = V(\textbf{f}(\textbf{x}, \textbf{u}))$, which is the value function at the next time step. NOTE: All of the second derivatives of $\textbf{f}$ are zero in the systems we’re controlling here, so when we calculate the second derivatives we don’t need to worry about doing any tensor math, yay!

Given the second-order expansion of $Q$, we can compute the optimal modification to the control signal, $\delta \textbf{u}^*$. This control signal update has two parts: a feedforward term, $\textbf{k}$, and a feedback term, $\textbf{K} \delta\textbf{x}$. The optimal update is the $\delta\textbf{u}$ that minimizes the cost of $Q$:

$\delta\textbf{u}^*(\delta \textbf{x}) = \arg\min\limits_{\delta\textbf{u}}Q(\delta\textbf{x}, \delta\textbf{u}) = \textbf{k} + \textbf{K}\delta\textbf{x},$

where $\textbf{k} = -Q^{-1}_\textbf{uu} Q_\textbf{u}$ and $\textbf{K} = -Q^{-1}_\textbf{uu} Q_\textbf{ux}.$

Derivation can be found in this earlier paper by Li and Todorov. By substituting this policy into the expansion of $Q$ we get a quadratic model of $V$. They do some mathamagics and come out with:

$V_\textbf{x} = Q_\textbf{x} - \textbf{K}^T Q_\textbf{uu} \textbf{k},$

$V_\textbf{xx} = Q_\textbf{xx} - \textbf{K}^T Q_\textbf{uu} \textbf{K}.$

So now we have all of the terms that we need, and they’re defined in terms of the values at the next time step. We know the value of the value function at the final time step $V_N = \ell_f(\textbf{x}_N)$, and so we’ll simply plug this value in and work backwards in time recursively computing the partial derivatives of $Q$ and $V$.
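One step of that recursion is compact in code; here’s a sketch of my own (second derivatives of $\textbf{f}$ dropped, as noted above, and the Levenberg-Marquardt regularization discussed later omitted for clarity):

```python
import numpy as np

def backward_step(f_x, f_u, l_x, l_u, l_xx, l_ux, l_uu, V_x, V_xx):
    """One step of the backward recursion: gains plus updated V terms."""
    # second-order expansion of Q (f's second derivatives assumed zero)
    Q_x = l_x + f_x.T @ V_x
    Q_u = l_u + f_u.T @ V_x
    Q_xx = l_xx + f_x.T @ V_xx @ f_x
    Q_ux = l_ux + f_u.T @ V_xx @ f_x
    Q_uu = l_uu + f_u.T @ V_xx @ f_u
    # feedforward and feedback gains
    Q_uu_inv = np.linalg.inv(Q_uu)
    k = -Q_uu_inv @ Q_u
    K = -Q_uu_inv @ Q_ux
    # quadratic model of V at this time step
    V_x = Q_x - K.T @ Q_uu @ k
    V_xx = Q_xx - K.T @ Q_uu @ K
    return k, K, V_x, V_xx

# tiny 1D example: x' = x + 0.1u, immediate cost l = x^2 + u^2,
# seeded with V terms as if from the step after this one
k, K, V_x, V_xx = backward_step(
    f_x=np.array([[1.0]]), f_u=np.array([[0.1]]),
    l_x=np.array([2.0]), l_u=np.array([0.0]),
    l_xx=np.array([[2.0]]), l_ux=np.array([[0.0]]), l_uu=np.array([[2.0]]),
    V_x=np.array([2.0]), V_xx=np.array([[2.0]]))
print(k, K)
```

Looping this function from $t = N-1$ down to $0$, seeding it with $V_\textbf{x} = \ell_{f,\textbf{x}}(\textbf{x}_N)$ and $V_\textbf{xx} = \ell_{f,\textbf{xx}}(\textbf{x}_N)$, gives the full set of gains for the control update.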

Calculate control signal update

Once those are all calculated, we can calculate the gain matrices, $\textbf{k}$ and $\textbf{K}$, for our control signal update. Huzzah! Now all that’s left to do is evaluate this new trajectory. So we set up our system

$\hat{\textbf{x}}_0 = \textbf{x}_0,$

$\hat{\textbf{u}}_t = \textbf{u}_t + \textbf{k}_t + \textbf{K}_t (\hat{\textbf{x}}_t - \textbf{x}_t),$

$\hat{\textbf{x}}_{t+1} = \textbf{f}(\hat{\textbf{x}}_t, \hat{\textbf{u}}_t),$

and record the cost. Now if the cost of the new trajectory $(\hat{\textbf{X}}, \hat{\textbf{U}})$ is less than the cost of $(\textbf{X}, \textbf{U})$ then we set $\textbf{U} = \hat{\textbf{U}}$ and go do it all again! And when the cost from an update becomes less than a threshold value, call it done. In code this looks like:

if costnew < cost:
    sim_new_trajectory = True

    if (abs(costnew - cost)/cost) < self.converge_thresh:
        break


Of course, another option we need to account for is when costnew > cost. What do we do in this case? Our control update hasn’t worked, do we just exit?

The Levenberg-Marquardt heuristic
No! Phew.

The control signal update in iLQR is calculated in such a way that it can behave like Gauss-Newton optimization (which uses second-order derivative information) or like gradient descent (which only uses first-order derivative information). The idea is that if the updates are going well, then let’s include curvature information in our update to help optimize things faster; if the updates aren’t going well, let’s dial back towards gradient descent, stick to first-order derivative information, and use smaller steps. This wizardry is known as the Levenberg-Marquardt heuristic. So how does it work?

Something we skimmed over in the iLQR description was that we need to calculate $Q^{-1}_\textbf{uu}$ to get the $\textbf{k}$ and $\textbf{K}$ matrices. Instead of using np.linalg.pinv or somesuch, we’re going to calculate the inverse ourselves after finding the eigenvalues and eigenvectors, so that we can regularize it. This will let us do a couple of things. First, we’ll be able to make sure that our estimate of curvature ($Q_\textbf{uu}^{-1}$) stays positive definite, which is important to make sure that we always have a descent direction. Second, we’re going to add a regularization term to the eigenvalues to prevent them from exploding when we take their inverse. Here’s our regularization implemented in Python:


Q_uu_evals, Q_uu_evecs = np.linalg.eig(Q_uu)
Q_uu_evals[Q_uu_evals < 0] = 0.0
Q_uu_evals += lamb
Q_uu_inv = np.dot(Q_uu_evecs,
                  np.dot(np.diag(1.0/Q_uu_evals), Q_uu_evecs.T))


Now, what happens when we change lamb? The eigenvalues represent the magnitude of each of the eigenvectors’ contributions, and by taking their reciprocal we flip those contributions: the vectors that were contributing the least now have the largest inverted eigenvalues, and the ones that contributed the most now have the smallest. By adding a regularization term we ensure that the inverted eigenvalues can never be larger than 1/lamb. So essentially we throw out information.

In the case where we’ve got a really good approximation of the system dynamics and value function, we don’t want to do this. We want to use all of the information available because it’s accurate, so we make lamb small and get a more accurate inverse. In the case where we have a bad approximation of the dynamics we want to be more conservative, which means not keeping those large inverted eigenvalues. Smaller inverted eigenvalues give a smaller $Q_\textbf{uu}^{-1}$ estimate, which then gives smaller gain matrices and a smaller control signal update, which is what we want when our control signal updates are going poorly.

How do you know if they’re going poorly or not, you now surely ask! Clever as always, we’re going to use the result of the previous iteration to update lamb. So adding to the code from just above, the end of our control update loop is going to look like:

lamb = 1.0  # initial value of lambda
...
if costnew < cost:
    lamb /= self.lamb_factor
    sim_new_trajectory = True

    if (abs(costnew - cost)/cost) < self.converge_thresh:
        break
else:
    lamb *= self.lamb_factor
    if lamb > self.max_lamb:
        break


And that is pretty much everything! OK let’s see how this runs!

Simulation results

If you want to run this and see for yourself, you can go copy my Control repo, navigate to the main directory, and run

python run.py arm2 reach


or substitute in arm3. If you’re having trouble getting the arm2 simulation to run, try arm2_python, which is a straight Python implementation of the arm dynamics, and should work no sweat for Windows and Mac.

Below you can see results from the iLQR controller controlling the 2 and 3 link arms (click on the figures to see full sized versions, they got distorted a bit in the shrinking to fit on the page), using immediate and final state cost functions defined as:

l = np.sum(u**2)


and

pos_err = np.array([self.arm.x[0] - self.target[0],
                    self.arm.x[1] - self.target[1]])
l = (wp * np.sum(pos_err**2) +  # pos error
     wv * np.sum(x[self.arm.DOF:self.arm.DOF*2]**2))  # vel error


where wp and wv are just gain values, x is the state of the system, and self.arm.x is the $(x,y)$ position of the hand. These read as “during movement, penalize large control signals, and at the final state, have a big penalty on not being at the target.”

So let’s give it up for iLQR, this is awesome! How much of a crazy improvement is that over LQR? And with all of its knowledge of the system coming through finite differences, and with the full movements completed in exactly 1 second! (Note: the simulation speeds look different because of my editing to keep the gif sizes small; they both take the same amount of time for each movement.)

Changing cost functions
Something that you may notice is that the path of the 3 link arm is actually straighter than that of the 2 link arm. I thought that this might just be an issue with the gain values: since the scale of movement is smaller for the 2 link arm than the 3 link arm, there might have been less of a penalty for not moving in a straight line. BUT this was wrong; you can crank the gains and still get the same movement. The actual reason is that this is what the cost function specifies: if you look in the code, only $\ell_f(\textbf{x}_N)$ penalizes the distance from the target, and the cost function during movement strictly minimizes the control signal, i.e. $\ell(\textbf{x}_t, \textbf{u}_t) = \textbf{u}_t^2$.

Well that’s a lot of talk, you say, like the incorrigible antagonist we both know you to be, prove it. Alright, fine! Here’s iLQR running with an updated cost function that includes the end-effector’s distance from the target in the immediate cost:

All that I had to do to get this was change the immediate cost from

l = np.sum(u**2)


to

l = np.sum(u**2)
pos_err = np.array([self.arm.x[0] - self.target[0],
                    self.arm.x[1] - self.target[1]])
l += (wp * np.sum(pos_err**2) +  # pos error
      wv * np.sum(x[self.arm.DOF:self.arm.DOF*2]**2))  # vel error


i.e. the position penalty term from the final state cost is now also included in the immediate state cost.

Changing sequence length
In these simulations the system simulates with a .01 second time step, and I gave it 100 time steps to reach the target. What if I give it only 50 time steps?

It looks pretty much the same! It’s just now twice as fast, which is of course achieved by using larger control signals, which we don’t see, but dang awesome.

What if we try to make it there in 10 time steps??

OK, well, that does not look good. So what’s going on in this case? Basically we’ve given the algorithm an impossible task: it can’t make it to the target location in 10 time steps. In the implementation I wrote here, if it hits the end of its control sequence and hasn’t reached the target yet, the control sequence starts over back at t=0. Remember that part of the target state is also velocity, so basically it moves for 10 time steps to try to minimize $(x,y)$ distance, and then slows down to minimize the final state cost in the velocity term.

In conclusion

This algorithm has been used in a ton of things, for controlling robots and simulations, and is an important part of guided policy search, which has been used to very successfully train deep networks in control problems. It’s getting really impressive results for controlling the arm models that I’ve built here, and by using finite differences it should easily generalize to other systems.

iLQR is very computationally expensive, though, so that’s definitely a downside. It’s much less expensive if you have the equations of your system, or at least a decent approximation of them, and don’t need to use finite differences. But you pay for that efficiency with a loss in generality.

There are also a bunch of parameters to play around with that I haven’t explored at all here, like the weights in the cost function penalizing the magnitude of the control signal and the final state position error. I showed a basic example of changing the cost function, which hopefully gets across just how easy changing these things out can be when you’re using finite differences, and there’s a lot to play around with there too.

Implementation note
In the Tassa and Todorov paper, they talked about using a backtracking line search when generating the control signal. So the algorithm they had for generating the new control signal was actually:

$\hat{\textbf{u}}_t = \textbf{u}_t + \alpha\textbf{k}_t + \textbf{K}_t(\hat{\textbf{x}}_t - \textbf{x}_t)$

where $\alpha$ is the backtracking search parameter, which gets set to one initially and then reduced. It’s very possible I didn’t implement it as intended, but I found consistently that $\alpha = 1$ always generated the best results, so it was just adding computation time, and I left it out of my implementation. If anyone has insights on an implementation that improves results, please let me know!
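For reference, a backtracking line search over $\alpha$ can be sketched like this. The names and the fixed $\alpha$ schedule here are illustrative assumptions, not the implementation from the paper:

```python
def backtracking_update(u, k, feedback_term, cost_of,
                        alphas=(1.0, 0.5, 0.25, 0.125)):
    # Try the full step first, then shrink alpha until the cost improves.
    # u: nominal control, k: feedforward update, feedback_term: the
    # K_t (x_hat - x_t) term already evaluated, cost_of: candidate -> cost.
    best_u, best_cost = u, cost_of(u)
    for alpha in alphas:
        candidate = u + alpha * k + feedback_term
        c = cost_of(candidate)
        if c < best_cost:
            return candidate, c  # accept the first improving step size
    return best_u, best_cost  # no alpha improved; keep the old control
```

The idea is simply to reject full steps that overshoot and make the cost worse, falling back to smaller and smaller fractions of the feedforward update.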

And then finally, another thank you to Dr. Emo Todorov for providing Matlab code for the iLQG algorithm, which was very helpful, especially for getting the Levenberg-Marquardt heuristic implemented properly.

Linear-Quadratic Regulation for non-linear systems using finite differences

One of the standard controllers in basic control theory is the linear-quadratic regulator (LQR). There is a finite-horizon case (where you have a limited amount of time), and an infinite-horizon case (where you don’t); in this post, for simplicity, we’re only going to be dealing with the infinite-horizon case.

The LQR is designed to handle a very specific kind of problem. First, it assumes you are controlling a system with linear dynamics, which means you can express them as

$\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u}$,

where $\textbf{x}$ and $\dot{\textbf{x}}$ are the state and its time derivative, $\textbf{u}$ is the input, and $\textbf{A}$ and $\textbf{B}$ capture the effects of the state and input on the derivative. And second, it assumes that the cost function, denoted $J$, is a quadratic of the form

$J = \int_0^{\infty} \left( (\textbf{x} - \textbf{x}^*)^T \textbf{Q} (\textbf{x} - \textbf{x}^*) + \textbf{u}^T \textbf{R} \textbf{u} \right) dt$

where $\textbf{x}^*$ is the target state, and $\textbf{Q} = \textbf{Q}^T \geq 0$ and $\textbf{R} = \textbf{R}^T > 0$ are weights on the cost of not being at the target state and of applying a control signal. The higher $\textbf{Q}$ is, the more important it is to get to the target state asap; the higher $\textbf{R}$ is, the more important it is to keep the control signal small on the way to the target state.

The goal of the LQR is to calculate a feedback gain matrix $\textbf{K}$ such that

$\textbf{u} = -\textbf{K} \textbf{x},$

drives the system to the target. When the system is a linear system with a quadratic cost function, this can be done optimally. There is lots of discussion elsewhere about LQRs and their derivation, so I’m not going to go into that with this post. Instead, I’m going to talk about applying LQRs to non-linear systems, and using finite differences to do it, which works when you have a readily accessible simulation of the system on hand. The fun part is that by using finite differences you can get this to work without working out the dynamics equations yourself.

Using LQRs on non-linear systems

As you may have noticed, non-linear systems violate the first assumption of a linear quadratic regulator; that the system is linear. That doesn’t mean that we can’t apply it, it just means that it’s not going to be optimal. How poorly the LQR will perform depends on a few things, two important factors being how non-linear the system dynamics actually are, and how often you’re able to update the feedback gain matrix $\textbf{K}$. To apply LQR to non-linear systems we’re just going to close our eyes and pretend that the system dynamics are linear, i.e. they fit the form

$\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u}.$

We’ll do this by approximating the actual dynamics of the system linearly. We’ll then solve for our gain value $\textbf{K}$, generate our control signal for this timestep, and then re-approximate the dynamics again at the next time step and solve for $\textbf{K}$ from the new state. The more non-linear the system dynamics are, the less appropriate $\textbf{K}$ will be for generating our control signal $\textbf{u}$ as we move away from the state $\textbf{K}$ was calculated in; this is why update time of the LQR can become an important factor.

Using finite-differences to approximate system dynamics

An important question, then, is how do we find this system approximation? How can we calculate the $\textbf{A}$ and $\textbf{B}$ matrices that we then use to solve for $\textbf{K}$? If we know the dynamics of the system to be

$\dot{\textbf{x}} = f(\textbf{x}, \textbf{u})$,

then we can calculate

$\textbf{A} = \frac{\partial f(\textbf{x}, \textbf{u})}{\partial \textbf{x}}, \;\;\;\; \textbf{B} = \frac{\partial f(\textbf{x}, \textbf{u})}{\partial \textbf{u}}$.

If you’re going to try this for the 3-link arm, though, get out Mathematica. Do not try this by hand. If you disregard my warning and foolhardily attempt such a derivation you will regret, repent, and then appeal to Wolfram Alpha for salvation. These equations quickly become terrible and long even for seemingly not-so-complicated systems.

There are a few ways to skirt this. Here we’re going to assume that the system under control is a simulation, or that we at least have access to an accurate model, and use the finite differences method to compute these values. The idea behind finite differences is to approximate the rate of change of the function $f$ at the point $x$ by sampling $f$ near $x$ and using the difference to calculate $\dot{f}(x)$. Here’s a picture for a 1D system:

So here, our current state $x$ is the blue dot, and the red dots represent the sample points $x + \Delta x$ and $x - \Delta x$. We can then calculate

$\dot{f}(x) \approx \frac{f(x+\Delta x) - f(x-\Delta x)}{2\Delta x},$

and you can see the actual rate of change of $f$ at $x$ plotted in the blue dashed line, and the approximated rate of change calculated using finite differences plotted in the red dashed line. We can also see that the approximated derivative is only accurate near $x$ (the blue dot).
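In code, the 1D central difference is just a couple of lines (a generic sketch, not tied to any particular system model):

```python
def central_difference(f, x, dx=1e-5):
    # approximate the derivative of f at x by sampling f on either
    # side of x and dividing the difference by the step size
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)
```

For example, for $f(x) = x^2$ at $x = 3$ this returns 6, the true derivative (central differences are exact for quadratics, up to floating point error).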

Back in our multi-dimensional system, to use finite differences to calculate the derivative with respect to the state and the input we’re going to vary each of the dimensions of the state and input by some small amount one at a time, calculating the effects of each one by one. Here’s a chunk of pseudo-code to hopefully clarify this idea:

eps = 1e-5
A = np.zeros((len(current_state), len(current_state)))
for ii in range(len(current_state)):
    x = current_state.copy()
    x[ii] += eps
    x_inc = simulate_system(state=x, input=control_signal)
    x = current_state.copy()
    x[ii] -= eps
    x_dec = simulate_system(state=x, input=control_signal)
    A[:, ii] = (x_inc - x_dec) / (2 * eps)

B = np.zeros((len(current_state), len(control_signal)))
for ii in range(len(control_signal)):
    u = control_signal.copy()
    u[ii] += eps
    x_inc = simulate_system(state=current_state, input=u)
    u = control_signal.copy()
    u[ii] -= eps
    x_dec = simulate_system(state=current_state, input=u)
    B[:, ii] = (x_inc - x_dec) / (2 * eps)


Now that we’re able to generate our $\textbf{A}$ and $\textbf{B}$ matrices, we have everything we need to solve for our feedback gain matrix $\textbf{K}$! Which is great.

Note on using finite differences in continuous vs discrete setup

Something that’s important to straighten out too is what exactly is returned by the simulate_system function in the code above. In the continuous case, your system is captured as

$\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u},$

whereas in the discrete case your system is defined as

$\textbf{x}(t+1) = \textbf{A}\textbf{x}(t) + \textbf{B}\textbf{u}(t).$

If you are calculating your feedback gain matrix $\textbf{K}$ using the continuous solution to the algebraic Riccati equation, then you need to be returning $\dot{\textbf{x}}(t)$. If you’re solving for $\textbf{K}$ using the discrete solution to the algebraic Riccati equation you need to return $\textbf{x}(t+1)$. This was just something that I came across as I was coding and so I wanted to mention it here in case anyone else stumbled across it!
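To make the continuous vs discrete distinction concrete, here’s a sketch using SciPy’s Riccati solvers on a toy double integrator. The system, discretization, and weights are illustrative choices, not the arm model from this post:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_discrete_are

# toy double integrator: state x = [position, velocity], input u = force
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)            # state error cost weights
R = np.array([[1.0]])    # control cost weights

# continuous case (simulate_system returns x_dot): K = R^-1 B^T P
P = solve_continuous_are(A, B, Q, R)
K_cont = np.linalg.solve(R, B.T @ P)

# discrete case (simulate_system returns x(t+1) = Ad x(t) + Bd u(t))
dt = 0.01
Ad = np.eye(2) + A * dt  # simple Euler discretization
Bd = B * dt
Pd = solve_discrete_are(Ad, Bd, Q, R)
K_disc = np.linalg.solve(R + Bd.T @ Pd @ Bd, Bd.T @ Pd @ Ad)
```

For this system the continuous gain works out to $\textbf{K} = [1, \sqrt{3}]$, and with a small time step the discrete gain lands close to it; mixing the two up (feeding $\textbf{x}(t+1)$ into the continuous solver, or $\dot{\textbf{x}}$ into the discrete one) gives noticeably wrong gains.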

Applying LQR to 2 and 3 link arm control

Alright! Let’s have a look at how the LQR does controlling non-linear systems. Below we have the control of a 2-link arm compared to a 3-link arm, and you can see the control of the 2-link arm is better. This is a direct result of the dynamics of a 3-link arm being significantly more complex.

Note on controlling at different timesteps

When I was first testing the LQR controller I expected the effects of different control update times to be a lot more significant than they turned out to be. As it turns out, for controlling a 3-link arm, there’s not really a visible difference between a controller that updates every .01 seconds, every .001 seconds, or every .0001 seconds. Let’s have a look:

Can’t even tell, eh? Fun fact, the simulation took 1 minute 30 seconds with a .01 second time step and 45 minutes with a .0001 second time step. The left-most animation is the .01 second case and the right-most the .0001 second case. But why is there seemingly so little difference? Well, this boils down to the dynamics of the 3-link arm actually changing pretty slowly. Below I’ve plotted just a few of the elements from the $\textbf{A}$, $\textbf{B}$, and $\textbf{K}$ matrices over .5 seconds of simulation time:

So, there are some obvious points where sampling the dynamics at a .01 time step is noticeably less accurate, but all in all there’s not a huuuggge difference between sampling at .01 and .0001 seconds. If you’re just watching the end-effector path it’s really not very noticeable. You can see how the elements of $\textbf{A}$ and $\textbf{B}$ are changing fairly slowly; this means that $\textbf{K}$ is going to be an effective feedback gain for a fair chunk of time. And the computational savings you get by sampling the dynamics and regenerating $\textbf{K}$ every .01 seconds instead of every .0001 seconds are pretty big. This was just another thing that I came across when playing around with the LQR, the take away being don’t just assume you need to update your system crazy often. You might get very comparable performance for much less computational cost.

Conclusions

All in all, the LQR controller is pretty neat! It’s really simple to set up, and generic. We don’t need any specific information about the system dynamics, like we do for effective operational space control (OSC). When we estimate the dynamics with finite differences, all we need is a decent system model that we can sample. Again, the more non-linear the system, of course, the less effective an LQR will be. If you’re interested in playing around with one, or generating the figures that I show above, the code is all up and running on my GitHub for you to explore.

Workshop talk – Methods for scaling neural computation

A couple of months ago I gave a talk at the Neuro-Inspired Computational Elements (NICE) workshop about the use of cortical microcircuits for adaptation in prediction and control problems. The talk was recorded, and here is a link: http://1.usa.gov/1tg2n2I Ignore the fire alarm that gets pulled in the first minute; the actual talk starts around the 1 minute mark.

There were a lot of talks in general that were very interesting, which are also available online here: http://nice.sandia.gov/videos.html

I’ll be writing up some posts on the subject matter of my talk in the next couple months, explaining the methods in more detail and providing some solid examples with code. Hope you find the video interesting!

Robot control part 2: Jacobians, velocity, and force

Jacobian matrices are a super useful tool, and heavily used throughout robotics and control theory. Basically, a Jacobian defines the dynamic relationship between two different representations of a system. For example, if we have a 2-link robotic arm, there are two obvious ways to describe its current position: 1) the end-effector position and orientation (which we will denote $\textbf{x}$), and 2) as the set of joint angles (which we will denote $\textbf{q}$). The Jacobian for this system relates how movement of the elements of $\textbf{q}$ causes movement of the elements of $\textbf{x}$. You can think of a Jacobian as a transform matrix for velocity.

Formally, a Jacobian is a matrix of partial derivatives:

$\textbf{J} = \frac{\partial \textbf{x}}{\partial \textbf{q}}$.

With a bit of manipulation we can get a neat result:

$\textbf{J} = \frac{\partial \textbf{x}}{\partial t} \; \frac{\partial t}{\partial \textbf{q}} \rightarrow \frac{\partial \textbf{x}}{\partial t} = \textbf{J} \frac{\partial \textbf{q}}{\partial t}$,

or

$\dot{\textbf{x}} = \textbf{J} \; \dot{\textbf{q}}$,

where $\dot{\textbf{x}}$ and $\dot{\textbf{q}}$ represent the time derivatives of $\textbf{x}$ and $\textbf{q}$. This tells us that the end-effector velocity is equal to the Jacobian, $\textbf{J}$, multiplied by the joint angle velocity.

Why is this important? Well, this goes back to our desire to control in operational (or task) space. We’re interested in planning a trajectory in a different space than the one that we control directly. In our robot arm, control is effected through a set of motors that apply torque to the joint angles, BUT what we’d like is to plan our trajectory in terms of end-effector position (and possibly orientation), generating control signals in terms of forces to apply in $(x,y,z)$ space. Jacobians give us a direct way to calculate what the control signal is in the space that we control (torques), given a control signal in one we don’t (end-effector forces). The above equivalence is a first step along the path to operational space control. As just mentioned, though, what we’re really interested in isn’t relating velocities, but forces. How can we do this?

Energy equivalence and Jacobians
Conservation of energy is a property of all physical systems: the amount of energy expended is the same no matter how the system in question is represented. The planar two-link robot arm shown below will be used for illustration.

Let the joint angle positions be denoted $\textbf{q} = [q_0, q_1]^T$, and end-effector position be denoted $\textbf{x} = [x, y, 0]^T$.

Work is the application of force over a distance

$\textbf{W} = \int \textbf{F}^T \textbf{v} \; dt,$

where $\textbf{W}$ is work, $\textbf{F}$ is force, and $\textbf{v}$ is velocity.

Power is the rate at which work is performed

$\textbf{P} = \frac{\textbf{W}}{t},$

where $\textbf{P}$ is power.
Substituting the equation for work into the equation for power gives:

$\textbf{P} = \frac{\textbf{W}}{t} = \frac{\textbf{F}^T \textbf{d}}{t} = \textbf{F}^T \frac{\textbf{d}}{t} = \textbf{F}^T\textbf{v}.$

Because of energy equivalence, work is performed at the same rate regardless of the characterization of the system. Rewriting this in terms of end-effector space gives:

$\textbf{P} = \textbf{F}_\textbf{x}^T \dot{\textbf{x}},$

where $\textbf{F}_\textbf{x}$ is the force applied to the hand, and $\dot{\textbf{x}}$ is the velocity of the hand. Rewriting the above in terms of joint-space gives:

$\textbf{P} = \textbf{F}_\textbf{q}^T \dot{\textbf{q}},$

where $\textbf{F}_\textbf{q}$ is the torque applied to the joints, and $\dot{\textbf{q}}$ is the angular velocity of the joints. Setting these two equations (in end-effector and joint space) equal to each other and substituting in our equation for the Jacobian gives:

$\textbf{F}_{q_{hand}}^T \dot{\textbf{q}} = \textbf{F}_\textbf{x}^T \dot{\textbf{x}},$

$\textbf{F}_{q_{hand}}^T \dot{\textbf{q}} = \textbf{F}_\textbf{x}^T \textbf{J}_{ee}(\textbf{q}) \; \dot{\textbf{q}},$

$\textbf{F}_{q_{hand}}^T = \textbf{F}_\textbf{x}^T \textbf{J}_{ee}(\textbf{q}),$

$\textbf{F}_{q_{hand}} = \textbf{J}_{ee}^T(\textbf{q}) \textbf{F}_\textbf{x}.$

where $\textbf{J}_{ee}(\textbf{q})$ is the Jacobian for the end-effector of the robot, and $\textbf{F}_{q_{hand}}$ represents the forces in joint-space that affect movement of the hand. This says that not only does the Jacobian relate velocity from one state-space representation to another, it can also be used to calculate what the forces in joint space should be to effect a desired set of forces in end-effector space.

Building the Jacobian

First, we need to define the relationship between the $(x,y,z)$ position of the end-effector and the robot’s joint angles, $(q_0, q_1)$. However will we do it? Well, we know the distances from the shoulder to the elbow, and elbow to the wrist, as well as the joint angles, and we’re interested in finding out where the end-effector is relative to a base coordinate frame…OH MAYBE we should use those forward transformation matrices from the previous post. Let’s do it!

The forward transformation matrix

Recall that transformation matrices allow a given point to be transformed between different reference frames. In this case, the position of the end-effector relative to the second joint of the robot arm is known, but where it is relative to the base reference frame (the first joint reference frame in this case) is of interest. This means that only one transformation matrix is needed, transforming from the reference frame attached to the second joint back to the base.

The rotation part of this matrix is straight-forward to define, as in the previous section:

$^1_0\textbf{R} = \left[ \begin{array}{ccc} cos(q_0) & -sin(q_0) & 0 \\ sin(q_0) & cos(q_0) & 0 \\ 0 & 0 & 1 \end{array} \right].$

The translation part of the transformation matrix is a little different than before, because reference frame 1 changes as a function of the previous joint’s angle. From trigonometry, given a vector of length $r$ and an angle $q$, the $x$ position of the end point is defined $r \; cos(q)$, and the $y$ position is $r \; sin(q)$. The arm is operating in the $(x,y)$ plane, so the $z$ position will always be 0.

Using this knowledge, the translation part of the transformation matrix is defined:

$^1_0\textbf{D} = \left[ \begin{array}{c} L_0 cos(q_0) \\ L_0 sin(q_0) \\ 0 \end{array} \right].$

Giving the forward transformation matrix:

$^1_0\textbf{T} = \left[ \begin{array}{cc} ^1_0\textbf{R} & ^1_0\textbf{D} \\ \textbf{0} & \textbf{1} \end{array} \right] = \left[ \begin{array}{cccc} cos(q_0) & -sin(q_0) & 0 & L_0 cos(q_0) \\ sin(q_0) & cos(q_0) & 0 & L_0 sin(q_0)\\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right],$

which transforms a point from reference frame 1 (elbow joint) to reference frame 0 (shoulder joint / base).

The point of interest is the end-effector, which is defined in reference frame 1 as a function of the joint angle $q_1$ and the length of the second arm segment, $L_1$:

$\textbf{x} = \left[ \begin{array}{c} L_1 cos(q_1) \\ L_1 sin(q_1) \\ 0 \\ 1 \end{array} \right].$

To find the position of our end-effector in terms of the origin reference frame multiply the point $\textbf{x}$ by the transformation $^1_0\textbf{T}$:

$^1_0\textbf{T} \; \textbf{x} = \left[ \begin{array}{cccc} cos(q_0) & -sin(q_0) & 0 & L_0 cos(q_0) \\ sin(q_0) & cos(q_0) & 0 & L_0 sin(q_0)\\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] \; \left[ \begin{array}{c} L_1 cos(q_1) \\ L_1 sin(q_1) \\ 0 \\ 1 \end{array} \right],$

$^1_0\textbf{T} \textbf{x} = \left[ \begin{array}{c} L_1 cos(q_0) cos(q_1) - L_1 sin(q_0) sin(q_1) + L_0 cos(q_0) \\ L_1 sin(q_0) cos(q_1) + L_1 cos(q_0) sin(q_1) + L_0 sin(q_0) \\ 0 \\ 1 \end{array} \right]$

where, by pulling out the $L_1$ term and using the trig identities

$cos(\alpha)cos(\beta) - sin(\alpha)sin(\beta) = cos(\alpha + \beta),$

and

$sin(\alpha)cos(\beta) + cos(\alpha)sin(\beta) = sin(\alpha + \beta),$

the position of our end-effector can be rewritten:

$\left[ \begin{array}{c} L_0 cos(q_0) + L_1 cos(q_0 + q_1) \\ L_0 sin(q_0) + L_1 sin(q_0 + q_1) \\ 0 \end{array} \right],$

which is the position of the end-effector in terms of joint angles. As mentioned above, however, both the position of the end-effector and its orientation are needed; the rotation of the end-effector relative to the base frame must also be defined.
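As a quick sanity check of the derivation, here’s a sketch computing the end-effector position both through the transformation matrix and with the simplified trig-identity form. The arm segment lengths $L_0 = L_1 = 1$ are assumed for illustration:

```python
import numpy as np

def fk_transform(q0, q1, L0=1.0, L1=1.0):
    # end-effector (x, y) via the forward transformation matrix above
    T = np.array([
        [np.cos(q0), -np.sin(q0), 0.0, L0 * np.cos(q0)],
        [np.sin(q0),  np.cos(q0), 0.0, L0 * np.sin(q0)],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]])
    point = np.array([L1 * np.cos(q1), L1 * np.sin(q1), 0.0, 1.0])
    return (T @ point)[:2]

def fk_direct(q0, q1, L0=1.0, L1=1.0):
    # the simplified form after applying the trig identities
    return np.array([L0 * np.cos(q0) + L1 * np.cos(q0 + q1),
                     L0 * np.sin(q0) + L1 * np.sin(q0 + q1)])
```

The two should agree for any joint configuration, which is a nice way to catch sign errors in the matrix version.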

Accounting for orientation

Fortunately, defining orientation is simple, especially for systems with only revolute and prismatic joints (spherical joints will not be considered here). With prismatic joints, which are linear and move in a single plane, the rotation introduced is 0. With revolute joints, the rotation of the end-effector in each axis is simply a sum of rotations of each joint in their respective axes of rotation.

In the example case, the joints are rotating around the $z$ axis, so the rotation part of our end-effector state is

$\left[ \begin{array}{c} \omega_x \\ \omega_y \\ \omega_z \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \\ q_0 + q_1 \end{array} \right],$

where $\omega$ denotes angular rotation. If the first joint had been rotating in a different plane, e.g. in the $(x, z)$ plane around the $y$ axis instead, then the orientation would be

$\left[ \begin{array}{c} \omega_x \\ \omega_y \\ \omega_z \end{array} \right] = \left[ \begin{array}{c} 0 \\ q_0 \\ q_1 \end{array} \right].$

Partial differentiation

Once the position and orientation of the end-effector have been calculated, the partial derivatives of these equations need to be calculated with respect to the elements of $\textbf{q}$. For simplicity, the Jacobian will be broken up into two parts, $\textbf{J}_v$ and $\textbf{J}_\omega$, representing the linear and angular velocity, respectively, of the end-effector.

The linear velocity part of our Jacobian is:

$\textbf{J}_v(\textbf{q}) = \left[ \begin{array}{cc} \frac{\partial x}{\partial q_0} & \frac{\partial x}{\partial q_1} \\ \frac{\partial y}{\partial q_0} & \frac{\partial y}{\partial q_1} \\ \frac{\partial z}{\partial q_0} & \frac{\partial z}{\partial q_1} \end{array} \right] = \left[ \begin{array}{cc} -L_0 sin(q_0) - L_1 sin(q_0 + q_1) & - L_1 sin(q_0 + q_1) \\ L_0 cos(q_0) + L_1 cos(q_0 + q_1) & L_1 cos(q_0 + q_1) \\ 0 & 0 \end{array} \right].$

The angular velocity part of our Jacobian is:

$\textbf{J}_\omega(\textbf{q}) = \left[ \begin{array}{cc} \frac{\partial \omega_x}{\partial q_0} & \frac{\partial \omega_x}{\partial q_1} \\ \frac{\partial \omega_y}{\partial q_0} & \frac{\partial \omega_y}{\partial q_1} \\ \frac{\partial \omega_z}{\partial q_0} & \frac{\partial \omega_z}{\partial q_1} \end{array} \right] = \left[ \begin{array}{cc} 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{array} \right].$

The full Jacobian for the end-effector is then:

$\textbf{J}_{ee}(\textbf{q}) = \left[ \begin{array}{c} \textbf{J}_v(\textbf{q}) \\ \textbf{J}_\omega(\textbf{q}) \end{array} \right] = \left[ \begin{array}{cc} -L_0 sin(q_0) - L_1 sin(q_0 + q_1) & - L_1 sin(q_0 + q_1) \\ L_0 cos(q_0) + L_1 cos(q_0 + q_1) & L_1 cos(q_0 + q_1) \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{array} \right].$
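The full Jacobian above translates directly into code; here’s a sketch, again with $L_0 = L_1 = 1$ as illustrative defaults:

```python
import numpy as np

def jacobian_ee(q0, q1, L0=1.0, L1=1.0):
    # full (6 x 2) end-effector Jacobian: linear velocity rows on top,
    # angular velocity rows underneath
    return np.array([
        [-L0 * np.sin(q0) - L1 * np.sin(q0 + q1), -L1 * np.sin(q0 + q1)],
        [ L0 * np.cos(q0) + L1 * np.cos(q0 + q1),  L1 * np.cos(q0 + q1)],
        [0.0, 0.0],
        [0.0, 0.0],
        [0.0, 0.0],
        [1.0, 1.0]])
```

Multiplying this by a joint angle velocity vector $\dot{\textbf{q}}$ gives the end-effector velocity $\dot{\textbf{x}}$ directly.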

Analyzing the Jacobian

Once the Jacobian is built, it can be analysed for insight about the relationship between $\dot{\textbf{x}}$ and $\dot{\textbf{q}}$.

For example, there is a large block of zeros in the middle of the Jacobian defined above, along the row corresponding to linear velocity along the $z$ axis, and the rows corresponding to the angular velocity around the $x$ and $y$ axes. This means that the $z$ position, and rotations $\omega_x$ and $\omega_y$ are not controllable. This can be seen by going back to the first Jacobian equation:

$\dot{\textbf{x}} = \textbf{J}_{ee}(\textbf{q})\;\dot{\textbf{q}}.$

No matter what the values of $\dot{\textbf{q}}$, it is impossible to affect $\omega_x$, $\omega_y$, or $z$, because the corresponding rows during the above multiplication with the Jacobian are $\textbf{0}$. These rows of zeros in the Jacobian are referred to as its ‘null space’. Because these variables can’t be controlled, they will be dropped from both $\textbf{F}_\textbf{x}$ and $\textbf{J}(\textbf{q})$.

Looking at the variables that can be affected it can be seen that given any two of $x, y, \omega_z$ the third can be calculated because the robot only has 2 degrees of freedom (the shoulder and elbow). This means that only two of the end-effector variables can actually be controlled. In the situation of controlling a robot arm, it is most useful to control the $(x,y)$ coordinates, so $\omega_z$ will be dropped from the force vector and Jacobian.

After removing the redundant term, the force vector representing the controllable end-effector forces is

$\textbf{F}_\textbf{x} = \left[ \begin{array}{c}f_x \\ f_y\end{array} \right],$

where $f_x$ is force along the $x$ axis, $f_y$ is force along the $y$ axis, and the Jacobian is written

$\textbf{J}_{ee}(\textbf{q}) = \left[ \begin{array}{cc} -L_0 sin(q_0) - L_1 sin(q_0 + q_1) & - L_1 sin(q_0 + q_1) \\ L_0 cos(q_0) + L_1 cos(q_0 + q_1) & L_1 cos(q_0 + q_1) \end{array} \right].$

If instead $f_{\omega_z}$, i.e. torque around the $z$ axis, were chosen as a controlled force then the force vector and Jacobian would be (assuming force along the $x$ axis was also chosen):

$\textbf{F}_\textbf{x} = \left[ \begin{array}{c} f_x \\ f_{\omega_z}\end{array} \right],$
$\textbf{J}_{ee}(\textbf{q}) = \left[ \begin{array}{cc} -L_0 sin(q_0) - L_1 sin(q_0 + q_1) & - L_1 sin(q_0 + q_1) \\ 1 & 1 \end{array} \right].$

But we’ll stick with control of the $x$ and $y$ forces instead, as it’s a little more straightforward.

Using the Jacobian

With our Jacobian, we can find out what end-effector linear and angular velocities different joint angle velocities will cause, and we can also transform desired $(x,y)$ forces into $(q_0, q_1)$ torques. Let’s do a couple of examples. Note that in the former case we’ll be using the full Jacobian, while in the latter case we can use the simplified Jacobian specified just above.

Example 1

Given known joint angle velocities with arm configuration

$\textbf{q} = \left[ \begin{array}{c} \frac{\pi}{4} \\ \frac{3 \pi}{8} \end{array}\right] \;\;\;\; \dot{\textbf{q}} = \left[ \begin{array}{c} \frac{\pi}{10} \\ \frac{\pi}{10} \end{array} \right]$

and arm segment lengths $L_i = 1$, the $(x,y)$ velocities of the end-effector can be calculated by substituting in the system state at the current time into the equation for the Jacobian:

$\dot{\textbf{x}} = \textbf{J}_{ee}(\textbf{q}) \; \dot{\textbf{q}},$

$\dot{\textbf{x}} = \left[ \begin{array}{cc} - sin(\frac{\pi}{4}) - sin(\frac{\pi}{4} + \frac{3\pi}{8}) & - sin(\frac{\pi}{4} + \frac{3\pi}{8}) \\ cos(\frac{\pi}{4}) + cos(\frac{\pi}{4} + \frac{3\pi}{8}) & cos(\frac{\pi}{4} + \frac{3\pi}{8}) \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{array} \right] \; \left[ \begin{array}{c} \frac{\pi}{10} \\ \frac{\pi}{10} \end{array} \right],$

$\dot{\textbf{x}} = \left[ -0.8026, -0.01830, 0, 0, 0, \frac{\pi}{5} \right]^T.$

And so the end-effector’s linear velocity is $(-0.8026, -0.01830, 0)^T$, and its angular velocity is $(0, 0, \frac{\pi}{5})^T$.

Example 2

Given the same system and configuration as in the previous example as well as a trajectory planned in $(x,y)$ space, the goal is to calculate the torques required to get the end-effector to move as desired. The controlled variables will be forces along the $x$ and $y$ axes, and so the reduced Jacobian from the previous section will be used. Let the desired $(x,y)$ forces be

$\textbf{F}_\textbf{x} = \left[ \begin{array}{c} 1 \\ 1 \end{array}\right],$

to calculate the corresponding joint torques, the desired end-effector forces and current system state parameters are substituted into the equation relating forces in end-effector and joint space:

$\textbf{F}_\textbf{q} = \textbf{J}^T_{ee}(\textbf{q}) \textbf{F}_\textbf{x},$

$\textbf{F}_\textbf{q} = \left[ \begin{array}{cc} -sin(\frac{\pi}{4}) -sin(\frac{\pi}{4} + \frac{3\pi}{8}) & cos(\frac{\pi}{4}) + cos(\frac{\pi}{4} + \frac{3\pi}{8}) \\ -sin(\frac{\pi}{4} + \frac{3\pi}{8}) & cos(\frac{\pi}{4} + \frac{3\pi}{8}) \end{array} \right] \left[ \begin{array}{c} 1 \\ 1 \end{array} \right],$

$\textbf{F}_\textbf{q} = \left[ \begin{array}{c} -1.3066 \\ -1.3066 \end{array}\right].$

So given the current configuration to get the end-effector to move as desired, without accounting for the effects of inertia and gravity, the torques to apply to the system are $\textbf{F}_\textbf{q} = [-1.3066, -1.3066]^T$.
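Example 2 is easy to verify numerically; here’s a sketch using the reduced $(x,y)$ Jacobian, with the same illustrative $L_0 = L_1 = 1$ segment lengths:

```python
import numpy as np

def joint_torques(q0, q1, F_x, L0=1.0, L1=1.0):
    # reduced (x, y) Jacobian for the 2-link arm; its transpose maps
    # end-effector forces to joint torques: F_q = J^T F_x
    J = np.array([
        [-L0 * np.sin(q0) - L1 * np.sin(q0 + q1), -L1 * np.sin(q0 + q1)],
        [ L0 * np.cos(q0) + L1 * np.cos(q0 + q1),  L1 * np.cos(q0 + q1)]])
    return J.T @ np.asarray(F_x)
```

Plugging in the configuration from the example reproduces the $[-1.3066, -1.3066]^T$ torque vector derived above.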

And now we are able to transform end-effector forces into torques, and joint angle velocities into end-effector velocities! What a wonderful, wonderful tool to have at our disposal! Hurrah for Jacobians!

Conclusions

In this post I’ve gone through how to use Jacobians to relate the movement of joint angle and end-effector system state characterizations, but Jacobians can be used to relate any two characterizations. All you need to do is define one in terms of the other and do some partial differentiation. The above example scenarios were of course very simple, and didn’t worry about compensating for anything like gravity. But don’t worry, that’s exactly what we’re going to look at in our exciting next chapter!

Something that I found interesting to consider is the need for the orientation of the end-effector and finding the angular velocities. Often with simpler robot arms we’re only interested in the position of the end-effector, so it’s easy to write off orientation. But if there was a gripper attached to the end-effector, then suddenly the orientation becomes very important, often determining whether or not an object can be picked up.

And finally, if you’re interested in reading more about all this, I recommend checking out ‘Velocity kinematics – The manipulator Jacobian’ available online, it’s a great resource.