## Simultaneous perturbation vs finite differences for linear dynamics estimation and control signal optimization

Recently in my posts I’ve been using finite differences to approximate the gradient of loss functions and dynamical systems, with the intention of creating generalizable controllers that can be run on any system without having to calculate out derivatives beforehand. Finite differences is pretty much the most straight-forward way of approximating a gradient that there is: vary each parameter up and down (assuming we’re doing central differencing), one at a time, run it through your function and estimate the parameters effect on the system by calculating the difference between resulting function output. To do this requires 2 samples of the function for each parameter.

But there’s always more than one way to peel an avocado, and another approach that’s been used with much success is the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm, which was developed by Dr. James Spall (link to overview paper). SPSA is a method of gradient approximation, like finite differences, but, critically, the difference is that it varies all of the parameters at once, rather than one at a time. As a result, you can get an approximation of the gradient with far fewer samples from the system, and also when you don’t have explicit control over your samples (i.e. the ability to vary each parameter one at a time).

Given some function $\textbf{f}$ dependent on some set of parameters $\textbf{x}$, we’re used to finding the gradient of $\textbf{f}(\textbf{x})$ using FDSA (finite differences stochastic approximation) written in this form:

$\textbf{f}_x = \frac{\textbf{f}(\textbf{x} + \Delta \textbf{x}) - \textbf{f}(\textbf{x} - \Delta \textbf{x})}{2 \Delta \textbf{x}} = \Delta x^{-1} \frac{\textbf{f}(\textbf{x} + \Delta \textbf{x}) - \textbf{f}(\textbf{x} - \Delta \textbf{x})}{2},$

where $\Delta\textbf{x}$ is a perturbation to the parameter set, and the subscript on the left-hand side denotes the derivative of $\textbf{f}(\textbf{x})$ with respect to $\textbf{x}$.

And that’s how we’ve calculated it before, estimating the gradient of a single parameter at a time. But, we can rewrite this for a set perturbations $\Delta \textbf{X} = [\Delta\textbf{x}_0, ... , \Delta\textbf{x}_N]^T$:

$\textbf{f}_\textbf{x} = \Delta \textbf{X}^{-1} \textbf{F},$

where

$\textbf{F} = [\frac{\textbf{f}(\textbf{x} + \Delta \textbf{x}_0) - \textbf{f}(\textbf{x} - \Delta \textbf{x}_0)}{2}, ... , \frac{\textbf{f}(\textbf{x} + \Delta \textbf{x}_0) - \textbf{f}(\textbf{x}_N - \Delta \textbf{x}_N)}{2}]^T$,

which works as long as $\Delta\textbf{X}$ is square. When it’s not square (i.e. we have don’t have the same number of samples as we have parameters), we run into problems, because we can’t calculate $\Delta\textbf{X}^{-1}$ directly. To address this, let’s take a step back and then work forward again to get a more general form that works for non-square $\Delta \textbf{X}$ too.

By rewriting the above, and getting rid of the inverse by moving $\Delta\textbf{x}$ back to the other side, we have:

$\Delta\textbf{X} \; \textbf{f}_\textbf{x} = \textbf{F}$

Now, the standard trick to move a matrix that’s not square is to just make it square by multiplying it by its transpose to get a square matrix, and then the whole thing by the inverse:

$\Delta\textbf{X}^T \Delta\textbf{X} \; \textbf{f}_\textbf{x} = \Delta\textbf{X}^T \textbf{F}$

$(\Delta\textbf{X}^T \Delta\textbf{X})^{-1} (\Delta\textbf{X}^T \Delta\textbf{X}) \textbf{f}_\textbf{x} = (\Delta\textbf{X}^T \Delta\textbf{X})^{-1} \Delta\textbf{X}^T \textbf{F}$

$\textbf{f}_\textbf{x} = (\Delta\textbf{X}^T \Delta\textbf{X})^{-1} \Delta\textbf{X}^T \textbf{F}$

Alright! Now we’re comfortable with this characterization of gradient approximation using a form that works with non-square perturbation matrices.

Again, in FDSA, we only vary one parameter at a time. This means that there will only ever be one non-zero entry per row of $\Delta \textbf{X}$. By contrast, in SPSA, we vary multiple parameters, and so rows of $\Delta\textbf{X}$ will be just chalk full of non-zero entries.

Gradient approximation to estimate $\textbf{f}_\textbf{x}$ and $\textbf{f}_\textbf{u}$ for LQR control

Implementing this was pretty simple. Just had to modify the calc_derivs function, which I use to estimate the derivative of the arm with respect to the state and control signal, in my LQR controller code by changing from standard finite differences to simultaneous perturbation:

def calc_derivs(self, x, u):
"""" calculate gradient of plant dynamics using Simultaneous
Perturbation Stochastic Approximation (SPSA). Implemented
based on (Peters & Schaal, 2008).

x np.array: the state of the system
u np.array: the control signal
"""
# Initialization and coefficient selection
num_iters = 20

eps = 1e-4
delta_K = None
delta_J = None
for ii in range(num_iters):
# Generation of simultaneous perturbation vector
# choose each component from a Bernoulli +-1 distribution
# with probability of .5 for each +-1 outcome.
delta_k = np.random.choice([-1,1],
size=len(x) + len(u),
p=[.5, .5])
# Function evaluations
inc_x = np.copy(x) + eps * delta_k[:len(x)]
inc_u = np.copy(u) + eps * delta_k[len(x):]
state_inc = self.plant_dynamics(inc_x, inc_u)
dec_x = np.copy(x) - eps * delta_k[:len(x)]
dec_u = np.copy(u) - eps * delta_k[len(x):]
state_dec = self.plant_dynamics(dec_x, dec_u)

delta_j = ((state_inc - state_dec) /
(2.0 * eps)).reshape(-1)

# Track delta_k and delta_j
delta_K = delta_k if delta_K is None else \
np.vstack([delta_K, delta_k])
delta_J =  delta_j if delta_J is None else \
np.vstack([delta_J, delta_j])

f_xu = np.dot(np.linalg.pinv(np.dot(delta_K.T, delta_K)),
np.dot(delta_K.T, delta_J))
f_x = f_xu[:len(x)]
f_u = f_xu[len(x):]

return f_x.T , f_u.T


A couple notes about the above code. First, you’ll notice that the f_x and f_b matrices are both calculated at the same time. That’s pretty slick! And that calculation for f_xu is just a straight implementation of the matrix form of gradient approximation, where I’ve arranged things so that f_x is in the top part and f_u is in the lower part.

The second thing is that the perturbation vector delta_k is generated from a Bernoulli distribution. The reason behind this is that we want to have a bunch of different samples that pretty reasonably spread the state space and move all the parameters independently. Making each perturbation some distance times -1 or 1 is an easy way to achieve this.

Thirdly, there’s the num_iters variable. This is a very important variable, as it dictates how many random samples of our system we take before we estimate the gradient. I’ve found that to get this to work for both the 2-link arm and the more complex 3-link arm, it needs to be at least 20. Or else things explode and die horribly. Just…horribly.

OK let’s look at the results:

The first thing to notice is that I’ve finally discovered the Seaborn plotting package. The second is that SPSA does as well as FDSA.

You may ask: Is there any difference? Well, if we time these functions, on my lil’ laptop, for the 2-link arm it takes SPSA approximately 2.0ms, but it takes FDSA only 0.8ms. So for the same performance the SPSA is taking almost 3 times as long to run. Why? This boils down to how many times the system dynamics need to be sampled by each algorithm to get a good approximation of the gradient. For a 2-link arm, FDSA has 6 parameters ($\textbf{q}, \dot{\textbf{q}},$ and $\textbf{u}$) that it needs to sample twice (we’re doing central differencing), for a total of 12 samples. And as I mentioned above, the SPSA algorithm needs 20 samples to be stable.

For the 3-link arm, SPSA took about 3.1ms on average and FDSA (which must now perform 18 samples of the dynamics) still only 2.1ms. So number of samples isn’t the only cause of time difference between these two algorithms. SPSA needs to perform that a few more matrix operations, including a matrix inverse, which is expensive, while FDSA can calculate the gradient of each parameter individually, which is much less expensive.

OK so SPSA not really impressive here. BUT! As I discovered, there are other means of employing SPSA.

Gradient approximation to optimize the control signal directly

In the previous set up we were using SPSA to estimate the gradient of the system under control, and then we used that gradient to calculate a control signal that minimized the loss function (as specified inside the LQR). This is one way to use gradient approximation methods. Another way to use these methods is approximate the gradient of the loss function directly, and use that information to iteratively calculate a control signal that minimizes the loss function. This second application is the primary use of the SPSA algorithm, and is what’s described by Dr. Spall in his overview paper.

In this application, the algorithm works like this:

2. perturb input and simulate results
3. observe loss function and calculate gradient
4. update input to system
5. repeat to convergence

Because in this approach we’re iteratively optimizing the input using our gradient estimation, having a noisy estimate is no longer a death sentence, as it was in the LQR. If we update our input to the system with several noisy gradient estimates the noise will essentially just cancel itself out. This means that SPSA now has a powerful advantage over FDSA: Since in SPSA we vary all parameters at once, only 2 samples of the loss function are used to estimate the gradient, regardless of the number of parameters. In contrast, FDSA needs to sample the loss function twice for every input parameter. Here’s a picture from (Spall, 1998) that shows the two running against each other to optimize a 2D problem:

This gets across that even though SPSA bounces around more, they both reach the solution in the same number of steps. And, in general, this is the case, as Dr. Spall talks about in the paper. There’s also a couple more details of the algorithm, so let’s look at it in detail. Here’s the code, which is just a straight translation into Python out of the description in Dr. Spall’s paper:

# Step 1: Initialization and coefficient selection
max_iters = 5
converge_thresh = 1e-5

alpha = 0.602 # from (Spall, 1998)
gamma = 0.101
a = .101 # found empirically using HyperOpt
A = .193
c = .0277

delta_K = None
delta_J = None
u = np.copy(self.u) if self.u is not None \
else np.zeros(self.arm.DOF)
for k in range(max_iters):
ak = a / (A + k + 1)**alpha
ck = c / (k + 1)**gamma

# Step 2: Generation of simultaneous perturbation vector
# choose each component from a bernoulli +-1 distribution with
# probability of .5 for each +-1 outcome.
delta_k = np.random.choice([-1,1], size=arm.DOF, p=[.5, .5])

# Step 3: Function evaluations
inc_u = np.copy(u) + ck * delta_k
cost_inc = self.cost(np.copy(state), inc_u)
dec_u = np.copy(u) - ck * delta_k
cost_dec = self.cost(np.copy(state), dec_u)

gk = np.dot((cost_inc - cost_dec) / (2.0*ck), delta_k)

# Step 5: Update u estimate
old_u = np.copy(u)
u -= ak * gk

# Step 6: Check for convergence
if np.sum(abs(u - old_u)) < converge_thresh:
break


The main as-of-yet-unexplained parts of this code are the alpha, gamma, a, A, and c variables. What’s their deal?

Looking inside the loop, we can see that ck controls the magnitude of our perturbations. Looking a little further down, ak is just the learning rate. And all of those other parameters are just involved in shaping the trajectories that ak and ck follow through iterations, which is a path towards zero. So the first steps and perturbations are the biggest, and each successively becomes smaller as the iteration count increases.

There are a few heuristics that Dr. Spall goes over, but there aren’t any hard and fast rules for setting a, A, and c. Here, I just used HyperOpt to find some values that worked pretty well for this particular problem.

The FDSA version of this is also very straight-forward:

# Step 1: Initialization and coefficient selection
max_iters = 10
converge_thresh = 1e-5

eps = 1e-4
u = np.copy(self.u) if self.u is not None \
else np.zeros(self.arm.DOF)
for k in range(max_iters):

gk = np.zeros(u.shape)
for ii in range(gk.shape[0]):
# Step 2: Generate perturbations one parameter at a time
inc_u = np.copy(u)
inc_u[ii] += eps
dec_u = np.copy(u)
dec_u -= eps

# Step 3: Function evaluation
cost_inc = self.cost(np.copy(state), inc_u)
cost_dec = self.cost(np.copy(state), dec_u)

gk[ii] = (cost_inc - cost_dec) / (2.0 * eps)

old_u = np.copy(u)
# Step 5: Update u estimate
u -= 1e-5 * gk

# Step 6: Check for convergence
if np.sum(abs(u - old_u)) < converge_thresh:
break


You’ll notice that in both the SPSA and FDSA code we’re no longer sampling plant_dynamics, we’re instead sampling cost, a loss function I defined. From just my experience playing around with these algorithms a bit, getting the loss function to be appropriate and give the desired behaviour is definitely a bit of an art. It feels like much more of an art than in other controllers I’ve coded, but that could just be me.

The cost function that I’m using is pretty much the first thing you’d think of. It penalizes distance to target and having non-zero velocity. Getting the weighting between distance to target and velocity set up so that the arm moves to the target but also doesn’t overshoot definitely took a bit of trial and error, er, I mean empirical analysis. Here’s the cost function that I found worked pretty well, note that I had to make special cases for the different arms:

 def cost(self, x, u):
dt = .1 if self.arm.DOF == 3 else .01
next_x = self.plant_dynamics(x, u, dt=dt)
vel_gain = 100 if self.arm.DOF == 3 else 10
return (np.sqrt(np.sum((self.arm.x - self.target)**2)) * 1000 \
+ np.sum((next_x[self.arm.DOF:])**2) * vel_gain)


So that’s all the code, let’s look at the results!

For these results, I used a max of 10 iterations for optimizing the control signal. I was definitely surprised by the quality of the results, especially for the 3-link arm, compared to the results generated by a standard LQR controller. Although I need to note, again, that it was a fair bit of me playing around with the exact cost function to get these results. Lots of empirical analysis.

The two controllers generate results that are identical to visual inspection. However, especially in the 3-link arm, the time required to run the FDSA was significantly longer than the SPSA controller. It took approximately 140ms for the SPSA controller to run a single loop, but took FDSA on average 700ms for a single loop of calculating the control signal. Almost 5 times as long! For the same results! In directly optimizing the control signal, SPSA gets a big win over standard FDSA. So, if you’re looking to directly optimize over a loss function, SPSA is probably the way you want to go.

Conclusions

First off, I thought it was really neat to directly apply gradient approximation methods to optimizing the control signal. It’s something I haven’t tried before, but definitely makes sense, and can generate some really nice results when tuned properly. Automating the tuning is definitely I’ll be discussing in future posts, because doing it by hand takes a long time and is annoying.

In the LQR, the gradient approximation was best done by the FDSA. I think the main reasons for this is that in solving for the control signal the LQR algorithm uses matrix inverses, and any errors in the linear approximations to the dynamics are going to be amplified quite a bit. If I did anything less than 10-15 iterations (20 for the 3-link arm) in the SPSA approximation then things exploded. Also, here the SPSA algorithm required a matrix inverse, where the FDSA didn’t. This is because we only varied one parameter at a time in FDSA, and the effects of changing each was isolated. In the SPSA case, we had to consider the changes across all the variables and the resulting effects all at once, essentially noting which variables changed by how much and the changes in each case, and averaging. Here, even with the more complex 3-link arm, FDSA was faster, so I’m going to stick with it in my LQR and iLQR implementations.

In the direct control signal optimization SPSA beat the pants off of FDSA. It was almost 5 times faster for control of the 3-link arm. This was, again, because in this case we could use noisy samples of the gradient of the loss function and relied on noise to cancel itself out as we iterated. So we only needed 2 samples of the loss function in SPSA, where in FDSA we needed 2*num_parameters. And although this generated pretty good results I would definitely be hesitant against using this for any more complicated systems, because tuning that cost function to get out a good trajectory was a pain. If you’re interested in playing around with this, you can check out the code for the gradient controllers up on my GitHub.

## The iterative Linear Quadratic Regulator algorithm

A few months ago I posted on Linear Quadratic Regulators (LQRs) for control of non-linear systems using finite-differences. The gist of it was at every time step linearize the dynamics, quadratize (it could be a word) the cost function around the current point in state space and compute your feedback gain off of that, as though the dynamics were both linear and consistent (i.e. didn’t change in different states). And that was pretty cool because you didn’t need all the equations of motion and inertia matrices etc to generate a control signal. You could just use the simulation you had, sample it a bunch to estimate the dynamics and value function, and go off of that.

The LQR, however, operates with maverick disregard for changes in the future. Careless of the consequences, it optimizes assuming the linear dynamics approximated at the current time step hold for all time. It would be really great to have an algorithm that was able to plan out and optimize a sequence, mindful of the changing dynamics of the system.

This is exactly the iterative Linear Quadratic Regulator method (iLQR) was designed for. iLQR is an extension of LQR control, and the idea here is basically to optimize a whole control sequence rather than just the control signal for the current point in time. The basic flow of the algorithm is:

1. Initialize with initial state $x_0$ and initial control sequence $\textbf{U} = [u_{t_0}, u_{t_1}, ..., u_{t_{N-1}}]$.
2. Do a forward pass, i.e. simulate the system using $(x_0, \textbf{U})$ to get the trajectory through state space, $\textbf{X}$, that results from applying the control sequence $\textbf{U}$ starting in $x_0$.
3. Do a backward pass, estimate the value function and dynamics for each $(\textbf{x}, \textbf{u})$ in the state-space and control signal trajectories.
4. Calculate an updated control signal $\hat{\textbf{U}}$ and evaluate cost of trajectory resulting from $(x_0, \hat{\textbf{U}})$.
1. If $|(\textrm{cost}(x_0, \hat{\textbf{U}}) - \textrm{cost}(x_0, \textbf{U})| < \textrm{threshold}$ then we've converged and exit.
2. If $\textrm{cost}(x_0, \hat{\textbf{U}}) < \textrm{cost}(x_0, \textbf{U})$, then set $\textbf{U} = \hat{\textbf{U}}$, and change the update size to be more aggressive. Go back to step 2.
3. If $\textrm{cost}(x_0, \hat{\textbf{U}}) \geq$ $\textrm{cost}(x_0, \textbf{U})$ change the update size to be more modest. Go back to step 3.

There are a bunch of descriptions of iLQR, and it also goes by names like ‘the sequential linear quadratic algorithm’. The paper that I’m going to be working off of is by Yuval Tassa out of Emo Todorov’s lab, called Control-limited differential dynamic programming. And the Python implementation of this can be found up on my github in my Control repo. Also, a big thank you to Dr. Emo Todorov who provided Matlab code for the iLQG algorithm, which was super helpful.

Defining things

So let’s dive in. Formally defining things, we have our system $\textbf{x}$, and dynamics described with the function $\textbf{f}$, such that

$\textbf{x}_{t+1} = \textbf{f}(\textbf{x}_t, \textbf{u}_t),$

where $\textbf{u}$ is the input control signal. The trajectory $\{\textbf{X}, \textbf{U}\}$ is the sequence of states $\textbf{X} = \{\textbf{x}_0, \textbf{x}_1, ..., \textbf{x}_N\}$ that result from applying the control sequence $\textbf{U} = \{\textbf{u}_0, \textbf{u}_1, ..., \textbf{u}_{N-1}\}$ starting in the initial state $\textbf{x}_0$.

Now we need to define all of our cost related equations, so we know exactly what we’re dealing with.

Define the total cost function $J$, which is the sum of the immediate cost, $\ell$, from each state in the trajectory plus the final cost, $\ell_f$:

$J(\textbf{x}_0, \textbf{U}) = \sum\limits^{N-1}\limits_{t=0} \ell(\textbf{x}_t, \textbf{u}_t) + \ell_f(\textbf{x}_N).$

Letting $\textbf{U}_t = \{\textbf{u}_t, \textbf{u}_{t+1}, ..., \textbf{U}_{N-1}\}$, we define the cost-to-go as the sum of costs from time $t$ to $N$:

$J_t(\textbf{x}, \textbf{U}_t) = \sum\limits^{N-1}\limits_{i=t} \ell(\textbf{x}_i, \textbf{u}_i) + \ell_f(\textbf{x}_N).$

The value function $V$ at time $t$ is the optimal cost-to-go from a given state:

$V_t(\textbf{x}) = \min\limits_{\textbf{U}_t} J_t(\textbf{x}, \textbf{U}_t),$

where the above equation just says that the optimal cost-to-go is found by using the control sequence $\textbf{U}_t$ that minimizes $J_t$.

At the final time step, $N$, the value function is simply

$V(\textbf{x}_N) = \ell_f(\textbf{x}_N).$

For all preceding time steps, we can write the value function as a function of the immediate cost $\ell(\textbf{x}, \textbf{u})$ and the value function at the next time step:

$V(\textbf{x}) = \min\limits_{\textbf{u}} \left[ \ell(\textbf{x}, \textbf{u}) + V(\textbf{f}(\textbf{x}, \textbf{u})) \right].$

NOTE: In the paper, they use the notation $V'(\textbf{f}(\textbf{x}, \textbf{u}))$ to denote the value function at the next time step, which is redundant since $\textbf{x}_{t+1} = \textbf{f}(\textbf{x}_t, \textbf{u}_t)$, but it comes in handy later when they drop the dependencies to simplify notation. So, heads up: $V' = V(\textbf{f}(\textbf{x}, \textbf{u})$.

Forward rollout

The forward rollout consists of two parts. The first part is to simulating things to generate the $(\textbf{X}, \textbf{U}),$ from which we can calculate the overall cost of the trajectory, and find out the path that the arm will take. To improve things though we’ll need a lot of information about the partial derivatives of the system, calculating these is the second part of the forward rollout phase.

To calculate all these partial derivatives we’ll use $(\textbf{X}, \textbf{U})$. For each $(\textbf{x}_t, \textbf{u}_t)$ we’ll calculate the derivatives of $\textbf{f}(\textbf{x}_t, \textbf{u}_t)$ with respect to $\textbf{x}_t$ and $\textbf{u}_t$, which will give us what we need for our linear approximation of the system dynamics.

To get the information we need about the value function, we’ll need the first and second derivatives of $\ell(\textbf{x}_t, \textbf{u}_t)$ and $\ell_f(\textbf{x}_t, \textbf{x}_t)$ with respect to $\textbf{x}_t$ and $\textbf{u}_t$.

So all in all, we need to calculate $\textbf{f}_\textbf{x}$, $\textbf{f}_\textbf{u}$, $\ell_\textbf{x}$, $\ell_\textbf{u}$, $\ell_\textbf{xx}$, $\ell_\textbf{ux}$, $\ell_\textbf{uu}$, where the subscripts denote a partial derivative, so $\ell_\textbf{x}$ is the partial derivative of $\ell$ with respect to $\textbf{x}$, $\ell_\textbf{xx}$ is the second derivative of $\ell$ with respect to $\textbf{x}$, etc. And to calculate all of these partial derivatives, we’re going to use finite differences! Just like in the LQR with finite differences post. Long story short, load up the simulation for every time step, slightly vary one of the parameters, and measure the resulting change.

Once we have all of these, we’re ready to move on to the backward pass.

Backward pass

Now, we started out with an initial trajectory, but that was just a guess. We want our algorithm to take it and then converge to a local minimum. To do this, we’re going to add some perturbing values and use them to minimize the value function. Specifically, we’re going to compute a local solution to our value function using a quadratic Taylor expansion. So let’s define $Q(\delta \textbf{x}, \delta \textbf{u})$ to be the change in our value function at $(\textbf{x}, \textbf{u})$ as a result of small perturbations $(\delta \textbf{x}, \delta \textbf{u})$:

$Q(\delta \textbf{x}, \delta \textbf{u}) = \ell (\textbf{x} + \delta \textbf{x}, \textbf{u} + \delta \textbf{u}) + V(\textbf{f}(\textbf{x} + \delta\textbf{x}, \textbf{u} + \delta \textbf{u})).$

The second-order expansion of $Q$ is given by:

$Q_\textbf{x} = \ell_\textbf{x} + \textbf{f}_\textbf{x}^T V'_\textbf{x},$

$Q_\textbf{u} = \ell_\textbf{u} + \textbf{f}_\textbf{u}^T V'_\textbf{x},$

$Q_\textbf{xx} = \ell_\textbf{xx} + \textbf{f}_\textbf{x}^T V'_\textbf{xx} \textbf{f}_\textbf{x} + V'_\textbf{x} \cdot \textbf{f}_\textbf{xx},$

$Q_\textbf{ux} = \ell_\textbf{ux} + \textbf{f}_\textbf{u}^T V'_\textbf{xx} \textbf{f}_\textbf{x}+ V'_\textbf{x} \cdot \textbf{f}_\textbf{ux},$

$Q_\textbf{uu} = \ell_\textbf{uu} + \textbf{f}_\textbf{u}^T V'_\textbf{xx} \textbf{f}_\textbf{u}+ V'_\textbf{x} \cdot \textbf{f}_\textbf{uu}.$

Remember that $V' = V(\textbf{f}(\textbf{x}, \textbf{u}))$, which is the value function at the next time step. NOTE: All of the second derivatives of $\textbf{f}$ are zero in the systems we’re controlling here, so when we calculate the second derivatives we don’t need to worry about doing any tensor math, yay!

Given the second-order expansion of $Q$, we can to compute the optimal modification to the control signal, $\delta \textbf{u}^*$. This control signal update has two parts, a feedforward term, $\textbf{k}$, and a feedback term $\textbf{K} \delta\textbf{x}$. The optimal update is the $\delta\textbf{u}$ that minimizes the cost of $Q$:

$\delta\textbf{u}^*(\delta \textbf{x}) = \min\limits_{\delta\textbf{u}}Q(\delta\textbf{x}, \delta\textbf{u}) = \textbf{k} + \textbf{K}\delta\textbf{x},$

where $\textbf{k} = -Q^{-1}_\textbf{uu} Q_\textbf{u}$ and $\textbf{K} = -Q^{-1}_\textbf{uu} Q_\textbf{ux}.$

Derivation can be found in this earlier paper by Li and Todorov. By then substituting this policy into the expansion of $Q$ we get a quadratic model of $V$. They do some mathamagics and come out with:

$V_\textbf{x} = Q_\textbf{x} - \textbf{K}^T Q_\textbf{uu} \textbf{k},$

$V_\textbf{xx} = Q_\textbf{xx} - \textbf{K}^T Q_\textbf{uu} \textbf{K}.$

So now we have all of the terms that we need, and they’re defined in terms of the values at the next time step. We know the value of the value function at the final time step $V_N = \ell_f(\textbf{x}_N)$, and so we’ll simply plug this value in and work backwards in time recursively computing the partial derivatives of $Q$ and $V$.

Calculate control signal update

Once those are all calculated, we can calculate the gain matrices, $\textbf{k}$ and $\textbf{K}$, for our control signal update. Huzzah! Now all that’s left to do is evaluate this new trajectory. So we set up our system

$\hat{\textbf{x}}_0 = \textbf{x}_0,$

$\hat{\textbf{u}}_t = \textbf{u}_t + \textbf{k}_t + \textbf{K}_t (\hat{\textbf{x}}_t - \textbf{x}_t),$

$\hat{\textbf{x}}_{t+1} = \textbf{f}(\hat{\textbf{x}}_t, \hat{\textbf{u}}_t),$

and record the cost. Now if the cost of the new trajectory $(\hat{\textbf{X}}, \hat{\textbf{U}})$ is less than the cost of $(\textbf{X}, \textbf{U})$ then we set $\textbf{U} = \hat{\textbf{U}}$ and go do it all again! And when the cost from an update becomes less than a threshold value, call it done. In code this looks like:

if costnew < cost:
sim_new_trajectory = True

if (abs(costnew - cost)/cost) < self.converge_thresh:
break


Of course, another option we need to account for is when costnew > cost. What do we do in this case? Our control update hasn’t worked, do we just exit?

The Levenberg-Marquardt heuristic
No! Phew.

The control signal update in iLQR is calculated in such a way that it can behave like Gauss-Newton optimization (which uses second-order derivative information) or like gradient descent (which only uses first-order derivative information). The is that if the updates are going well, then lets include curvature information in our update to help optimize things faster. If the updates aren’t going well let’s dial back towards gradient descent, stick to first-order derivative information and use smaller steps. This wizardry is known as the Levenberg-Marquardt heuristic. So how does it work?

Something we skimmed over in the iLQR description was that we need to calculate $Q^{-1}_\textbf{uu}$ to get the $\textbf{k}$ and $\textbf{K}$ matrices. Instead of using np.linalg.pinv or somesuch, we’re going to calculate the inverse ourselves after finding the eigenvalues and eigenvectors, so that we can regularize it. This will let us do a couple of things. First, we’ll be able to make sure that our estimate of curvature ($Q_\textbf{uu}^{-1}$) stays positive definite, which is important to make sure that we always have a descent direction. Second, we’re going to add a regularization term to the eigenvalues to prevent them from exploding when we take their inverse. Here’s our regularization implemented in Python:


Q_uu_evals, Q_uu_evecs = np.linalg.eig(Q_uu)
Q_uu_evals[Q_uu_evals < 0] = 0.0
Q_uu_evals += lamb
Q_uu_inv = np.dot(Q_uu_evecs,
np.dot(np.diag(1.0/Q_uu_evals), Q_uu_evecs.T))


Now, what happens when we change lamb? The eigenvalues represent the magnitude of each of the eigenvectors, and by taking their reciprocal we flip the contributions of the vectors. So the ones that were contributing the least now have the largest singular values, and the ones that contributed the most now have the smallest eigenvalues. By adding a regularization term we ensure that the inverted eigenvalues can never be larger than 1/lamb. So essentially we throw out information.

In the case where we’ve got a really good approximation of the system dynamics and value function, we don’t want to do this. We want to use all of the information available because it’s accurate, so make lamb small and get a more accurate inverse. In the case where we have a bad approximation of the dynamics we want to be more conservative, which means not having those large singular values. Smaller singular values give a smaller $Q_\textbf{uu}^{-1}$ estimate, which then gives smaller gain matrices and control signal update, which is what we want to do when our control signal updates are going poorly.

How do you know if they’re going poorly or not, you now surely ask! Clever as always, we’re going to use the result of the previous iteration to update lamb. So adding to the code from just above, the end of our control update loop is going to look like:

lamb = 1.0 # initial value of lambda
...
if costnew < cost:
lamb /= self.lamb_factor
sim_new_trajectory = True

if (abs(costnew - cost)/cost) < self.converge_thresh:
break
else:
lamb *= self.lamb_factor
if lamb > self.max_lamb:
break


And that is pretty much everything! OK let’s see how this runs!

Simulation results

If you want to run this and see for yourself, you can go copy my Control repo, navigate to the main directory, and run

python run.py arm2 reach


or substitute in arm3. If you’re having trouble getting the arm2 simulation to run, try arm2_python, which is a straight Python implementation of the arm dynamics, and should work no sweat for Windows and Mac.

Below you can see results from the iLQR controller controlling the 2 and 3 link arms (click on the figures to see full sized versions, they got distorted a bit in the shrinking to fit on the page), using immediate and final state cost functions defined as:

l = np.sum(u**2)


and

pos_err = np.array([self.arm.x[0] - self.target[0],
self.arm.x[1] - self.target[1]])
l = (wp * np.sum(pos_err**2) + # pos error
wv * np.sum(x[self.arm.DOF:self.arm.DOF*2]**2)) # vel error


where wp and wv are just gain values, x is the state of the system, and self.arm.x is the $(x,y)$ position of the hand. These read as “during movement, penalize large control signals, and at the final state, have a big penalty on not being at the target.”

So let’s give it up for iLQR, this is awesome! How much of a crazy improvement is that over LQR? And with all knowledge of the system through finite differences, and with the full movements in exactly 1 second! (Note: The simulation speeds look different because of my editing to keep the gif sizes small, they both take the same amount of time for each movement.)

Changing cost functions
Something that you may notice is that the control of the 3 link is actually straighter than the 2 link. I thought that this might be just an issue with the gain values, since the scale of movement is smaller for the 2 link arm than the 3 link there might have been less of a penalty for not moving in a straight line, BUT this was wrong. You can crank the gains and still get the same movement. The actual reason is that this is what the cost function specifies, if you look in the code, only $\ell_f(\textbf{x}_N)$ penalizes the distance from the target, and the cost function during movement is strictly to minimize the control signal, i.e. $\ell(\textbf{x}_t, \textbf{u}_t) = \textbf{u}_t^2$.

Well that’s a lot of talk, you say, like the incorrigible antagonist we both know you to be, prove it. Alright, fine! Here’s iLQR running with an updated cost function that includes the end-effector’s distance from the target in the immediate cost:

All that I had to do to get this was change the immediate cost from

l = np.sum(u**2)


to

l = np.sum(u**2)
pos_err = np.array([self.arm.x[0] - self.target[0],
self.arm.x[1] - self.target[1]])
l += (wp * np.sum(pos_err**2) + # pos error
wv * np.sum(x[self.arm.DOF:self.arm.DOF*2]**2)) # vel error


where all I had to do was include the position penalty term from the final state cost into the immediate state cost.

Changing sequence length
In these simulations the system is simulating at .01 time step, and I gave it 100 time steps to reach the target. What if I give it only 50 time steps?

It looks pretty much the same! It’s just now twice as fast, which is of course achieved by using larger control signals, which we don’t see, but dang awesome.

What if we try to make it there in 10 time steps??

OK well that does not look good. So what’s going on in this case? Basically we’ve given the algorithm an impossible task. It can’t make it to the target location in 10 time steps. In the implementation I wrote here, if it hits the end of it’s control sequence and it hasn’t reached the target yet, the control sequence starts over back at t=0. Remember that part of the target state is also velocity, so basically it moves for 10 time steps to try to minimize $(x,y)$ distance, and then slows down to minimize final state cost in the velocity term.

In conclusion

This algorithm has been used in a ton of things, for controlling robots and simulations, and is an important part of guided policy search, which has been used to very successfully train deep networks in control problems. It’s getting really impressive results for controlling the arm models that I’ve built here, and using finite differences should easily generalize to other systems.

iLQR is very computationally expensive, though, so that’s definitely a downside. It’s definitely less expensive if you have the equations of your system, or at least a decent approximation of them, and you don’t need to use finite differences. But you pay for the efficiency with a loss in generality.

There are also a bunch of parameters to play around with that I haven’t explored at all here, like the weights in the cost function penalizing the magnitude of the cost function and the final state position error. I showed a basic example of changing the cost function, which hopefully gets across just how easy changing these things out can be when you’re using finite differences, and there’s a lot to play around with there too.

Implementation note
In the Yuval and Todorov paper, they talked about using backtracking line search when generating the control signal. So the algorithm they had when generating the new control signal was actually:

$\hat{\textbf{u}}_t = \hat{\textbf{u}}_t + \alpha\textbf{k}_t + \textbf{K}_t(\hat{\textbf{x}}_t - \textbf{x}_t)$

where $\alpha$ was the backtracking search parameter, which gets set to one initially and then reduced. It’s very possible I didn’t implement it as intended, but I found consistently that $\alpha = 1$ always generated the best results, so it was just adding computation time. So I left it out of my implementation. If anyone has insights on an implementation that improves results, please let me know!

And then finally, another thank you to Dr. Emo Todorov for providing Matlab code for the iLQG algorithm, which was very helpful, especially for getting the Levenberg-Marquardt heuristic implemented properly.

## Operational space control of 6DOF robot arm with spiking cameras part 3: Tracking a target using spiking cameras

Alright. Previously we got our arm all set up to perform operational space control, accepting commands through Python. In this post we’re going to set it up with a set of spiking cameras for eyes, train it to learn the mapping between camera coordinates and end-effector coordinates, and have it track an LED target.

What is a spiking camera?

Good question! Spiking cameras are awesome, and they come from Dr. Jorg Conradt’s lab. Basically what they do is return you information about movement from the environment. They’re event-driven, instead of clock-driven like most hardware, which means that they have no internal clock that’s dictating when they send information (i.e. they’re asynchronous). They send information out as soon as they receive it. Additionally, they only send out information about the part of the image that has changed. This means that they have super fast response times and their output bandwidth is really low. Dr. Terry Stewart of our lab has written a bunch of code that can be used for interfacing with spiking cameras, which can all be found up on his GitHub.

Let’s use his code to see through a spiking camera’s eye. After cloning his repo and running python setup.py you can plug in a spiking camera through USB, and with the following code have a Matplotlib figure pop-up with the camera output:

import nstbot
import nstbot.connection
import time

eye = nstbot.RetinaBot()
eye.connect(nstbot.connection.Serial('/dev/ttyUSB0', baud=4000000))

time.sleep(1)

eye.retina(True)
eye.show_image()

while True:
time.sleep(1)


The important parts here are the creation of an instance of the RetinaBot, connecting it to the proper USB port, and calling the show_image function. Pretty easy, right? Here’s some example output, this is me waving my hand and snapping my fingers:

How cool is that? Now, you may be wondering how or why we’re going to use a spiking camera instead of a regular camera. The main reason that I’m using it here is because it makes tracking targets super easy. We just set up an LED that blinks at say 100Hz, and then we look for that frequency in the spiking camera output by recording the rate of change of each of the pixels and averaging over all pixel locations changing at the target frequency. So, to do this with the above code we simply add

eye.track_frequencies(freqs=[100])


And now we can track the location of an LED blinking at 100Hz! The visualization code place a blue dot at the estimated target location, and this all looks like:

Alright! Easily decoded target location complete.

Transforming between camera coordinates and end-effector coordinates

Now that we have a system that can track a target location, we need to transform that position information into end-effector coordinates for the arm to move to. There are a few ways to go about this. One is by very carefully positioning the camera and measuring the distances between the robot’s origin reference frame and working through the trig etc etc. Another, much less pain-in-the-neck way is to instead record some sample points of the robot end-effector at different positions in both end-effector and camera coordinates, and then use a function approximator to generalize over the rest of space.

We’ll do the latter, because it’s exactly the kind of thing that neurons are great for. We have some weird function, and we want to learn to approximate it. Populations of neurons are awesome function approximators. Think of all the crazy mappings your brain learns. To perform function approximation with neurons we’re going to use the Neural Engineering Framework (NEF). If you’re not familiar with the NEF, the basic idea is to using the response curves of neurons as a big set of basis function to decode some signal in some vector space. So we look at the responses of the neurons in the population as we vary our input signal, and then determine a set of decoders (using least-squares or somesuch) that specify the contribution of each neuron to the different dimensions of the function we want to approximate.

Here’s how this is going to work.

1. We’re going to attach the LED to the head of the robot,
2. we specify a set of $(x,y,z)$ coordinates that we send to the robot’s controller,
3. when the robot moves to each point, record the LED location from the camera as well as the end-effector’s $(x,y,z)$ coordinate,
4. create a population of neurons that we train up to learn the mapping from camera locations to end-effector $(x,y,z)$ locations
5. use this information to tell the robot where to move.

A detail that should be mentioned here is that a single camera only provides 2D output. To get a 3D location we’re going to use two separate cameras. One will provide $(x,z)$ information, and the other will provide $(y,z)$ information.

Once we’ve taped (expertly) the LED onto the robot arm, the following script to generate the information we to approximate the function transforming from camera to end-effector space:

import robot
from eye import Eye # this is just a spiking camera wrapper class

import numpy as np
import time

# connect to the spiking cameras
eye0 = Eye(port='/dev/ttyUSB2')
eye1 = Eye(port='/dev/ttyUSB1')
eyes = [eye0, eye1]
# connect to the robot
rob = robot.robotArm()

# define the range of values to test
min_x = -10.0
max_x = 10.0
x_interval = 5.0
min_y = -15.0
max_y = -5.0
y_interval = 5.0
min_z = 10.0
max_z = 20.0
z_interval = 5.0

x_space = np.arange(min_x, max_x, x_interval)
y_space = np.arange(min_y, max_y, y_interval)
z_space = np.arange(min_z, max_z, z_interval)

num_samples = 10 # how many camera samples to average over

try:
out_file0 = open('eye_map_0.csv', 'w')
out_file1 = open('eye_map_1.csv', 'w')

for i, x_val in enumerate(x_space):
for j, y_val in enumerate(y_space):
for k, z_val in enumerate(z_space):

rob.move_to_xyz(target)
time.sleep(2) # time for the robot to move

# take a bunch of samples and average the input to get
# the approximation of the LED in camera coordinates
eye_data0 = np.zeros(2)
for k in range(num_samples):
eye_data0 += eye0.position(0)[:2]
eye_data0 /= num_samples
out_file0.write('%0.2f, %0.2f, %0.2f, %0.2f\n' %
(y_val, z_val, eye_data0[0], eye_data0[1]))

eye_data1 = np.zeros(2)
for k in range(num_samples):
eye_data1 += eye1.position(0)[:2]
eye_data1 /= num_samples
out_file1.write('%0.2f, %0.2f, %0.2f, %0.2f\n' %
(x_val, z_val, eye_data1[0], eye_data1[1]))

out_file0.close()
out_file1.close()
except:
import sys
import traceback
print traceback.print_exc(file=sys.stdout)
finally:
rob.robot.disconnect()


This script connects to the cameras, defines some rectangle in end-effector space to sample, and then works through each of the points writing the data to file. The results of this code can be seen in the animation posted in part 2 of this series.

OK! So now we have all the information we need to train up our neural population. It’s worth noting that we’re only using 36 sample points to train up our neurons, I did this mostly because I didn’t want to wait around. You can of course use more, though, and the more sample points you have the more accurate your function approximation will be.

Implementing a controller using Nengo

The neural simulation software (which implements the NEF) that we’re going to be using to generate and train our neural population is called Nengo. It’s free to use for non-commercial use, and I highly recommend checking out the introduction and tutorials if you have any interest in neural modeling.

What we need to do now is generate two neural populations, one for each camera, that will receives input from the spiking camera and transform the target’s location information into end-effector coordinates. We will then combine the estimates from the two populations, and send that information out to the robot to tell it where to move. I’ll paste the code in here, and then we’ll step through it below.

from eye import Eye
import nengo
from nengo.utils.connection import target_function
import robot

import numpy as np
import sys
import traceback

# connect to robot
rob = robot.robotArm()

model = nengo.Network()

try:
def eyeNet(port='/dev/ttyUSB0', filename='eye_map.csv', n_neurons=1000,
label='eye'):

# connect to eye
spiking_cam = Eye(port=port)

# read in eval points and target output
eval_points = []
targets = []

file_obj = open(filename, 'r')
for line in file_data:
line_data = map(float, line.strip().split(','))
targets.append(line_data[:2])
eval_points.append(line_data[2:])
file_obj.close()

eval_points = np.array(eval_points)
targets = np.array(targets)

# create subnetwork for eye
net = nengo.Network(label=label)
with net:
def eye_input(t):
return spiking_cam.position(0)[:2]
net.input = nengo.Node(output=eye_input, size_out=2)
net.map_ens = nengo.Ensemble(n_neurons, dimensions=2)
net.output = nengo.Node(size_in=2)

nengo.Connection(net.input, net.map_ens, synapse=None)
nengo.Connection(net.map_ens, net.output, synapse=None,
**target_function(eval_points, targets))

return net

with model:
# create network for spiking camera 0
eye0 = eyeNet(port='/dev/ttyUSB2', filename='eye_map_0.csv', label='eye0')
# create network for spiking camera 1
eye1 = eyeNet(port='/dev/ttyUSB1', filename='eye_map_1.csv', label='eye1')

def eyes_func(t, yzxz):
x = yzxz[2] # x coordinate coded from eye1
y = yzxz[0] # y coordinate coded from eye0
z = (yzxz[1] + yzxz[3]) / 2.0 # z coordinate average from eye0 and eye1
return [x,y,z]
eyes = nengo.Node(output=eyes_func, size_in=4)
nengo.Connection(eye0.output, eyes[:2])
nengo.Connection(eye1.output, eyes[2:])

# create output node for sending instructions to arm
def arm_func(t, x):
if t < .05: return # don't move arm during startup (avoid transients)
rob.move_to_xyz(np.array(x, dtype='float32'))
armNode = nengo.Node(output=arm_func, size_in=3, size_out=0)
nengo.Connection(eyes, armNode)

sim = nengo.Simulator(model)
sim.run(10, progress_bar=False)

except:
print traceback.print_exc(file=sys.stdout)
finally:
print 'disconnecting'
rob.robot.disconnect()


The first thing we’re doing is defining a function (eyeNet) to create our neural population that takes input from a spiking camera, and decodes out an end-effector location. In here, we read in from the file the information we just recorded about the camera positions that will serve as the input signal to the neurons (eval_points) and the corresponding set of function output (targets). We create a Nengo network, net, and then a couple of nodes for connecting the input (net.input) and projecting the output (net.output). The population of neurons that we’ll use to approximate our function is called net.map_ens. To specify the function we want to approximate using the eval_points and targets arrays, we create a connection from net.map_ens to net.output and use **target_function(eval_points, targets). So this is probably a little weird to parse if you haven’t used Nengo before, but hopefully it’s clear enough that you can get the gist of what’s going on.

In the main part of the code, we create another Nengo network. We call this one model because that’s convention for the top-level network in Nengo. We then create two networks using the eyeNet function to hook up to the two cameras. At this point we create a node called eyes, and the role of this node is simply to amalgamate the information from the two cameras from $(x,z)$ and $(y,z)$ into $(x,y,z)$. This node is then hooked up to another node called armNode, and all armNode does is call the robot arm’s move_to_xyz function, which we defined in the last post.

Finally, we create a Simulation from model, which compiles the neural network we just specified above, and we run the simulation. The result of all of this then looks something like the following:

And there we go! Project complete! We have a controller for a 6DOF arm that uses spiking cameras to train up a neural population and track an LED, that requires almost no set up time. I gave a demo of this at the end of the summer school and there’s no real positioning of the cameras relative to the arm required, just have to tape the cameras up somewhere, run the training script, and go!

Future work

From here there are a bunch of fun ways to go about extending this. We could add another LED blinking at a different frequency that the arm needs to avoid, using an obstacle avoidance algorithm like the one in this post, add in another dimension of the task involving the gripper, implement a null-space controller to keep the arm near resting joint angles as it tracks the target, and on and on!

Another thing that I’ve looked at is including learning on the system to fine tune our function approximation online. As is, the controller is able to extrapolate and move the arm to target locations that are outside of the range of space sampled during training, but it’s not super accurate. It would be much better to be constantly refining the estimate using learning. I was able to implement a basic version that works, but getting the learning and the tracking running at the same time turns out to be a bit trickier, so I haven’t had the chance to get it all running yet. Hopefully there will be some more down-time in the near future, however, and be able to finish implementing it.

For now, though, we still have a pretty neat target tracker for our robot arm!

## Operational space control of 6DOF robot arm with spiking cameras part 2: Deriving the Jacobian

In the previous exciting post in this series I outlined the project, which is in the title, and we worked through getting access to the arm through Python. The next step was deriving the Jacobian, and that’s what we’re going to be talking about in this post!

Alright.
This was a time I was very glad to have a previous post talking about generating transformation matrices, because deriving the Jacobian for a 6DOF arm in 3D space comes off as a little daunting when you’re used to 3DOF in 2D space, and I needed a reminder of the derivation process. The first step here was finding out which motors were what, so I went through and found out how each motor moved with something like the following code:

for ii in range(7):
target_angles = np.zeros(7, dtype='float32')
target_angles[ii] = np.pi / 4.0
rob.move(target_angles)
time.sleep(1)


and I found that the robot is setup in the figures below

this is me trying my hand at making things clearer using Inkscape, hopefully it’s worked. Displayed are the first 6 joints and their angles of rotation, $q_0$ through $q_5$. The 7th joint, $q_6$, opens and closes the gripper, so we’re safe to ignore it in deriving our Jacobian. The arm segment lengths $l_1, l_3,$ and $l_5$ are named based on the nearest joint angles (makes easier reading in the Jacobian derivation).

Find the transformation matrix from end-effector to origin

So first thing’s first, let’s find the transformation matrices. Our first joint, $q_0$, rotates around the $z$ axis, so the rotational part of our transformation matrix $^0_\textrm{org}\textbf{T}$ is

$^0_\textrm{org}\textbf{R} = \left[ \begin{array}{ccc} \textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 \\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 \\ 0 & 0 & 1 \end{array} \right],$

and $q_0$ and our origin frame of reference are on top of each other so we don’t need to account for translation, so our translation component of $^0_\textrm{org}\textbf{T}$ is

$^0_\textrm{org}\textbf{D} = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right].$

Stacking these together to form our first transformation matrix we have

$^0_\textrm{org}\textbf{T} = \left[ \begin{array}{cc} ^0_\textrm{org}\textbf{R} & ^0_\textrm{org}\textbf{D} \\ 0 & 1 \end{array} \right] = \left[ \begin{array}{cccc} \textrm{cos}(q_0) & -\textrm{sin}(q_0) & 0 & 0\\ \textrm{sin}(q_0) & \textrm{cos}(q_0) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

So now we are able to convert a position in 3D space from to the reference frame of joint $q_0$ back to our origin frame of reference. Let’s keep going.

Joint $q_1$ rotates around the $x$ axis, and there is a translation along the arm segment $l_1$. Our transformation matrix looks like

$^1_0\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_1) & -\textrm{sin}(q_1) & l_1 \textrm{cos}(q_1) \\ 0 & \textrm{sin}(q_1) & \textrm{cos}(q_1) & l_1 \textrm{sin}(q_1) \\ 0 & 0 & 0 & 1 \end{array} \right] .$

Joint $q_2$ also rotates around the $x$ axis, but there is no translation from $q_2$ to $q_3$. So our transformation matrix looks like

$^2_1\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_2) & -\textrm{sin}(q_2) & 0 \\ 0 & \textrm{sin}(q_2) & \textrm{cos}(q_2) & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

The next transformation matrix is a little tricky, because you might be tempted to say that it’s rotating around the $z$ axis, but actually it’s rotating around the $y$ axis. This is determined by where $q_3$ is mounted relative to $q_2$. If it was mounted at 90 degrees from $q_2$ then it would be rotating around the $z$ axis, but it’s not. For translation, there’s a translation along the $y$ axis up to the next joint, so all in all the transformation matrix looks like:

$^3_2\textbf{T} = \left[ \begin{array}{cccc} \textrm{cos}(q_3) & 0 & -\textrm{sin}(q_3) & 0\\ 0 & 1 & 0 & l_3 \\ \textrm{sin}(q_3) & 0 & \textrm{cos}(q_3) & 0\\ 0 & 0 & 0 & 1 \end{array} \right] .$

And then the transformation matrices for coming from $q_4$ to $q_3$ and $q_5$ to $q_4$ are the same as the previous set, so we have

$^4_3\textbf{T} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & \textrm{cos}(q_4) & -\textrm{sin}(q_4) & 0 \\ 0 & \textrm{sin}(q_4) & \textrm{cos}(q_4) & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] .$

and

$^5_4\textbf{T} = \left[ \begin{array}{cccc} \textrm{cos}(q_5) & 0 & -\textrm{sin}(q_5) & 0 \\ 0 & 1 & 0 & l_5 \\ \textrm{sin}(q_5) & 0 & \textrm{cos}(q_5) & 0\\ 0 & 0 & 0 & 1 \end{array} \right] .$

Alright! Now that we have all of the transformation matrices, we can put them together to get the transformation from end-effector coordinates to our reference frame coordinates!

$^\textrm{ee}_\textrm{org}\textbf{T} = ^0_\textrm{org}\textbf{T} \; ^1_0\textbf{T} \; ^2_1\textbf{T} \; ^3_2\textbf{T} \; ^4_3\textbf{T} \; ^5_4\textbf{T}.$

At this point I went and tested this with some sample points to make sure that everything seemed to be being transformed properly, but we won’t go through that here.

Calculate the derivative of the transform with respect to each joint

The next step in calculating the Jacobian is getting the derivative of $^\textrm{ee}_\textrm{org}T$. This could be a big ol’ headache to do it by hand, OR we could use SymPy, the symbolic computation package for Python. Which is exactly what we’ll do. So after a quick

sudo pip install sympy


I wrote up the following script to perform the derivation for us

import sympy as sp

def calc_transform():
# set up our joint angle symbols (6th angle doesn't affect any kinematics)
q = [sp.Symbol('q0'), sp.Symbol('q1'), sp.Symbol('q2'), sp.Symbol('q3'),
sp.Symbol('q4'), sp.Symbol('q5')]
# set up our arm segment length symbols
l1 = sp.Symbol('l1')
l3 = sp.Symbol('l3')
l5 = sp.Symbol('l5')

Torg0 = sp.Matrix([[sp.cos(q[0]), -sp.sin(q[0]), 0, 0,],
[sp.sin(q[0]), sp.cos(q[0]), 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]])

T01 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[1]), -sp.sin(q[1]), l1*sp.cos(q[1])],
[0, sp.sin(q[1]), sp.cos(q[1]), l1*sp.sin(q[1])],
[0, 0, 0, 1]])

T12 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[2]), -sp.sin(q[2]), 0],
[0, sp.sin(q[2]), sp.cos(q[2]), 0],
[0, 0, 0, 1]])

T23 = sp.Matrix([[sp.cos(q[3]), 0, sp.sin(q[3]), 0],
[0, 1, 0, l3],
[-sp.sin(q[3]), 0, sp.cos(q[3]), 0],
[0, 0, 0, 1]])

T34 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(q[4]), -sp.sin(q[4]), 0],
[0, sp.sin(q[4]), sp.cos(q[4]), 0],
[0, 0, 0, 1]])

T45 = sp.Matrix([[sp.cos(q[5]), 0, sp.sin(q[5]), 0],
[0, 1, 0, l5],
[-sp.sin(q[5]), 0, sp.cos(q[5]), 0],
[0, 0, 0, 1]])

T = Torg0 * T01 * T12 * T23 * T34 * T45

# position of the end-effector relative to joint axes 6 (right at the origin)
x = sp.Matrix([0,0,0,1])

Tx = T * x

for ii in range(6):
print q[ii]
print sp.simplify(Tx[0].diff(q[ii]))
print sp.simplify(Tx[1].diff(q[ii]))
print sp.simplify(Tx[2].diff(q[ii]))


And then consolidated the output using some variable shorthand to write a function that accepts in joint angles and generates the Jacobian:

def calc_jacobian(q):
J = np.zeros((3, 7))

c0 = np.cos(q[0])
s0 = np.sin(q[0])
c1 = np.cos(q[1])
s1 = np.sin(q[1])
c3 = np.cos(q[3])
s3 = np.sin(q[3])
c4 = np.cos(q[4])
s4 = np.sin(q[4])

c12 = np.cos(q[1] + q[2])
s12 = np.sin(q[1] + q[2])

l1 = self.l1
l3 = self.l3
l5 = self.l5

J[0,0] = -l1*c0*c1 - l3*c0*c12 - l5*((s0*s3 - s12*c0*c3)*s4 + c0*c4*c12)
J[1,0] = -l1*s0*c1 - l3*s0*c12 + l5*((s0*s12*c3 + s3*c0)*s4 - s0*c4*c12)
J[2,0] = 0

J[0,1] = (l1*s1 + l3*s12 + l5*(s4*c3*c12 + s12*c4))*s0
J[1,1] = -(l1*s1 + l3*s12 + l5*s4*c3*c12 + l5*s12*c4)*c0
J[2,1] = l1*c1 + l3*c12 - l5*(s4*s12*c3 - c4*c12)

J[0,2] = (l3*s12 + l5*(s4*c3*c12 + s12*c4))*s0
J[1,2] = -(l3*s12 + l5*s4*c3*c12 + l5*s12*c4)*c0
J[2,2] = l3*c12 - l5*(s4*s12*c3 - c4*c12)

J[0,3] = -l5*(s0*s3*s12 - c0*c3)*s4
J[1,3] = l5*(s0*c3 + s3*s12*c0)*s4
J[2,3] = -l5*s3*s4*c12

J[0,4] = l5*((s0*s12*c3 + s3*c0)*c4 + s0*s4*c12)
J[1,4] = l5*((s0*s3 - s12*c0*c3)*c4 - s4*c0*c12)
J[2,4] = -l5*(s4*s12 - c3*c4*c12)

return J


Alright! Now we have our Jacobian! Really the only time consuming part here was calculating our end-effector to origin transformation matrix, generating the Jacobian was super easy using SymPy once we had that.

Hack position control using the Jacobian

Great! So now that we have our Jacobian we’ll be able to translate forces that we want to apply to the end-effector into joint torques that we want to apply to the arm motors. Since we can’t control applied force to the motors though, and have to pass in desired angle positions, we’re going to do a hack approximation. Let’s first transform our forces from end-effector space into a set of joint angle torques:

$\textbf{u} = \textbf{J}^T \; \textbf{u}_\textbf{x}.$

To approximate the control then we’re simply going to take the current set of joint angles (which we know because it’s whatever angles we last told the system to move to) and add a scaled down version of $\textbf{u}$ to approximate applying torque that affects acceleration and then velocity.

$\textbf{q}_\textrm{des} = \textbf{q} + \alpha \; \textbf{u},$

where $\alpha$ is the gain term, I used .001 here because it was nice and slow, so no crazy commands that could break the servos would be sent out before I could react and hit the cancel button.

What we want to do then to implement operational space control here then is find the current $(x,y,z)$ position of the end-effector, calculate the difference between it and the target end-effector position, use that to generate the end-effector control signal $u_x$, get the Jacobian for the current state of the arm using the function above, find the set of joint torques to apply, approximate this control by generating a set of target joint angles to move to, and then repeat this whole loop until we’re within some threshold of the target position. Whew.

So, a lot of steps, but pretty straight forward to implement. The method I wrote to do it looks something like:

def move_to_xyz(self, xyz_d):
"""
np.array xyz_d: 3D target (x_d, y_d, z_d)
"""
count = 0
while (1):
count += 1
# get control signal in 3D space
xyz = self.calc_xyz()
delta_xyz = xyz_d - xyz
ux = self.kp * delta_xyz

# transform to joint space
J = self.calc_jacobian()
u = np.dot(J.T, ux)

# target joint angles are current + uq (scaled)
self.q[...] += u * .001
self.robot.move(np.asarray(self.q.copy(), 'float32'))

if np.sqrt(np.sum(delta_xyz**2)) < .1 or count > 1e4:
break


And that is it! We have successfully hacked together a system that can perform operational space control of a 6DOF robot arm. Here is a very choppy video of it moving around to some target points in a grid on a cube.

So, granted I had to drop a lot of frames from the video to bring it’s size down to something close to reasonable, but still you can see that it moves to target locations super fast!

Alright this is sweet, but we’re not done yet. We don’t want to have to tell the arm where to move ourselves. Instead we’d like the robot to perform target tracking for some target LED we’re moving around, because that’s way more fun and interactive. To do this, we’re going to use spiking cameras! So stay tuned, we’ll talk about what the hell spiking cameras are and how to use them for a super quick-to-setup and foolproof target tracking system in the next exciting post!

## Operational space control of 6DOF robot arm with spiking cameras part 1: Getting access to the arm through Python

From June 9th to the 19th we ran a summer school (brain camp) in our lab with people from all over to come and learn how to use our neural modeling software, Nengo, and then work on a project with others in the school and people from the lab. We had a bunch of fun hardware available, from neuromorphic hardware from the SpiNNaker group over at Cambridge to lego robots on omni-wheels to spiking cameras (i’ll discuss what a spiking camera is exactly in part 3 of this series) and little robot arms. There were a bunch of awesome projects, people built neural models capable of performing a simplified version of the Wisconsin card sorting task that they then got running on the SpiNNaker boards, a neural controller built to move a robot leech, a spiking neurons reinforcement-learning system implemented on SpiNNaker with spiking cameras to control the lego robot that learned to move towards green LEDs and avoid red LEDs, and a bunch of others. If you’re interested in participating in these kinds of projects and learning more about this I highly suggest you apply to the summer school for next year!

I took the summer school as a chance to break from other projects and hack together a project. The idea was to take the robot arm, implement an operational space controller (i.e. find the Jacobian), and then used spiking cameras to track an LED and have the robot arm learn how to move to the target, no matter where the cameras were placed, by learning the relationship between where the target is in camera-centric coordinates and arm-centric coordinates. This broke down into several steps: 1) Get access to the arm through Python, 2) derive the Jacobian to implement operational space control, 3) sample state space to get an approximation of the camera-centric to arm-centric function, 4) implement the control system to track the target LED.

Here’s a picture of the set-up during step 3, training.

So in the center is the 6DOF robot arm with a little LED attached to the head, and highlighted in orange circles are the two spiking cameras, expertly taped to the wall with office-grade scotch tape. You can also see the SpiNNaker board in the top left as a bonus, but I didn’t have enough time to involve it in this project.

I was originally going to write this all up as one post, because the first two parts are re-implementing previous posts, but even skimming through those steps it was getting long and I’m sure no one minds having a few different examples to look through for generating Cython code or deriving a Jacobian. So I’m going to break this into a few parts. Here in this post we’ll work through the first step (Cython) of our journey.

The arm we had was the VE026A Denso training robot, on loan from Dr. Bryan Tripp of the neuromorphic robotics lab at UW. Previously an interface had been built up by one of Dr. Tripp’s summer students, written in C. C is great and all but Python is much easier to work with, and the rest of the software I wanted to work you know what I’m done justifying it Python is just great so Python is what I wanted to use. The end.

So to get access to the arm in Python I used the great ol’ C++ wrapper hack described in a previous post. Looking at Murphy-the-summer-student’s C code there were basically three functions I needed access to:

// initialize threads, connect to robot
void start_robot(int *semid, int32_t *ctrlid, int32_t *robotid, float **angles)
// send a set of joint angles to the robot servos
void Robot_Execute_slvMove(int32_t robotid, float j1, float j2, float j3, float j4, float j5, float j6, float j7, float j8)
// kill threads, disconnect from robot
void stop_robot(int semid, int32_t ctrlid, int32_t robotid)


So I changed the extension of the file to ‘.cpp’ (I know, I know, I said this was a hack!), fixed some compiler errors that popped up, and then appended the following to the end of the file:

/* A class to contain all the information that needs to
be passed around between these functions, and can
encapsulate it and hide it from the Python interface.

Written by Travis DeWolf (June, 2015)
*/

class Robot {
/* Very simple class, just stores the variables
* needed for talking to the robot, and a gives access
* to the functions for moving it around */

int semid;
int32_t ctrlid;
int32_t robotid;
float* angles;

public:
Robot();
~Robot();
void connect();
void move(float* angles_d);
void disconnect();
};

Robot::Robot() { }

Robot::~Robot() { free(angles); }

/* Connect to the robot, get the ids and current joint angles */
/* char* usb_port: the name of the port the robot is connected to */
void Robot::connect()
{
start_robot(&amp;semid, &amp;ctrlid, &amp;robotid, &amp;angles);
printf("%i %i %i", robotid, ctrlid, semid);
}

/* Move the robot to the specified angles */
/* float* angles: the target joint angles */
void Robot::move(float* angles)
{
// convert from radians to degrees
Robot_Execute_slvMove(robotid,
angles[0] * 180.0 / 3.14,
angles[1] * 180.0 / 3.14,
angles[2] * 180.0 / 3.14,
angles[3] * 180.0 / 3.14,
angles[4] * 180.0 / 3.14,
angles[5] * 180.0 / 3.14,
angles[6] * 180.0 / 3.14,
angles[7] * 180.0 / 3.14);
}

/* Disconnect from the robot */
void Robot::disconnect()
{
stop_robot(semid, ctrlid, robotid);
}

int main()
{
Robot robot = Robot();
// connect to robot
robot.connect();

// move robot to some random target angles
float target_angles[7] = {0, np.pi / 2.0, 0.0, 0, 0, 0, 0};
robot.move(target_angles);

sleep(1);

// disconnect
robot.disconnect();

return 0;
}


So very simple class. Basically just wanted to create a set of functions to hide some of the unnecessary parameters from Python, do the conversion from radians to degrees (who works in degrees? honestly), and have a short little main function to test the creation of the class, and connection, movement, and disconnection of the robot. Like I said, there were a few compiler errors when switching from C to C++, but really that was the only thing that took any time on this part. A few casts and everything was gravy.

The next part was writing the Cython pyRobot.pyx file (I describe the steps involved in this in more detail in this post):

import numpy as np
cimport numpy as np

cdef extern from "bcap.cpp":
cdef cppclass Robot:
Robot()
void connect()
void move(float* angles)
void disconnect()

cdef class pyRobot:
cdef Robot* thisptr

def __cinit__(self):
self.thisptr = new Robot()

def __dealloc__(self):
del self.thisptr

def connect(self):
"""
Set up the connection to the robot, pass in a vector,
get back the current joint angles of the arm.
param np.ndarray angles: a vector to store current joint angles
"""
self.thisptr.connect()

def move(self, np.ndarray[float, mode="c"] angles):
"""
Step the simulation forward one timestep. Pass in target angles,
get back the current arm joint angles.
param np.ndarray angles: 7D target angle vector
"""
self.thisptr.move(&amp;angles[0])

def disconnect(self):
"""
Disconnect from the robot.
"""
self.thisptr.disconnect()


the setup.py file:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
name = 'Demos',
ext_modules=[
Extension("test",
sources=["pyRobot.pyx"],
language="c++"),
],
cmdclass = {'build_ext': build_ext},

)


and then compiling!

run setup.py build_ext -i


With all of this working, a nice test.so file was created, and it was now possible to connect to the robot in Python with

import test
rob = test.pyRobot()
rob.connect()
target_angles = np.array([0, np.pi/2, 0, np.pi/4, 0, 0, 0, 0], dtype='float32')
rob.move(target_angles)
import time
time.sleep(1)
rob.disconnect()


In the above code we’re instantiating the pyRobot class, connecting to the robot, defining a set of target angles and telling the robot to move there, waiting for 1 second to give the robot time to actually move, and then disconnecting from the robot. Upon connection we have to pass in a set of joint angles for the servos, and so you see the robot arm jerk into position, and then move to the target set of joint angles, it looks something exactly like this:

Neat, phase 1 complete.

At the end of phase 1 we are able to connect to the robot arm through Python and send commands in terms of joint angles. But we don’t want to send commands in terms of joint angles, we want to just specify the end-effector position and have the robot work out the angles! I’ve implemented an inverse kinematics solver using constrained optimization before for a 3-link planar arm, but we’re not going to go that route. Find out what we’ll do by joining us next time! or by remembering what I said we’d do at the beginning of this post.

Tagged , , ,

## Dynamic movement primitives part 2: Controlling end-effector trajectories

The dynamic movement primitive (DMP) framework was designed for trajectory control. It so happens that in previous posts we’ve built up to having several arm simulations that are ripe for throwing a trajectory controller on top, and that’s what we’ll do in this post. The system that we will be controlling here is the 3 link arm model with an operational space controller (OSC) that translates end-effector forces into joint torques. The DMPs here will be controlling the $(x,y)$ trajectory of the hand, and the OSC will take care of turning the desired hand forces into torques that can be applied to the arm. All of the code used to generate the animations throughout this post can of course be found up on my github (to run play around with variants of python run.py arm3 dmp write).

Controlling a 3 link arm with DMPs
We have our 3 link arm and its OSC controller; this whole setup we’ll collectively refer to as ‘the plant’ throughout this post. We are going to pass in some $(x,y)$ force signal to the plant, and the plant will carry it out. Additionally we’ll get a feedback signal with the $(x,y)$ position of the hand. At the same time, we also have a DMP system that’s doing its own thing, tracing out a desired trajectory in $(x,y)$ space. We have to tie these two systems together.

To do this is easy, we’ll generate the control signal for the plant from our DMP system simply by measuring the difference between the state of our DMP system and the plant state, use that to drive the plant to the state of the DMP system. Formally,

$u = k_p(y_{\textrm{DMP}} - y)$

where $y_{\textrm{DMP}}$ is the state of the DMP system, $y$ is the state of the plant, and $k_p$ and is the position error gain term.

Once we have this, we just go ahead and step our DMP system forward and make sure the gain values on the control signal are high enough that the plant follows the DMP’s trajectory. And that’s pretty much it, just run the DMP system to the end of the trajectory and then stop your simulation.

To give a demonstration of DMP control I’ve set up the DMP system to follow the same number trajectories that the SPAUN arm followed. As you can see the combination of DMPs and operational space control is much more effective than my previous implementation.

Incorporating system feedback

One of the issues in implementing the control above is that we have to be careful about how quickly the DMP trajectory moves, because while the DMP system isn’t constrained by any physical dynamics, the plant is. Depending on the size of the movement the DMP trajectory may be moving a foot a second or an inch a second. You can see above that the arm doesn’t fully draw out the desired trajectories in places where the DMP system moved too quickly in and out and sharp corners. The only way to remedy this without feedback is to have the DMP system move more slowly throughout the entire trajectory. What would be nice, instead, would be to just say ‘go as fast as you can, as long as the plant state is within some threshold distance of you’, and this is where system feedback comes in.

It’s actually very straightforward to implement this using system feedback: If the plant state drifts away from the state of the DMPs, slow down the execution speed of the DMP to allow the plant time to catch up. The do this we just have to multiply the DMP timestep $dt$ by a new term:

$1 / (1 + \alpha_{\textrm{err}}(y_{\textrm{plant}} - y_{\textrm{DMP}}))$.

All this new term does is slow down the canonical system when there’s an error, you can think of it as a scaling on the time step. Additionally, the sensitivity of this term can be modulated the scaling term $\alpha_{\textrm{err}}$ on the difference between the plant and DMP states.

We can get an idea of how this affects the system by looking at the dynamics of the canonical system when an error term is introduced mid-run:

When the error is introduced the dynamics of the system slow down, great! Lets look at an example comparing execution with and without this feedback term. Here’s the system drawing the number 3 without any feedback incorporation:

and here’s the system drawing the number 3 with the feedback term included:

These two examples are a pretty good case for including the feedback term into your DMP system. You can still see in the second case that the specified trajectory isn’t traced out exactly, but if that’s what you’re shooting for you can just crank up the $\alpha_{\textrm{err}}$ to make the DMP timestep really slow down whenever the DMP gets ahead of the plant at all.

Interpolating trajectories for imitation

When imitating trajectories there can be some issues with having enough sample points and how to fit them to the canonical system’s timeframe, they’re not difficult to get around but I thought I would go over what I did here. The approach I took was to always run the canonical system for 1 second, and whenever a trajectory is passed in that should be imitated to scale the x-axis of the trajectory such that it’s between 0 and 1. Then I shove it into an interpolator and use the resulting function to generate an abundance of nicely spaced sample points for the DMP imitator to match. Here’s the code for that:

# generate function to interpolate the desired trajectory
import scipy.interpolate

path = np.zeros((self.dmps, self.timesteps))
x = np.linspace(0, self.cs.run_time, y_des.shape[1])

for d in range(self.dmps):

# create interpolation function
path_gen = scipy.interpolate.interp1d(x, y_des[d])

# create evenly spaced sample points of desired trajectory
for t in range(self.timesteps):
path[d, t] = path_gen(t * self.dt)

y_des = path


Direct trajectory control vs DMP based control

Now, using the above described interpolation function we can just directly use its output to guide our system. And, in fact, when we do this we get very precise control of the end-effector, more precise than the DMP control, as it happens. The reason for this is because our DMP system is approximating the desired trajectory and with a set of basis functions, and some accuracy is being lost in this approximation.

So if we instead use the interpolation function to drive the plant we can get exactly the points that we specified. The times when this comes up especially are when the trajectories that you’re trying to imitate are especially complicated. There are ways to address this with DMPs by placing your basis functions more appropriately, but if you’re just looking for the exact replication of an input trajectory (as often people are) this is a simpler way to go. You can see the execution of this in the trace.py code up on my github. Here’s a comparison of a single word drawn using the interpolation function:

and here’s the same word drawn using a DMP system with 1,000 basis function per DOF:

We can see that just using the interpolation function here gives us the exact path that we specified, where using DMPs we have some error, and this error increases with the size of the desired trajectory. But! That’s for exactly following a given trajectory, which is often not the case. The strength of the DMP framework is that the trajectory is a dynamical system. This lets us do simple things to get really neat performance, like scale the trajectory spatially on the fly simply by changing the goal, rather than rescaling the entire trajectory:

Conclusions

Some basic examples of using DMPs to control the end-effector trajectory of an arm with operational space control were gone over here, and you can see that they work really nicely together. I like when things build like this. We also saw that power of DMPs in this situation is in their generalizability, and not in exact reproduction of a given path. If I have a single complex trajectory that I only want the end-effector to follow once then I’m going to be better off just interpolating that trajectory and feeding the coordinates into the arm controller rather than go through the whole process of setting up the DMPs.

Drawing words, though, is just one basic example of using the DMP framework. It’s a very simple application and really doesn’t do justice to the flexibility and power of DMPs. Other example applications include things like playing ping pong. This is done by creating a desired trajectory showing the robot how to swing a ping pong paddle, and then using a vision system to track the current location of the incoming ping pong ball and changing the target of the movement to compensate dynamically. There’s also some really awesome stuff with object avoidance, that is implemented by adding another term with some simple dynamics to the DMP. Discussed here, basically you just have another system that moves you away from the object with a strength relative to your distance from the object. You can also use DMPs to control gain terms on your PD control signal, which is useful for things like object manipulation.

And of course I haven’t touched on rhythmic DMPs or learning with DMPs at all, and those are both also really interesting topics! But this serves as a decent introduction to the whole area, which has been developed in the Schaal lab over the last decade or so. I recommend further reading with some of these papers if you’re interested, there are a ton of neat ways to apply the DMP framework! And, again, the code for everything here is up on my github in the control and pydmps repositories.

## Robot control part 7: OSC of a 3-link arm

So we’ve done control for the 2-link arm, and control of the one link arm is trivial (where we control joint angle, or x or y coordinate of the pendulum), so here I’ll just show an implementation of operation space control for a more interesting arm model, the 3-link arm model. The code can all be found up on my Github.

In theory there’s nothing different or more difficult about controlling a 3-link arm versus a 2-link arm. For the inertia matrix, what I ended up doing here is just jacking up all the values of the matrix to about 100, which causes the controller to way over control the arm, and you can see the torques are much larger than they would need to be if we had an accurate inertia matrix. But the result is the same super straight trajectories that we’ve come to expect from operational space control:

It’s a little choppy because I cut out a bunch of frames to keep the gif size down. But you get the point, it works. And quite well!

Because this is also a 3-link arm now and our task space force signal is 2D, we have redundant space of solutions, meaning that the task space control signal can be carried out in a number of ways. In other words, a null space exists for this controller. This means that we can throw another controller in our system to operate inside the null space of the first controller. We’ve already worked through all the math for this, so it’s straightforward to implement.

What kind of null space controller should we put in? Well, you may have noticed in the above animation the arm goes through itself, here’s another example:

Often it’s desirable to avoid this (because of physics or whatever), so what we can do is add a secondary controller that works to keep the arm’s elbow and wrist near some comfortable default angles. When we do this we get the following:

And there you have it! Operational space control of a three link arm with a secondary controller in the null space to try to keep the angles near their default / resting positions.

I also added mouse based control to the arm so it will try to follow your mouse when you move over the figure, which makes it pretty fun to explore the power of the controller. It’s interesting to see where the singularities become an issue, and how having a null space controller that’s operating in joint space can actually come to help the system move through those problem points more smoothly. Check it out! It’s all up on my Github.

## Robot control part 5: Controlling in the null space

In the last post, I went through how to build an operational space controller. It was surprisingly easy after we’ve worked through all the other posts. But maybe that was a little too easy for you. Maybe you want to do something more interesting like implement more than one controller at the same time. In this post we’ll go through how to work inside the null space of a controller to implement several seperate controllers simultaneously without interference.
Buckle up.

Null space forces

The last example comprises the basics of operational space control; describe the system, calculate the system dynamics, transform desired forces from an operational space to the generalized coordinates, and build the control signal to cancel out the undesired system dynamics. Basic operational space control works quite well, but it is not uncommon to have several control goals at once; such as move the end-effector to this position’ (primary goal), and keep the elbow raised high’ (secondary goal) in the control of a robot arm.

If the operational space can also serve as generalized coordinates, i.e. if the system state specified in operational space constrains all of the degrees of freedom of the robot, then there is nothing that can be done without affecting the performance of the primary controller. In the case of controlling a two-link robot arm this is the case. The end-effector Cartesian space (chosen as the operational space) could also be a generalized coordinates system, because a specific $(x,y)$ position fully constrains the position of the arm.

But often when using operational space control for more complex robots this is not the case. In these situations, the forces controlled in operational space have fewer dimensions than the robot has degrees of freedom, and so it is possible to accomplish the primary goal in a number of ways. The null space of this primary controller is the region of state space where there is a redundancy of solutions; the system can move in a number of ways and still not affect the completion of the goals of the primary controller. An example of this is all the different configurations the elbow can be in while a person moves their hand in a straight line. In these situations, a secondary controller can be created to operate in the null space of the primary controller, and the full control signal sent to the system is a sum of the primary control signal and a filtered version of the secondary control signal. In this section the derivation of the null-space filter will be worked through for a system with only a primary and secondary controller, but note that the process can be applied iteratively for systems with further controllers.

The filtering of the secondary control signal means that the secondary controller’s goals will only be accomplished if it is possible to do so without affecting the performance of the first controller. In other words, the secondary controller must operate in the null space of the first controller. Denote the primary operational space control signal, e.g. the control signal defined in the previous post, as $\textbf{u}_{\textbf{x}}$ and the control signal from the secondary controller $\textbf{u}_{\textrm{null}}$. Define the force to apply to the system

$\textbf{u} = \textbf{u}_{\textbf{x}} + (\textbf{I} - \textbf{J}_{ee}^T(\textbf{q}) \; \textbf{J}_{ee}^{T+}(\textbf{q})) \textbf{u}_{\textrm{null}},$

where $\textbf{J}_{ee}^{T+}(\textbf{q})$ is the pseudo-inverse of $\textbf{J}_{ee}^T(\textbf{q})$.

Examining the filtering term that was added,

$(\textbf{I} - \textbf{J}_{ee}^T(\textbf{q}) \textbf{J}_{ee}^{T+}(\textbf{q})) \textbf{u}_{\textrm{null}},$

it can be seen that the Jacobian transpose multiplied by its pseudo-inverse will be 1’s all along the diagonal, except in the null space. This means that $\textbf{u}_{\textrm{null}}$ is subtracted from itself everywhere that affects the operational space movement and is left to apply any arbitrary control signal in the null space of the primary controller.

Unfortunately, this initial set up does not adequately filter out the effects of forces that might be generated by the secondary controller. The Jacobian is defined as a relationship between the velocities of two spaces, and so operating in the null space defined by the Jacobian ensures that no velocities are applied in operational space, but the required filter must also prevent any accelerations from affecting movement in operational space. The standard Jacobian pseudo-inverse null space is a velocity null space, and so a filter built using it will allow forces affecting the system’s acceleration to still get through. What is required is a pseudo-inverse Jacobian defined to filter signals through an acceleration null space.

To acquire this acceleration filter, our control signal will be substituted into the equation for acceleration in the operational space, which, after cancelling out gravity effects with the control signal and removing the unmodeled dynamics, gives

$\ddot{\textbf{x}} = \textbf{J}_{ee}(\textbf{q}) \textbf{M}^{-1}(\textbf{q}) [\textbf{J}_{ee}^T(\textbf{q}) \; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q}) \; \ddot{\textbf{x}}_\textrm{des} - (\textbf{I} - \textbf{J}_{ee}^T(\textbf{q})\;\textbf{J}_{ee}^{T+}(\textbf{q}))\;\textbf{u}_{\textrm{null}}].$

Rewriting this to separate the secondary controller into its own term

$\ddot{\textbf{x}} = \textbf{J}_{ee}(\textbf{q}) \textbf{M}^{-1}(\textbf{q}) \textbf{J}_{ee}^T(\textbf{q}) \; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q}) \; \ddot{\textbf{x}}_\textrm{des} - \textbf{J}_{ee}(\textbf{q}) \textbf{M}^{-1}(\textbf{q})[\textbf{I} - \textbf{J}_{ee}^T(\textbf{q})\;\textbf{J}_{ee}^{T+}(\textbf{q})]\;\textbf{u}_{\textrm{null}},$

it becomes clear that to not cause any unwanted movement in operational space the second term must be zero.

There is only one free term left in the second term, and that is the pseudo-inverse. There are numerous different pseudo-inverses that can be chosen for a given situation, and here what is required is to engineer a pseudo-inverse such that the term multiplying $\textbf{u}_{\textrm{null}}$ in the above operational space acceleration equation is guaranteed to go to zero.

$\textbf{J}_{ee}(\textbf{q})\textbf{M}^{-1}(\textbf{q}) [\textbf{I} - \textbf{J}_{ee}^T (\textbf{q})\textbf{J}_{ee}^{T+}(\textbf{q})] \textbf{u}_{\textrm{null}} = \textbf{0},$

this needs to be true for all $\textbf{u}_{\textrm{null}}$, so it can be removed,

$\textbf{J}_{ee} (\textbf{q}) \; \textbf{M}^{-1}(\textbf{q}) [\textbf{1} - \textbf{J}_{ee}^T (\textbf{q}) \; \textbf{J}_{ee}^{T+} (\textbf{q})] = \textbf{0},$

$\textbf{J}_{ee}(\textbf{q}) \; \textbf{M}^{-1}(\textbf{q}) = \textbf{J}_{ee}(\textbf{q}) \; \textbf{M}^{-1}(\textbf{q}) \; \textbf{J}_{ee}^T(\textbf{q})\; \textbf{J}_{ee}^{T+}(\textbf{q}),$

substituting in our inertia matrix for operational space, which defines

$\textbf{J}_{ee} (\textbf{q}) \textbf{M}^{-1} (\textbf{q}) = \textbf{M}_{\textbf{x}_{ee}}^{-1} (\textbf{q}) \textbf{J}_{ee}^{T+} (\textbf{q}),$

$\textbf{J}_{ee}^{T+}(\textbf{q}) = \textbf{M}_{\textbf{x}_{ee}} (\textbf{q}) \; \textbf{J}_{ee}(\textbf{q}) \; \textbf{M}^{-1}(\textbf{q}).$

This specific Jacobian inverse was presented in this 1987 paper by Dr. Oussama Khatib and is called the `dynamically consistent generalized inverse’. Using this psuedo-inverse guarantees that any signal coming from the secondary controller will not affect movement in the primary controller’s operational space. Just as a side-note, the name ‘pseudo-inverse’ is a bit of misnomer here, since it doesn’t try to produce the identity when multiplied by the original Jacobian transpose, but hey. That’s what they’re calling it.

The null space filter cancels out the acceleration effects of forces in operational space from a signal that is being applied as part of the control system. But it can also be used to cancel out the effects of any unwanted signal that can be modeled. Given some undesirable force signal interfering with the system that can be effectively modeled, a null space filtering term can be implemented to cancel it out. The control signal in this case, with one primary operational space controller and a null space filter for the undesired force, looks like:

$\textbf{u} = \textbf{J}^T_{ee}(\textbf{q}) \; \textbf{M}_\textbf{x}(\textbf{q}) \; \ddot{\textbf{x}}_\textrm{des} - \textbf{g}(\textbf{q}) - \textbf{J}^T_{ee}(\textbf{q}) \;\textbf{J}^{T+}_{ee}(\textbf{q}) \; \textbf{u}_{\textrm{undesired force}}.$

We did it! This will now allow a high-priority operational space controller to execute without interference from a secondary controller operating in its null space to complete it’s own set of goals (when possible).

Example:

Given a three link arm (revolute-revolute-revolute) operating in the $(x,z)$ plane, shown below:

this example will construct the control system for a primary controller controlling the end-effector and a secondary controller working to keep the arm near its joint angles’ default resting positions.

Let the system state be $\textbf{q} = [q_0, q_1, q_2]^T$ with default positions $\textbf{q}^0 = \left[\frac{\pi}{3}, \frac{\pi}{4}, \frac{\pi}{4} \right]^T$. The control signal of the secondary controller is the difference between the target state and the current system state

$\textbf{u}_{\textrm{null}} = k_{p_{\textrm{null}}}(\textbf{q}^0 - \textbf{q}),$

where $k_{p_\textrm{null}}$ is a gain term.

Let the centres of mass be

$\textrm{com}_0 = \left[ \begin{array}{c} \frac{1}{2}cos(q_0) \\ 0 \\ \frac{1}{2}sin(q_0) \end{array} \right], \;\;\;\; \textrm{com}_1 = \left[ \begin{array}{c} \frac{1}{4}cos(q_1) \\ 0 \\ \frac{1}{4}sin(q_1) \end{array} \right] \;\;\;\; \textrm{com}_2 = \left[ \begin{array}{c} \frac{1}{2}cos(q_2) \\ 0 \\ \frac{1}{4} sin (q_2) \end{array} \right],$

the Jacobians for the COMs are

$\textbf{J}_0(\textbf{q}) = \left[ \begin{array}{ccc} -\frac{1}{2} sin(q_0) & 0 & 0 \\ 0 & 0 & 0 \\ \frac{1}{2} cos(q_0) & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right],$

$\textbf{J}_1(\textbf{q}) = \left[ \begin{array}{ccc} -L_0sin(q_0) - \frac{1}{4}sin(q_{01}) & -\frac{1}{4}sin(q_{01}) & 0 \\ 0 & 0 & 0 \\ L_0 cos(q_0) + \frac{1}{4} cos(q_{01})& \frac{1}{4} cos(q_{01}) & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{array} \right]$

$\textbf{J}_2(\textbf{q}) = \left[ \begin{array}{ccc} -L_0sin(q_0) - L_1sin(q_{01}) - \frac{1}{2}sin(q_{012}) & -L_1sin(q_{01}) - \frac{1}{2}sin(q_{012}) & - \frac{1}{2}sin(q_{012}) \\ 0 & 0 & 0 \\ L_0 cos(q_0) + L_1 cos(q_{01}) + \frac{1}{4}cos(q_{012}) & L_1 cos(q_{01}) + \frac{1}{4} cos(q_{012}) & \frac{1}{4}cos(q_{012}) \\ 0 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{array} \right].$

The Jacobian for the end-effector of this three link arm is

$\textbf{J}_{ee} = \left[ \begin{array}{ccc} -L_0 sin(q_0) - L_1 sin(q_{01}) - L_2 sin(q_{012}) & - L_1 sin(q_{01}) - L_2 sin(q_{012}) & - L_2 sin(q_{012}) \\ L_0 cos(q_0) + L_1 cos(q_{01}) + L_2 cos(q_{012}) & L_1 cos(q_{01}) + L_2 cos(q_{012}) & L_2 cos(q_{012}), \end{array} \right]$

where $q_{01} = q_0 + q_1$ and $q_{012} = q_0 + q_1 + q_2$.

Taking the control signal developed in Section~\ref{sec:exampleOS}

$\textbf{u} = \textbf{J}^T_{ee}(\textbf{q}) \; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q}) [k_p (\textbf{x}_{\textrm{des}} - \textbf{x}) + k_v (\dot{\textbf{x}}_{\textrm{des}} - \dot{\textbf{x}})] - \textbf{g}(\textbf{q}),$

where $\textbf{M}_{\textbf{x}_{ee}}(\textbf{q})$ was defined in the previous post, and $\textbf{g}(\textbf{q})$ is defined two posts ago, and $k_p$ and $k_v$ are gain terms, usually set such that $k_v = \sqrt{k_p}$, and adding in the null space control signal and filter gives

$\textbf{u} = \textbf{J}^T_{ee}(\textbf{q}) \; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q}) [k_p (\textbf{x}_{\textrm{des}} - \textbf{x}) + k_v (\dot{\textbf{x}}_{\textrm{des}} - \dot{\textbf{x}})] - (\textbf{I} - \textbf{J}^T_{ee}(\textbf{q}) \textbf{J}^{T+}_{ee}(\textbf{q})) \textbf{u}_{\textrm{null}} - \textbf{g}(\textbf{q}),$

where $\textbf{J}^{T+}(\textbf{q})$ is the dynamically consistent generalized inverse defined above, and $\textbf{u}_{\textrm{null}}$ is our null space signal!

Conclusions

It’s a lot of math, but when you start to get a feel for it what’s really awesome is that this is it. We’re describing the whole system, and so by working with these equations we can get a super effective controller. Which is pretty cool. Especially in relation to other possible controllers.

Alright! We’ve now worked through all the basic theory for operational space control, it is time to get some implementations going.

## Robot control part 4: Operation space control

In this post we’ll look at operational space control and how to derive the control equations. I’d like to mention again that these posts have all come about as a result of me reading and working through Samir Menon’s operational space control tutorial, where he works through an implementation example on a revolute-prismatic-prismatic robot arm.

Generalized coordinates vs operational space

The term generalized coordinates refers to a characterization of the system that uniquely defines its configuration. For example, if our robot has 7 degrees of freedom, then there are 7 state variables, such that when all these variables are given we can fully account for the position of the robot. In the previous posts of this series we’ve been describing robotic arms in joint space, and for these systems joint space is an example of generalized coordinates. This means that if we know the angles of all of the joints, we can draw out exactly what position that robot is in. An example of a coordinate system that does not uniquely define the configuration of a robotic arm would be one that describes only the $x$ position of the end-effector.

So generalized coordinates tell us everything we need to know about where the robot is, that’s great. The problem with generalized coordinates, though, is that planning trajectories in this space for tasks that we’re interested in performing tends not to be straight forward. For example, if we have a robotic arm, and we want to control the position of the end-effector, it’s not obvious how to control the position of the end-effector by specifying a trajectory for each of the arm’s joints to follow through joint space.

The idea behind operational space control is to abstract away from the generalized coordinates of the system and plan a trajectory in a coordinate system that is directly relevant to the task that we wish to perform. Going back to the common end-effector position control situation, we would like to operate our arm in 3D $(x,y,z)$ Cartesian space. In this space, it’s obvious what trajectory to follow to move the end-effector between two positions (most of the time it will just be a straight line in each dimension). So our goal is to build a control system that lets us specify a trajectory in our task space and will transform this signal into generalized coordinates that it can then send out to the system for execution.

Operational space control of simple robot arm

Alright, we’re going to work through an example. The generalized coordinates for this example is going to be joint space, and the operational space is going to be the end-effector Cartesian coordinates relative to the a reference frame attached to the base. Recycling the robot from the second post in this series, here’s the set up we’ll be working with:

Once again, we’re going to need to find the Jacobians for the end-effector of the robot. Fortunately, we’ve already done this:

$\textbf{J} = \left[ \begin{array}{cc} -L_0 sin(\theta_0) - L_1 sin(\theta_0 + \theta_1) & - L_1 sin(\theta_0 + \theta_1) \\ L_0 cos(\theta_0) + L_1 cos(\theta_0 + \theta_1) & L_1 cos(\theta_0 + \theta_1) \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{array} \right]$

Great! So now that we have $\textbf{J}$, we can go ahead and transform forces from end-effector (hand) space to joint space as we discussed in the second post:

$\textbf{u} = \textbf{J}_{ee}^T(\textbf{q}) \; \textbf{F}_{\textbf{x}}.$

Rewriting $\textbf{F}_\textbf{x}$ as its component parts

$\textbf{F}_{\textbf{x}} = \textbf{M}_{\textbf{x}_{ee}}(\textbf{q}) \; \ddot{\textbf{x}}_\textrm{des},$

where $\ddot{\textbf{x}}$ is end-effector acceleration, and $\textbf{M}_{\textbf{x}_{ee}(\textbf{q})}$ is the inertia matrix in operational space. Unfortunately, this isn’t just the normal inertia matrix, so let’s take a look here at how to go about deriving it.

Inertia in operational space

Being able to calculate $\textbf{M}(\textbf{q})$ allows inertia to be cancelled out in joint-space by incorporating it into the control signal, but to cancel out the inertia of the system in operational space more work is still required. The first step will be calculating the acceleration in operational space. This can be found by taking the time derivative of our original Jacobian equation.

$\frac{d}{d t}\dot{\textbf{x}} = \frac{d}{d t} (\textbf{J}_{ee}(\textbf{q}) \; \dot{\textbf{q}}),$

$\ddot{\textbf{x}} = \dot{\textbf{J}}_{ee}(\textbf{q}) \; \dot{\textbf{q}} + \textbf{J}_{ee} (\textbf{q})\; \ddot{\textbf{q}}.$

Substituting in the dynamics of the system, as defined in the previous post, but ignoring the effects of gravity for now, gives:

$\ddot{\textbf{x}} = \dot{\textbf{J}}_{ee}(\textbf{q}) \; \dot{\textbf{q}} + \textbf{J}_{ee} (\textbf{q})\; \textbf{M}^{-1}(\textbf{q}) [ \textbf{u} - \textbf{C}(\textbf{q}, \dot{\textbf{q}})].$

Define the control signal

$\textbf{u} = \textbf{J}_{ee}^T(\textbf{q}) \textbf{F}_\textbf{x},$

where substituting in for $\textbf{F}_\textbf{x}$, the desired end-effector force, gives

$\textbf{u} = \textbf{J}_{ee}^T(\textbf{q})\; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q})\; \ddot{\textbf{x}}_\textrm{des},$

where $\ddot{\textbf{x}}_\textrm{des}$ denotes the desired end-effector acceleration. Substituting the above equation into our equation for acceleration in operational space gives

$\ddot{\textbf{x}} = \dot{\textbf{J}}_{ee}(\textbf{q}) \; \dot{\textbf{q}} + \textbf{J}_{ee} (\textbf{q})\; \textbf{M}^{-1}(\textbf{q}) [ \textbf{J}_{ee}^T(\textbf{q})\; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q})\; \ddot{\textbf{x}}_\textrm{des} - \textbf{C}(\textbf{q}, \dot{\textbf{q}})].$

$\ddot{\textbf{x}} = \textbf{J}_{ee}(\textbf{q})\; \textbf{M}^{-1}(\textbf{q}) \; \textbf{J}_{ee}^T(\textbf{q})\; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q})\; \ddot{\textbf{x}}_\textrm{des} + [\dot{\textbf{J}}_{ee}(\textbf{q}) \; \dot{\textbf{q}} - \textbf{J}_{ee}(\textbf{q})\textbf{M}^{-1}(\textbf{q}) \; \textbf{C}(\textbf{q}, \dot{\textbf{q}})],$

the last term is ignored due to the complexity of modeling it, resulting in

$\ddot{\textbf{x}} = \textbf{J}_{ee}(\textbf{q})\; \textbf{M}^{-1}(\textbf{q}) \textbf{J}_{ee}^T(\textbf{q})\; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q})\; \ddot{\textbf{x}}_\textrm{des}.$

At this point, to get the dynamics $\ddot{\textbf{x}}$ to be equal to the desired acceleration $\ddot{\textbf{x}}_\textrm{des}$, the end-effector inertia matrix $\textbf{M}_{\textbf{x}_{ee}}$ needs to be chosen carefully. By setting

$\textbf{M}_{\textbf{x}_{ee}}(\textbf{q}) = [\textbf{J}_{ee}(\textbf{q}) \; \textbf{M}^{-1}(\textbf{q}) \; \textbf{J}_{ee}^T(\textbf{q})]^{-1},$

we now get

$\ddot{\textbf{x}} = \textbf{J}_{ee}(\textbf{q})\; \textbf{M}^{-1}(\textbf{q}) \textbf{J}_{ee}^T(\textbf{q})\; [\textbf{J}_{ee}(\textbf{q}) \; \textbf{M}^{-1}(\textbf{q}) \; \textbf{J}_{ee}^T(\textbf{q})]^{-1} \; \ddot{\textbf{x}}_\textrm{des},$

$\ddot{\textbf{x}} = \ddot{\textbf{x}}_\textrm{des}.$

And that’s why and how the inertia matrix in operational space is defined!

The whole signal

Going back to the control signal we were building, let’s now add in a term to cancel the effects of gravity in joint space. This gives

$\textbf{u} = \textbf{J}_{ee}^T(\textbf{q}) \textbf{M}_{\textbf{x}_{ee}}(\textbf{q}) \ddot{\textbf{x}}_\textrm{des} + \textbf{g}(\textbf{q}),$

where $\textbf{g}(\textbf{q})$ is the same as defined in the previous post. This controller converts desired end-effector acceleration into torque commands, and compensates for inertia and gravity.

Defining a basic PD controller in operational space

$\ddot{\textbf{x}}_\textrm{des} = k_p (\textbf{x}_{\textrm{des}} - \textbf{x}) + k_v (\dot{\textbf{x}}_{\textrm{des}} - \dot{\textbf{x}}),$

and the full equation for the operational space control signal in joint space is:

$\textbf{u} = \textbf{J}_{ee}^T(\textbf{q}) \; \textbf{M}_{\textbf{x}_{ee}}(\textbf{q}) [k_p (\textbf{x}_{\textrm{des}} - \textbf{x}) + k_v (\dot{\textbf{x}}_{\textrm{des}} - \dot{\textbf{x}})] + \textbf{g}(\textbf{q}).$

Hurray! That was relatively simple. The great thing about this, though, is that it’s the same process for any robot arm! So go out there and start building controllers! Find your robot’s mass matrix and gravity term in generalized coordinates, the Jacobian for the end effector, and you’re in business.

Conclusions

So, this feels a little anticlimactic without an actual simulation / implementation of operational space, but don’t worry! As avid readers (haha) will remember, a while back I worked out how to import some very realistic MapleSim arm simulations into Python for use with some Python controllers. This seems a perfect application opportunity, so that’s next! A good chance to work through writing the controllers for different arms and also a chance to play with controllers operating in null spaces and all the like.

Actual simulation implementations will also be a good chance to play with trying to incorporate those other force terms into the control equation, and get to see the results without worrying about breaking an actual robot. In actual robots a lot of the time you leave out anything where your model might be inaccurate because the last thing to do is falsely compensate for some forces and end up injecting energy into your system, making it unstable.

There’s still some more theory to work through though, so I’d like to do that before I get to implementing simulations. One more theory post, and then we’ll get back to code!

## Robot control part 3: Accounting for mass and gravity

In the exciting previous post we looked at how to go about generating a Jacobian matrix, which we could use to transformation both from joint angle velocities to end-effector velocities, and from desired end-effector forces into joint angle torques. I briefly mentioned right at the end that using just this force transformation to build your control signal was only appropriate for very simple systems that didn’t have to account for things like arm-link mass or gravity.

In general, however, mass and gravity must be accounted for and cancelled out. The full dynamics of a robot arm are

$\textbf{M}(\textbf{q}) \ddot{\textbf{q}} = (\textbf{u} - \textbf{C}(\textbf{q}, \dot{\textbf{q}}) - \textbf{g}(\textbf{q})) ,$

where $\ddot{\textbf{q}}$ is joint angle acceleration, $\textbf{u}$ is the control signal (specifying torque), $\textbf{C}(\textbf{q}, \dot{\textbf{q}})$ is a function describing the Coriolis and centrifugal effects, $\textbf{g}(\textbf{q})$ is the effect of gravity in joint space, and $\textbf{M}$ is the mass matrix of the system in joint space.

There are a lot of terms involved in the system acceleration, so while the Jacobian can be used to transform forces between coordinate systems it is clear that just setting the control signal $\textbf{u} = \textbf{J}_{ee}^T (\textbf{q})\textbf{F}_\textbf{x}$ is not sufficient, because a lot of the dynamics affecting acceleration aren’t accounted for. In this section an effective PD controller operating in joint space will be developed that will allow for more precise control by cancelling out unwanted acceleration terms. To do this the effects of inertia and gravity need to be calculated.

Accounting for inertia

The fact that systems have mass is a pain in our controller’s side because it introduces inertia into our system, making control of how the system will move at any given point in time more difficult. Mass can be thought of as an object’s unwillingness to respond to applied forces. The heavier something is, the more resistant it is to acceleration, and the force required to move a system along a desired trajectory depends on both the object’s mass and its current acceleration.

To effectively control a system, the system inertia needs to be calculated so that it can be included in the control signal and cancelled out.

Given the robot arm above, operating in the $(x,z)$ plane, with the $y$ axis extending into the picture where the yellow circles represent each links centre-of-mass (COM). The position of each link is COM is defined relative to that link’s reference frame, and the goal is to figure out how much each link’s mass will affect the system dynamics.

The first step is to transform the representation of each of the COM from Cartesian coordinates in the reference frame of their respective arm segments into terms of joint angles, such that the Jacobian for each COM can be calculated.

Let the COM positions relative to each segment’s coordinate frame be

$\textrm{com}_0 = \left[ \begin{array}{c} \frac{1}{2}cos(q_0) \\ 0 \\ \frac{1}{2}sin(q_0) \end{array} \right], \;\;\;\; \textrm{com}_1 = \left[ \begin{array}{c} \frac{1}{4}cos(q_1) \\ 0 \\ \frac{1}{4}sin(q_1) \end{array} \right].$

The first segment’s COM is already in base coordinates (since the first link and the base share the same coordinate frame), so all that is required is the position of the second link’s COM in the base reference frame, which can be done with the transformation matrix

$^1_0\textbf{T} = \left[ \begin{array}{cccc} cos(q_1) & 0 & -sin(q_1) & L_0 cos(q_0) \\ 0 & 1 & 0 & 0 \\ sin(q_1) & 0 & cos(q_1) & L_0 sin(q_0) \\ 0 & 0 & 0 & 1 \end{array} \right].$

Using $^1_0\textbf{T}$ to transform the $\textrm{com}_1$ gives

$^1_0\textbf{T} \; \textrm{com}_1 = \left[ \begin{array}{cccc} cos(q_1) & 0 & -sin(q_1) & L_0 cos(q_0) \\ 0 & 1 & 0 & 0 \\ sin(q_1) & 0 & cos(q_1) & L_0 sin(q_0) \\ 0 & 0 & 0 & 1 \end{array} \right] \; \; \left[ \begin{array}{c} \frac{1}{4}cos(q_1) \\ 0 \\ \frac{1}{4}sin(q_1) \\ 1 \end{array} \right]$

$^1_0\textbf{T} \; \textrm{com}_1 = \left[ \begin{array}{c} L_0 cos(q_0) + \frac{1}{4}cos(q_0 + q_1) \\ 0 \\ L_0 sin(q_0) + \frac{1}{4} cos(q_0 + q_1) \\ 1 \end{array} \right].$

To see the full computation worked out explicitly please see my previous robot control post.

Now that we have the COM positions in terms of joint angles, we can find the Jacobians for each point through our Jacobian equation:

$\textbf{J} = \frac{\partial \textbf{x}}{\partial \textbf{q}}$.

Using this for each link gives us:

$\textbf{J}_0 = \left[ \begin{array}{cc} -\frac{1}{2}sin(q_0) & 0 \\ 0 & 0 \\ \frac{1}{2} cos(q_0) & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \end{array} \right]$
$\textbf{J}_1 = \left[ \begin{array}{cc} -L_0sin(q_0) -\frac{1}{4}sin(\theta_0 + q_1) & -\frac{1}{4} sin(q_0 + \theta_1) \\ 0 & 0 \\ L_0 cos(q_0) + \frac{1}{4}cos(q_0 + q_1) & \frac{1}{4} cos(q_0 +q_1) \\ 0 & 0 \\ 1 & 1 \\ 0 & 0 \end{array} \right]$.

Kinetic energy

The total energy of a system can be calculated as a sum of the energy introduced from each source. The Jacobians just derived will be used to calculate the kinetic energy each link generates during motion. Each link’s kinetic energy will be calculated and summed to get the total energy introduced into the system by the mass and configuration of each link.

Kinetic energy (KE) is one half of mass times velocity squared:

$\textrm{KE} = \frac{1}{2} \; \dot{\textbf{x}}^T \textbf{M}_\textbf{x}(\textbf{q}) \; \dot{\textbf{x}},$

where $\textbf{M}_\textbf{x}$ is the mass matrix of the system, with the subscript $\textbf{x}$ denoting that it is defined in Cartesian space, and $\dot{\textbf{x}}$ is a velocity vector, where $\dot{\textbf{x}}$ is of the form

$\dot{\textbf{x}} = \left[ \begin{array}{c} \dot{x} \\ \dot{y} \\ \dot{z} \\ \dot{\omega_x} \\ \dot{\omega_y} \\ \dot{\omega_z} \end{array} \right],$

and the mass matrix is structured

$\textbf{M}_{\textbf{x}_i} (\textbf{q})= \left[ \begin{array}{cccccc} m_i & 0 & 0 & 0 & 0 & 0 \\ 0 & m_i & 0 & 0 & 0 & 0 \\ 0 & 0 & m_i & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{xx} & I_{xy} & I_{xz} \\ 0 & 0 & 0 & I_{yx} & I_{yy} & I_{yz} \\ 0 & 0 & 0 & I_{zx} & I_{zy} & I_{zz} \end{array} \right],$

where $m_i$ is the mass of COM $i$, and the $I_{ij}$ terms are the moments of inertia, which define the object’s resistance to change in angular velocity about the axes, the same way that the mass element defines the object’s resistance to changes in linear velocity.

As mentioned above, the mass matrix for the COM of each link is defined in Cartesian coordinates in its respective arm segment’s reference frame. The effects of mass need to be found in joint angle space, however, because that is where the controller operates. Looking at the summation of the KE introduced by each COM:

$\textrm{KE} = \frac{1}{2} \; \Sigma_{i=0}^n ( \dot{\textbf{x}}_i^T \textbf{M}_{\textbf{x}_i}(\textbf{q}) \; \dot{\textbf{x}}_i),$

and substituting in $\dot{\textbf{x}} = \textbf{J} \; \dot{\textbf{q}}$,

$\textrm{KE}_i \ \frac{1}{2} \; \Sigma_{i=0}^n (\dot{\textbf{q}}^T \; \textbf{J}_i^T \textbf{M}_{\textbf{x}_i}(\textbf{q})\textbf{J}_i \; \dot{\textbf{q}}),$

and moving the $\dot{\textbf{q}}$ terms outside the summation,

$\textrm{KE}_i = \frac{1}{2} \; \dot{\textbf{q}}^T \; \Sigma_{i=0}^n (\textbf{J}_i^T \textbf{M}_{\textbf{x}_i}(\textbf{q}) \textbf{J}_i) \; \dot{\textbf{q}}.$

Defining

$\textbf{M}(\textbf{q}) = \Sigma_{i=0}^n \; \textbf{J}_i^T(\textbf{q}) \textbf{M}_{\textbf{x}_i}(\textbf{q}) \; \textbf{J}_i(\textbf{q}),$

gives

$\textrm{KE} = \frac{1}{2} \; \dot{\textbf{q}}^T \; \textbf{M}(\textbf{q}) \; \dot{\textbf{q}},$

which is the equation for calculating kinetic energy in joint space. Thus, $\textbf{M}(\textbf{q})$ denotes the inertia matrix in joint space.

Now that we’ve successfully calculated the mass matrix of the system in joint space, we can incorporate it into our control signal and cancel out its effects on the system dynamics! On to the next problem!

Accounting for gravity

With the forces of inertia accounted for, we can now address the problem of gravity. To compensate for gravity the concept of conservation of energy (i.e. the work done by gravity is the same in all coordinate systems) will once again be pulled out. The controller operates by applying torque on joints, so it is necessary to be able to calculate the effect of gravity in joint space to cancel it out.

While the effect of gravity in joint space isn’t obvious, it is quite easily defined in Cartesian coordinates in the base frame of reference. Here, the work done by gravity is simply the summation of the distance each link’s center of mass has moved multiplied by the force of gravity. Where the force of gravity in Cartesian space is the mass of the object multiplied by -9.8m/s$^2$ along the $z$ axis, the equation for the work done by gravity is written:

$\textbf{W}_g = \Sigma^n_{i=0} (\textbf{F}_{g_i}^T \dot{\textbf{x}}_i),$

where $\textbf{F}_{g_i}$ is the force of gravity on the $i$th arm segment. Because of the conservation of energy, the equation for work is equivalent when calculated in joint space, substituting into the above equation with the equation for work:

$\textbf{F}_\textbf{q}^T \dot{\textbf{q}} = \Sigma^n_{i=0} (\textbf{F}_{g_i}^T \dot{\textbf{x}}_i),$

and then substitute in using $\dot{\textbf{x}}_i = \textbf{J}_i(\textbf{q}) \; \dot{\textbf{q}}$,

$\textbf{F}_\textbf{q}^T \dot{\textbf{q}} = \Sigma^n_{i=0} (\textbf{F}_{g_i}^T \textbf{J}_i(\textbf{q}) \; \dot{\textbf{q}}),$

and cancelling out the $\dot{\textbf{q}}$ terms on both sides,

$\textbf{F}_\textbf{q}^T = \Sigma^n_{i=0} (\textbf{F}_{g_i}^T \textbf{J}_i(\textbf{q})),$

$\textbf{F}_\textbf{q} = \Sigma^n_{i=0} (\textbf{J}_i^T(\textbf{q}) \textbf{F}_{g_i}) = \textbf{g}(\textbf{q}),$

which says that to find the effect of gravity in joint space simply multiply the mass of each link by its Jacobian, multiplied by the force of gravity in $(x,y,z)$ space, and sum over each link. This summation gives the total effect of the gravity on the system.

Making a PD controller in joint space

We are now able to account for the energy in the system caused by inertia and gravity, great! Let’s use this to build a simple PD controller in joint space. Control should be very straight forward because once we cancel out the effects of gravity and inertia then we can almost pretend that the system behaves linearly. This means that we can also treat control of each of the joints independently, since their movements no longer affect one another. So in our control system we’re actually going to have a PD controller for each joint.

The above-mentioned nonlinearity that’s left in the system dynamics is due to the Coriolis and centrifugal effects. Now, these can be accounted for, but they require highly accurate model of the moments of inertia. If the moments are incorrect then the controller can actually introduce instability into the system, so it’s better if we just don’t address them.

Rewriting the system dynamics presented at the very top, in terms of acceleration gives

$\ddot{\textbf{q}} = \textbf{M}^{-1}(\textbf{q}) (\textbf{u} - \textbf{C}(\textbf{q}, \dot{\textbf{q}}) - \textbf{g}(\textbf{q})).$

Ideally, the control signal would be constructed

$\textbf{u} = (\textbf{M}(\textbf{q}) \; \ddot{\textbf{q}}_\textrm{des} + \textbf{C}(\textbf{q}, \dot{\textbf{q}}) + \textbf{g}(\textbf{q})),$

where $\ddot{\textbf{q}}_\textrm{des}$ is the desired acceleration of the system. This would result in system acceleration

$\ddot{\textbf{q}} = \textbf{M}^{-1}(\textbf{q})((\textbf{M}(\textbf{q}) \; \ddot{\textbf{q}}_\textrm{des} + \textbf{C}(\textbf{q}, \dot{\textbf{q}}) + \textbf{g}(\textbf{q})) - \textbf{C}(\textbf{q}, \dot{\textbf{q}}) - \textbf{g}(\textbf{q})),$

$\ddot{\textbf{q}} = \textbf{M}^{-1}(\textbf{q}) \textbf{M}(\textbf{q}) \; \ddot{\textbf{q}}_\textrm{des} ,$

$\ddot{\textbf{q}} = \ddot{\textbf{q}}_\textrm{des},$

which would be ideal. As mentioned, because the Coriolis and centrifugal effects are tricky to account for we’ll leave them out, so the instead the control signal is

$\textbf{u} = (\textbf{M}(\textbf{q}) \; \ddot{\textbf{q}}_\textrm{des} + \textbf{g}(\textbf{q})).$

Using a standard PD control formula to generate the desired acceleration:

$\ddot{\textbf{q}}_\textrm{des} = k_p \; (\textbf{q}_{\textrm{des}} - \textbf{q}) + k_v \; (\dot{\textbf{q}}_{\textrm{des}} - \dot{\textbf{q}}),$

where $k_p$ and $k_v$ are our gain values, and the control signal has been fully defined:

$\textbf{u} = (\textbf{M}(\textbf{q}) \; (k_p \; (\textbf{q}_{\textrm{des}} - \textbf{q}) + k_v \; (\dot{\textbf{q}}_{\textrm{des}} - \dot{\textbf{q}})) + \textbf{g}(\textbf{q})),$

and we’ve successfully build an effective PD controller in joint space!

Conclusions

Here we looked at building a PD controller that operates in the joint space of a robotic arm that can cancel out the effects of inertia and gravity. By cancelling out the effects of inertia, we can treat control of each of the joints independently, effectively orthogonalizing their control. This makes PD control super easy, we just set up a simple controller for each joint. Also a neat thing is that all of the required calculations can be performed with algorithms of linear complexity, so it’s not a problem to do all of this super fast.

One of the finer points was that we ignored the Coriolis and centrifugal effects on the robot’s dynamics. This is because in the mass matrix model of the moments of inertia are notoriously hard to accurately capture on actual robots. Often you go based off of a CAD model of your robot and then have to do some fine-tuning by hand. So they will be unaccounted for in our control signal, but most of the time as long as you have a very short feedback loop you’ll be fine.

I am really enjoying working through this, as things build on each other so well here and we’re starting to be able to do some really interesting things with the relatively forward transformation matrices and Jacobians that we learned how to build in the previous posts. This was for a very simple robot, but excitingly the next step after this is moving on to operational space control! At last. From there, we’ll go on to look at more complex robotic situations where things like configuration redundancy are introduced and it’s not quite so straightforward.