## Natural policy gradient in TensorFlow

In working towards reproducing some results from deep learning control papers, one of the learning algorithms that came up was natural policy gradient. The basic idea of natural policy gradient is to use the curvature information of the of the policy’s distribution over actions in the weight update. There are good resources that go into details about the natural policy gradient method (such as here and here), so I’m not going to go into details in this post. Rather, possibly just because I’m new to TensorFlow, I found implementing this method less than straightforward due to a number of unexpected gotchyas. So I’m going to quickly review the algorithm and then mostly focus on the implementation issues that I ran into.

If you’re not familiar with policy gradient or various approaches to the cart-pole problem I also suggest that you check out this blog post from KV Frans, which provides the basis for the code I use here. All of the code that I use in this post is available up on my GitHub, and it’s worth noting this code is for TensorFlow 1.7.0.

Let’s informally define our notation

• $s_t$ and $a_t$ are the system state and chosen action at time $t$
• $\theta$ are the network parameters that we’re learning
• $\pi_\theta$ is the control policy network parameterized $\theta$
• $A^\pi(s_t, a_t, t)$ is the advantage function, which returns the estimated total reward for taking action $a_t$ in state $s_t$ relative to default behaviour (i.e. was this action better or worse than what we normally do)

The regular policy gradient is calculated as:

$\textbf{g} = \frac{1}{T}\sum_{t=0}^T \nabla_\theta \; \textrm{log} \; \pi_\theta(a_t | s_t) \; \hat{A}^\pi(s_t, a_t, t),$

where $\nabla_\theta$ denotes calculating the partial derivative with respect to $\theta$, the hat above $A^\pi$ denotes that we’re using an estimation of the advantage function, and $\pi_\theta(a_t | s_t)$ returns a scalar value which is the probability of the chosen action $a_t$ in state $s_t$.

For the natural policy gradient, we’re going to calculate the Fisher information matrix, $F$, which estimates the curvature of the control policy:

$\hat{F}_{\theta_k} = \frac{1}{T} \sum_{t=0}^T \nabla_\theta \; \textrm{log} \; \pi_\theta(a_t | s_t) \; \nabla_\theta \; \textrm{log} \; \pi_\theta (a_t | s_t)^T.$

The goal of including curvature information is to be able move along the steepest ascent direction, which we estimate with $\hat{\textbf{F}}^{-1}_{\theta_k}$. Including this then the weight update becomes

$\theta_{k+1} = \theta_k + \alpha \hat{\textbf{F}}^{-1}_{\theta_k} \textbf{g},$

where $\alpha$ is a learning rate parameter. In his paper Towards Generalization and Simplicity in Continuous Control, Aravind Rajeswaran notes that empirically it’s difficult to choose an appropriate $\alpha$ value or set an appropriate schedule. To address this, they normalize the update under the Fisher metric:

$\theta_{k+1} = \theta_k + \sqrt{\frac{\delta}{\textbf{g}^T\hat{\textbf{F}}^{-1}_{\theta_k}\textbf{f}}} \; \hat{\textbf{F}}^{-1}_{\theta_k} \textbf{g}.$

Using the above for the learning rate can be thought of as choosing a step size $\delta$ that is normalized with respect to the change in the control policy. This is beneficial because it means that we don’t do any parameter updates that will drastically change the output of the policy network.

Here’s what the pseudocode for the algorithm looks like

• Initialize policy parameters $\theta_0$
• for k = 1 to K do:
• collect N trajectories by rolling out the stochastic policy $\pi_{\theta_k}$
• compute $\nabla_\theta \; \textrm{log} \; \pi_\theta(a_t | s_t)$ for each $(s, a)$ pair along the trajectories sampled
• compute advantages $\hat{A}^\pi_k$ based on the sampled trajectories and the estimated value function $V^\pi_{k-1}$
• compute the policy gradient $g$ as specified above
• compute the Fisher matrix and perform the weight update as specified above
• update the value network to get $V^\pi_k$

OK great! On to implementation.

Implementation in TensorFlow 1.7.0

Calculating the gradient at every time step

In TensorFlow, the tf.gradient(ys, xs) function, which calculates the partial derivative of  ys with respect to xs:

$\nabla_{X} Y$

will automatically sum over over all elements in ys. There is no function for getting the gradient of each individual entry (i.e. the Jacobian) in ys, which is less than great. People have complained, but until some update happens we can address this with basic list comprehension to iterate through each entry in the history and calculate the gradient at that point in time. If you have a faster means that you’ve tested please comment below!

The code for this looks like

        g_log_prob = tf.stack(
for i in range(action_log_prob_flat.get_shape()[0])])


Where the result is called g_log_prob_flat because it’s the gradient of the log probability of the chosen action at each time step.

Inverse of a positive definite matrix with rounding errors

One of the problems I ran into was in calculating the normalized learning rate $\sqrt{\frac{\delta}{\textbf{g}^T\hat{\textbf{F}}^{-1}_{\theta_k}\textbf{f}}}$ was that the square root calculation was dying because negative values were being set in. This was very confusing because $\hat{\textbf{F}}$ is a positive definite matrix, which means that $\hat{\textbf{F}}^{-1}$ is also positive definite, which means that $\textbf{x}^T \hat{\textbf{F}}^{-1} \textbf{x} \geq 0$ for any $\textbf{x}$.

So it was clear that it was a calculation error somewhere, but I couldn’t find the source of the error for a while, until I started doing the inverse calculation myself using SVD to explicitly look at the eigenvalues and make sure they were all > 0. Turns out that there were some very, very small eigenvalues with a negative sign, like -1e-7 kind of range. My guess (and hope) is that these are just rounding errors, but they’re enough to mess up the calculations of the normalized learning rate. So, to handle this I explicitly set those values to zero, and that took care of that.

        S, U, V = tf.svd(F)
atol = tf.reduce_max(S) * 1e-6
S_inv = tf.divide(1.0, S)
S_inv = tf.where(S < atol, tf.zeros_like(S), S_inv)
S_inv = tf.diag(S_inv)
F_inv = tf.matmul(S_inv, tf.transpose(U))
F_inv = tf.matmul(V, F_inv)


Manually adding an update to the learning parameters

Once the parameter update is explicitly calculated, we then need to update them. To implement an explicit parameter update in TensorFlow we do

        # update trainable parameters


So to update the parameters then we call sess.run(my_variables, feed_dict={...}). It’s worth noting here too that any time you run the session with my_variables the parameter update will be calculated and applied, so you can’t run the session with my_variables to only fetch the current values.

Results on the CartPole problem

Now running the original policy gradient algorithm against the natural policy gradient algorithm (with everything else the same) we can examine the results of using the Fisher information matrix in the update provides some strong benefits. The plot below shows the maximum reward received in a batch of 200 time steps, where the system receives a reward of 1 for every time step that the pole stays upright, and 200 is the maximum reward achievable.

To generate this plot I ran 10 sessions of 300 batches, where each batch runs as many episodes as it takes to get 200 time steps of data. The solid lines are the mean value at each epoch across all sessions, and the shaded areas are the 95% confidence intervals. So we can see that the natural policy gradient starts to hit a reward of 200 inside 100 batches, and that the mean stays higher than normal policy gradient even after 300 batches.

It’s also worth noting that for both the policy gradient and natural policy gradient the average time to run a batch and weight update was about the same on my machine, between 150-180 milliseconds.

The modified code for policy gradient, the natural policy gradient, and plotting code are all up on my GitHub.

Other notes

• Had to reduce the $\delta$ to 0.001 (from 0.05 which was used in the Rajeswaran paper) to get the learning to converge, suspect this is because of the greatly reduced complexity of the space and control problem.
• The Rajeswaran paper algorithm does gradient ascent instead of descent, which is why the signs are how they are.

## Building a spiking neural model of adaptive arm control

About a year ago I published the work from my thesis in a paper called ‘A spiking neural model of adaptive arm control’. In this paper I presented the Recurrent Error-driven Adaptive Control Hierarchy (REACH) model. The goal of the model is to begin working towards reproducing behavioural level phenomena of human movement with biologically plausible spiking neural networks.

To do this, I start by using three methods from control literature (operational space control, dynamic movement primitives, and non-linear adaptive control) to create an algorithms level model of the motor control system that captures behavioural level phenomena of human movement. Then I explore how this functionality could be mapped to the primate brain and implemented in spiking neurons. Finally, I look at the data generated by this model on the behavioural level (e.g. kinematics of movement), the systems level (i.e. analysis of populations of neurons), and the single-cell level (e.g. correlating neural activity with movement parameters) and compare/contrast with experimental data.

By having a full model framework (from observable behaviour to neural spikes) is to have a more constrained computational model of the motor control system; adding lower-level biological constraints to behavioural models and higher-level behavioural constraints to neural models.

In general, once you have a model, the critical next step is to generating testable predictions that can be used to discriminate between other models with different implementations or underlying algorithms. Discriminative predictions allow us to design experiments that can gather evidence in favour or against different hypotheses of brain function, and provide clues to useful directions for further research. Which is the real contribution of computational modeling.

So that’s a quick overview of the paper; there are quite a few pages of supplementary information that describe the details of the model implementation, and I provided the code and data used to generate the data analysis figures. However, code to explicitly run the model on your own has been missing. As one of the major points of this blog is to provide code for furthering research, this is pretty embarrassing. So, to begin to remedy this, in this post I’m going to work through a REACH framework for building models to control a two-link arm through reaching in a line, tracing a circle, performing the centre-out reaching task, and adapting online to unexpected perturbations during reaching imposed by a joint-velocity based forcefield.

This post is directed towards those who have already read the paper (although not necessarily the supplementary material). To run the scripts you’ll need Nengo, Nengo GUI, and NengoLib all installed. There’s a description of the theory behind the Neural Engineering Framework, which I use extensively in my Nengo modeling, in the paper. I’m hoping that between that and code readability / my explanations below that most will be comfortable starting to play around with the code. But if you’re not, and would like more resources, you can check out the Getting Started page on the Nengo website, the tutorials from How To Build a Brain, and the examples in the Nengo GUI.

You can find all of the code up on my GitHub.

NOTE: I’m using the two-link arm (which is fully implemented in Python) instead of the three-link arm (which has compile issues for Macs) both so that everyone should be able to run the model arm code and to reduce the number of neurons that are required for control, so that hopefully you can run this on you laptop in the event that you don’t have a super power Ubuntu work station. Scaling this model up to the three-link arm is straight-forward though, and I will work on getting code up (for the three-link arm for non-Mac users) as a next project.

Implementing PMC – the trajectory generation system

I’ve talked at length about dynamic movement primitives (DMPs) in previous posts, so I won’t describe those again here. Instead I will focus on their implementation in neurons.

def generate(y_des, speed=1, alpha=1000.0):
beta = alpha / 4.0

# generate the forcing function
forces, _, goals = forcing_functions.generate(
y_des=y_des, rhythmic=False, alpha=alpha, beta=beta)

# create alpha synapse, which has point attractor dynamics
tau = np.sqrt(1.0 / (alpha*beta))
alpha_synapse = nengolib.Alpha(tau)

net = nengo.Network('PMC')
with net:
net.output = nengo.Node(size_in=2)

# create a start / stop movement signal
time_func = lambda t: min(max(
(t * speed) % 4.5 - 2.5, -1), 1)

def goal_func(t):
t = time_func(t)
if t <= -1:
return goals[0]
return goals[1]
net.goal = nengo.Node(output=goal_func, label='goal')

# -------------------- Ramp ---------------------------
ramp_node = nengo.Node(output=time_func, label='ramp')
net.ramp = nengo.Ensemble(
n_neurons=500, dimensions=1, label='ramp ens')
nengo.Connection(ramp_node, net.ramp)

# ------------------- Forcing Functions ---------------
def relay_func(t, x):
t = time_func(t)
if t <= -1:
return [0, 0]
return x
# the relay prevents forces from being sent on reset
relay = nengo.Node(output=relay_func, size_in=2)

domain = np.linspace(-1, 1, len(forces[0]))
x_func = interpolate.interp1d(domain, forces[0])
y_func = interpolate.interp1d(domain, forces[1])
transform=1.0/alpha/beta
nengo.Connection(net.ramp, relay[0], transform=transform,
function=x_func, synapse=alpha_synapse)
nengo.Connection(net.ramp, relay[1], transform=transform,
function=y_func, synapse=alpha_synapse)
nengo.Connection(relay, net.output)

nengo.Connection(net.goal[0], net.output[0],
synapse=alpha_synapse)
nengo.Connection(net.goal[1], net.output[1],
synapse=alpha_synapse)
return net


The generate method for the PMC takes in a desired trajectory, y_des, as a parameter. The first thing we do (on lines 5-6) is calculate the forcing function that will push the DMP point attractor along the desired trajectory.

The next thing (on lines 9-10) is creating an Alpha (second-order low-pass filter) synapse. By writing out the dynamics of a point attractor in Laplace space, one of the lab members, Aaron Voelker, noticed that the dynamics could be fully implemented by creating an Alpha synapse with an appropriate choice of tau. I walk through all of the math behind this in this post. Here we’ll use that more compact method and project directly to the output node, which improves performance and reduces the number of neurons.

Inside the PMC network we create a time_func node, which is the pace-setter during simulation. It will output a linear ramp from -1 to 1 every few seconds, with the pace set by the speed parameter, and then pause before restarting.

We also have a goal node, which will provide a target starting and ending point for the trajectory. Both the time_func and goal nodes are created and used as a model simulation convenience, and proposed to be generated elsewhere in the brain (the basal ganglia, why not? #igotreasons #provemewrong).

The ramp ensemble is the most important component of the trajectory generation system. It takes the output from the time_func node as input, and generates the forcing function which will guide our little system through the trajectory that was passed in. The ensemble itself is nothing special, but rather the function that it approximates on its outgoing connection. We set up this function approximation with the following code:

        domain = np.linspace(-1, 1, len(forces[0]))
x_func = interpolate.interp1d(domain, forces[0])
y_func = interpolate.interp1d(domain, forces[1])
transform=1.0/alpha/beta
nengo.Connection(net.ramp, relay[0], transform=transform,
function=x_func, synapse=alpha_synapse)
nengo.Connection(net.ramp, relay[1], transform=transform,
function=y_func, synapse=alpha_synapse)
nengo.Connection(relay, net.output)


We want the forcing function be generated as the signals represented in the ramp ensemble moves from -1 to 1. To achieve this, we create interpolation functions, x_func and y_func, which are set up to generate the forcing function values mapped to input values between -1 and 1. We pass these functions into the outgoing connections from the ramp population (one for x and one for y). So now when the ramp ensemble is representing -1, 0, and 1 the output along the two connections will be the starting, middle, and ending x and y points of the forcing function trajectory. The transform and synapse are set on each connection with the appropriate gain values and Alpha synapse, respectively, to implement point attractor dynamics.

NOTE: The above DMP implementation can generate a trajectory signal with as many dimensions as you would like, and all that’s required is adding another outgoing Connection projecting from the ramp ensemble.

The last thing in the code is hooking up the goal node to the output, which completes the point attractor implementation.

Implementing M1 – the kinematics of operational space control

In REACH, we’ve modelled the primary motor cortex (M1) as responsible for taking in a desired hand movement (i.e. target_position - current_hand_position) and calculating a set of joint torques to carry that out. Explicitly, M1 generates the kinematics component of an OSC signal:

$\textbf{u}_\textrm{M1} = \textbf{J}^T \textbf{M}_\textbf{x} (k_p (\textbf{x}^* - \textbf{x}))$

In the paper I did this using several populations of neurons, one to calculate the Jacobian, and then an EnsembleArray to perform the multiplication for the dot product of each dimension separately. Since then I’ve had the chance to rework things and it’s now done entirely in one ensemble.

Now, the set up for the M1 model that computes the above function is to have a single ensemble of neurons that takes in the joint angles, $\textbf{q}$, and control signal $\textbf{u}_\textbf{x} = k_p (\textbf{x}^* - \textbf{x})$, and outputs the function above. Sounds pretty simple, right? Simple is good.

Let’s look at the code (where I’ve stripped out the comments and some debugging code):

def generate(arm, kp=1, operational_space=True,
inertia_compensation=True, means=None, scales=None):

dim = arm.DOF + 2

means = np.zeros(dim) if means is None else means
scales = np.ones(dim) if scales is None else scales
scale_down, scale_up = generate_scaling_functions(
np.asarray(means), np.asarray(scales))

net = nengo.Network('M1')
with net:
# create / connect up M1 ------------------------------
net.M1 = nengo.Ensemble(
n_neurons=1000, dimensions=dim,
intercepts=AreaIntercepts(
dimensions=dim,
base=nengo.dists.Uniform(-1, .1)))

# expecting input in form [q, x_des]
net.input = nengo.Node(output=scale_down, size_in=dim)
net.output = nengo.Node(size_in=arm.DOF)

def M1_func(x, operational_space, inertia_compensation):
""" calculate the kinematics of the OSC signal """
x = scale_up(x)
q = x[:arm.DOF]
x_des = kp * x[arm.DOF:]

# calculate hand (dx, dy)
if operational_space:
J = arm.J(q=q)

if inertia_compensation:
# account for inertia
Mx = arm.Mx(q=q)
x_des = np.dot(Mx, x_des)
# transform to joint torques
u = np.dot(J.T, x_des)
else:
u = x_des

if inertia_compensation:
# account for mass
M = arm.M(q=q)
u = np.dot(M, x_des)

return u

# send in system feedback and target information
# don't account for synapses twice
nengo.Connection(net.input, net.M1, synapse=None)
nengo.Connection(
net.M1, net.output,
function=lambda x: M1_func(
x, operational_space, inertia_compensation),
synapse=None)

return net


The ensemble of neurons itself is created with a few interesting parameters:

        net.M1 = nengo.Ensemble(
n_neurons=1000, dimensions=dim,
intercepts=AreaIntercepts(
dimensions=dim, base=nengo.dists.Uniform(-1, .1)))


Specifically, the radius and intercepts parameters.

Setting the intercepts

First we’ll discuss setting the intercepts using the AreaIntercepts distribution. The intercepts of a neuron determine how much of state space a neuron is active in, which we’ll refer to as the ‘active proportion’. If you don’t know what kind of functions you want to approximate with your neurons, then you having the active proportions for your ensemble chosen from a uniform distribution is a good starting point. This means that you’ll have roughly the same number of neurons active across all of state space as you do neurons that are active over half of state space as you do neurons that are active over very small portions of state space.

By default, Nengo sets the intercepts such that the distribution of active proportions is uniform for lower dimensional spaces. But when you start moving into higher dimensional spaces (into a hypersphere) the default method breaks down and you get mostly neurons that are either active for all of state space or almost none of state space. The AreaIntercepts class calculates how the intercepts should be set to achieve the desire active proportion inside a hypersphere. There are a lot more details here that you can read up on in this IPython notebook by Dr. Terrence C. Stewart.

What you need to know right now is that we’re setting the intercepts of the neurons such that the percentage of state space for which any given neuron is active is chosen from a uniform distribution between 0% and 55%. In other words, a neuron will maximally be active in 55% of state space, no more. This will let us model more nonlinear functions (such as the kinematics of the OSC signal) with fewer neurons. If this description is clear as mud, I really recommend checking out the IPython notebook linked above to get an intuition for what I’m talking about.

Scaling the input signal

The other parameter we set on the M1 ensemble is the radius. The radius scales the range of input values that the ensemble can represent, which is by default anything inside the unit hypersphere (i.e. hypersphere with radius=1). If the radius is left at this default value, the neural activity will saturate for vectors with magnitude greater than 1, leading to inaccurate vector representation and function approximation for input vectors with magnitude > 1. For lower dimensions this isn’t a terrible problem, but as the dimensions of the state space you’re representing grow it becomes more common for input vectors to have a norm greater than 1. Typically, we’d like to be able to, at a minimum, represent vectors with any number of dimensions where any element can be anywhere between -1 and 1. To do this, all we have to do is calculate the norm of the unit vector size dim, which is np.sqrt(dim) (the magnitude of a vector size dim with all elements set to one).

Now that we’re able to represent vectors where the input values are all between -1 and 1, the last part of this sub-network is scaling the input to be between -1 and 1. We use two scaling functions, scale_down and scale_up. The scale_down function subtracts a mean value and scales the input signal to be between -1 and 1. The scale_up function reverts the vector back to it’s original values so that calculations can be carried out normally on the decoding. In choosing mean and scaling values, there are two ways we can set these functions up:

1. Set them generally, based on the upper and lower bounds of the input signal. For M1, the input is $[\textbf{q}, \textbf{u}_\textbf{x}]$ where $\textbf{u}_\textbf{x}$ is the control signal in end-effector space, we know that the joint angles are always in the range 0 to $\pi$ (because that’s how the arm simulation is programmed), so we can set the means and scales to be $\frac{\pi}{2}$ for $\textbf{q}$. For $\textbf{u}$ a mean of zero is reasonable, and we can choose (arbitrarily, empirically, or analytically) the largest task space control signal we want to represent accurately.
2. Or, if we know the model will be performing a specific task, we can look at the range of input values encountered during that task and set the means and scales terms appropriately. For the task of reaching in a straight line, the arm moves in a very limited state space and we can set the mean and we can tune these parameter to be very specific:
                             means=[0.6, 2.2, 0, 0],
scales=[.25, .25, .25, .25]


The benefit of the second method, although one can argue it’s parameter tuning and makes things less biologically plausible, is that it lets us run simulations with fewer neurons. The first method works for all of state space, given enough neurons, but seeing as we don’t always want to be simulating 100k+ neurons we’re using the second method here. By tuning the scaling functions more specifically we’re able to run our model using 1k neurons (and could probably get away with fewer). It’s important to keep in mind though that if the arm moves outside the expected range the control will become unstable.

Implementing CB – the dynamics of operational space control

The cerebellum (CB) sub-network has two components to it: dynamics compensation and dynamics adaptation. First we’ll discuss the dynamics compensation. By which I mean the $k_v \textbf{M} (\textbf{q}^* - \textbf{q})$ term from the OSC signal.

Much like the calculating the kinematics term of the OSC signal in M1, we can calculate the entire dynamics compensation term using a single ensemble with an appropriate radius, scaled inputs, and well chosen intercepts.

def generate(arm, kv=1, learning_rate=None, learned_weights=None,
means=None, scales=None):
dim = arm.DOF * 2

means = np.zeros(dim) if means is None else means
scales = np.ones(dim) if scales is None else scales
scale_down, scale_up = generate_scaling_functions(
np.asarray(means), np.asarray(scales))

net = nengo.Network('CB')
with net:
# create / connect up CB --------------------------------
net.CB = nengo.Ensemble(
n_neurons=1000, dimensions=dim,
intercepts=AreaIntercepts(
dimensions=dim,
base=nengo.dists.Uniform(-1, .1)))
# expecting input in form [q, dq, u]
net.input = nengo.Node(output=scale_down,
size_in=dim+arm.DOF+2)
cb_input = nengo.Node(size_in=dim, label='CB input')
nengo.Connection(net.input[:dim], cb_input)
net.output = nengo.Node(size_in=arm.DOF*2)

def CB_func(x):
""" calculate the dynamic component of OSC signal """
x = scale_up(x)
q = x[:arm.DOF]
dq = x[arm.DOF:arm.DOF*2]

# calculate inertia matrix
M = arm.M(q=q)
return -np.dot(M, kv * dq)
# connect up the input and output
nengo.Connection(net.input[:dim], net.CB)
nengo.Connection(net.CB, net.output[:arm.DOF],
function=CB_func, synapse=None)


I don’t think there’s anything noteworthy going on here, most of the relevant details have already been discussed…so we’ll move on to the adaptation!

Implementing CB – non-linear dynamics adaptation

The final part of the model is the non-linear dynamics adaptation, modelled as a separate ensemble in the cerebellar sub-network (a separate ensemble so that it’s more modular, the learning connection could also come off of the other CB population). I work through the details and proof of the learning rule in the paper, so I’m not going to discuss that here. But I will restate the learning rule here:

$\dot{\textbf{d}} = \textbf{L}_d \textbf{A} \otimes \textbf{u},$

where $\textbf{d}$ are the neural decoders, $\textbf{L}_d$ is the learning rate, $\textbf{A}$ is the neural activity of the ensemble, and $\textbf{u}$ is the joint space control signal sent to the arm. This is a basic delta learning rule, where the decoders of the active neurons are modified to push the decoded function in a direction that reduces the error.

The adaptive ensemble can be initialized either using saved weights (passed in with the learned_weights paramater) or as all zeros. It is important to note that setting decoders to all zeros means that does not mean having zero neural activity, so learning will not be affected by this initialization.

        # dynamics adaptation------------------------------------
if learning_rate is not None:
n_neurons=1000, dimensions=arm.DOF*2,
# enforce spiking neurons
neuron_type=nengo.LIF(),
intercepts=AreaIntercepts(
dimensions=arm.DOF,
base=nengo.dists.Uniform(-.5, .2)))

net.learn_encoders = nengo.Connection(

# if no saved weights were passed in start from zero
weights = (
learned_weights if learned_weights is not None
net.learn_conn = nengo.Connection(
# connect directly to arm so that adaptive signal
# is not included in the training signal
transform=weights,
learning_rule_type=nengo.PES(
learning_rate=learning_rate),
synapse=None)

nengo.Connection(net.input[dim:dim+2],
net.learn_conn.learning_rule,
transform=-1, synapse=.01)
return net


We’re able to implement that learning rule using Nengo’s prescribed error-sensitivity (PES) learning on our connection from the adaptive ensemble to the output. With this set up the system will be able to learn to adapt to perturbations that are functions of the input (set here to be $[\textbf{q}, \dot{\textbf{q}}]$).

The intercepts in this population are set to values I found worked well for adapting to a few different forces, but it’s definitely a parameter to play with in your own scripts if you’re finding that there’s too much or not enough generalization of the decoded function output across the state space.

One other thing to mention is that we need to have a relay node to amalgamate the control signals output from M1 and the dynamics compensation ensemble in the CB. This signal is used to train the adaptive ensemble, and it’s important that the adaptive ensemble’s output is not included in the training signal, or else the system quickly goes off to positive or negative infinity.

Implementing S1 – a placeholder

The last sub-network in the REACH model is a placeholder for a primary sensory cortex (S1) model. This is just a set of ensembles that represents the feedback from the arm and relay it on to the rest of the model.

def generate(arm, direct_mode=False, means=None, scales=None):
dim = arm.DOF*2 + 2  # represents [q, dq, hand_xy]

means = np.zeros(dim) if means is None else means
scales = np.ones(dim) if scales is None else scales
scale_down, scale_up = generate_scaling_functions(
np.asarray(means), np.asarray(scales))

net = nengo.Network('S1')
with net:
# create / connect up S1 --------------------------------
net.S1 = nengo.networks.EnsembleArray(
n_neurons=50, n_ensembles=dim)

# expecting input in form [q, x_des]
net.input = nengo.Node(output=scale_down, size_in=dim)
net.output = nengo.Node(
lambda t, x: scale_up(x), size_in=dim)

# send in system feedback and target information
# don't account for synapses twice
nengo.Connection(net.input, net.S1.input, synapse=None)
nengo.Connection(net.S1.output, net.output, synapse=None)

return net


Since there’s no function that we’re decoding off of the represented variables we can use separate ensembles to represent each dimension with an EnsembleArray. If we were going to decode some function of, for example, q0 and dq0, then we would need an ensemble that represents both variables. But since we’re just decoding out f(x) = x, using an EnsembleArray is a convenient way to decrease the number of neurons needed to accurately represent the input.

Creating a model using the framework

The REACH model has been set up to be as much of a plug and play system as possible. To generate a model you first create the M1, PMC, CB, and S1 networks, and then they’re all hooked up to each other using the framework.py file. Here’s an example script that controls the arm to trace a circle:

def generate():
kp = 200
kv = np.sqrt(kp) * 1.5

center = np.array([0, 1.25])
# set the initial position of the arm
arm_sim.init_q = arm_sim.inv_kinematics(center)
arm_sim.reset()

net = nengo.Network(seed=0)
with net:
net.dim = arm_sim.DOF
net.arm_node = arm_sim.create_nengo_node()
net.error = nengo.Ensemble(1000, 2)
net.xy = nengo.Node(size_in=2)

# create an M1 model -------------------------------------
net.M1 = M1.generate(arm_sim, kp=kp,
operational_space=True,
inertia_compensation=True,
means=[0.6, 2.2, 0, 0],
scales=[.5, .5, .25, .25])

# create an S1 model -------------------------------------
net.S1 = S1.generate(arm_sim,
means=[.6, 2.2, -.5, 0, 0, 1.25],
scales=[.5, .5, 1.7, 1.5, .75, .75])

# subtract current position to get task space direction
nengo.Connection(net.S1.output[net.dim*2:], net.error,
transform=-1)

# create a trajectory for the hand to follow -------------
x = np.linspace(0.0, 2.0*np.pi, 100)
PMC_trajectory = np.vstack([np.cos(x) * .5,
np.sin(x) * .5])
PMC_trajectory += center[:, None]
# create / connect up PMC --------------------------------
net.PMC = PMC.generate(PMC_trajectory, speed=1)
# send target for calculating control signal
nengo.Connection(net.PMC.output, net.error)
# send target (x,y) for plotting
nengo.Connection(net.PMC.output, net.xy)

net.CB = CB.generate(arm_sim, kv=kv,
means=[0.6, 2.2, -.5, 0],
scales=[.5, .5, 1.6, 1.5])

model = framework.generate(net=net, probes_on=True)
return model


In line 50 you can see the call to the framework code, which will hook up the most common connections that don’t vary between the different scripts.

The REACH model has assigned functionality to each area / sub-network, and you can see the expected input / output in the comments at the top of each sub-network file, but the implementations are open. You can create your own M1, PMC, CB, or S1 sub-networks and try them out in the context of a full model that generates high-level movement behaviour.

Running the model

To run the model you’ll need Nengo, Nengo GUI, and NengoLib all installed. You can then pull open Nengo GUI and load any of the a# scripts. In all of these scripts the S1 model is just an ensemble that represents the output from the arm_node. Here’s what each of the scripts does:

1. a01 has a spiking M1 and CB, dynamics adaptation turned off. The model guides the arm in reaching in a straight line to a single target and back.
2. a02 has a spiking M1, PMC, and CB, dynamics adaptation turned off. The PMC generates a path for the hand to follow that traces out a circle.
3. a03 has a spiking M1, PMC, and CB, dynamics adaptation turned off. The PMC generates a path for the joints to follow, which moves the hand in a straight line to a target and back.
4. a04 has a spiking M1 and CB, dynamics adaptation turned off. The model performs the centre-out reaching task, starting at a central point and reaching to 8 points around a circle.
5. a05 has a spiking M1 and CB, and dynamics adaptation turned on. The model performs the centre-out reaching task, starting at a central point and reaching to 8 points around a circle. As the model reaches, a forcefield is applied based on the joint velocities that pushes the arm as it tries to reach the target. After 100-150 seconds of simulation the arm has adapted and learned to reach in a straight line again.

Here’s what it looks like when you pull open a02 in Nengo GUI:

I’m not going to win any awards for arm animation, but! It’s still a useful visualization, and if anyone is proficient in javascript and want’s to improve it, please do! You can see the network architecture in the top left, the spikes generated by M1 and CB to the right of that, the arm in the bottom left, and the path traced out by the hand just to the right of that. On the top right you can see the a02 script code, and below that the Nengo console.

Conclusions

One of the most immediate extensions (aside from any sort of model of S1) that comes to mind here is implementing a more detailed cerebellar model, as there are several anatomically detailed models which have the same supervised learner functionality (for example (Yamazaki and Nagao, 2012)).

Hopefully people find this post and the code useful for their own work, or at least interesting! In the ideal scenario this would be a community project, where researchers add models of different brain areas and we end up with a large library of implementations to build larger models with in a Mr. Potato Head kind of fashion.

You can find all of the code up on my GitHub. And again, this code all should have been publicly available along with the publication. Hopefully the code proves useful! If you have any questions about it please don’t hesitate to make a comment here or contact me through email, and I’ll get back to you as soon as I can.

## Matplotlib legends for mean and confidence interval plots

When plotting means and confidence intervals, sometimes the mean lines are hard to see and it’s nice to have included in your legend the color of the confidence interval shading. It turns out this is a bit of a chore in Matplotlib, but building off of their online examples you can get something that looks pretty alright:

So here’s code for getting the above plot, with an option for solid or dashed lines. The sky is the limit!

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.colors import colorConverter as cc
import numpy as np

def plot_mean_and_CI(mean, lb, ub, color_mean=None, color_shading=None):
# plot the shaded range of the confidence intervals
plt.fill_between(range(mean.shape[0]), ub, lb,
# plot the mean on top
plt.plot(mean, color_mean)

# generate 3 sets of random means and confidence intervals to plot
mean0 = np.random.random(50)
ub0 = mean0 + np.random.random(50) + .5
lb0 = mean0 - np.random.random(50) - .5

mean1 = np.random.random(50) + 2
ub1 = mean1 + np.random.random(50) + .5
lb1 = mean1 - np.random.random(50) - .5

mean2 = np.random.random(50) -1
ub2 = mean2 + np.random.random(50) + .5
lb2 = mean2 - np.random.random(50) - .5

# plot the data
fig = plt.figure(1, figsize=(7, 2.5))

class LegendObject(object):
def __init__(self, facecolor='red', edgecolor='white', dashed=False):
self.facecolor = facecolor
self.edgecolor = edgecolor
self.dashed = dashed

def legend_artist(self, legend, orig_handle, fontsize, handlebox):
x0, y0 = handlebox.xdescent, handlebox.ydescent
width, height = handlebox.width, handlebox.height
patch = mpatches.Rectangle(
# create a rectangle that is filled with color
[x0, y0], width, height, facecolor=self.facecolor,
# and whose edges are the faded color
edgecolor=self.edgecolor, lw=3)

# if we're creating the legend for a dashed line,
# manually add the dash in to our rectangle
if self.dashed:
patch1 = mpatches.Rectangle(
[x0 + 2*width/5, y0], width/5, height, facecolor=self.edgecolor,
transform=handlebox.get_transform())

return patch

bg = np.array([1, 1, 1])  # background of the legend is white
colors = ['black', 'blue', 'green']
# with alpha = .5, the faded color is the average of the background and color
colors_faded = [(np.array(cc.to_rgb(color)) + bg) / 2.0 for color in colors]

plt.legend([0, 1, 2], ['Data 0', 'Data 1', 'Data 2'],
handler_map={
})

plt.title('Example mean and confidence interval plot')
plt.tight_layout()
plt.grid()
plt.show()


Side note, really enjoying the default formatting in Matplotlib 2+.

## ABR Jaco repo public release!

https://github.com/abr/abr_jaco2

We’ve been working with Kinova’s Jaco$^2$ arm with joint torque sensors for the last year or so as part of our research at Applied Brain Research, and we put together a fun adaptive control demo and got to show it to Justin Trudeau. As you might have guessed based on previous posts, the robotic control used force control. Force control is available on the Jaco$^2$, but the API that comes with the arm has much too slow an update time for practical use for our application (around 100Hz, if I recall correctly).

So part of the work I did with Pawel Jaworski over the last year was to write an interface for force control to the Jaco$^2$ arm that had a faster control loop. Using Kinova’s low level API, we managed to get things going at about 250Hz, which was sufficient for our purposes. In the hopes of saving other people the trouble of having to redo all this work to begin to be able to play around with force control on the Kinova, we’ve made the repo public and free for non-commercial use. It’s by no means fully optimized, but it is well tested and hopefully will be found useful!

The interface was designed to plug into our ABR Control repo, so you’ll also need that installed to get things working. Once both repos are installed, you can either use the controllers in the ABR Control repo or your own. The interface has a few options, which are shown in the following demo script:

import abr_jaco2
from abr_control.controllers import OSC

robot_config = abr_jaco2.Config()
interface = abr_jaco2.Interface(robot_config)
ctrlr = OSC(robot_config)
# instantiate things to avoid creating 200ms delay in main loop
ctrlr.generate(q=zeros, dq=zeros, target=zeros(3))
# run once outside main loop as well, returns the cartesian
# coordinates of the end effector
robot_config.Tx('EE', q=zeros)

interface.connect()
interface.init_position_mode()
interface.send_target_angles(robot_config.INIT_TORQUE_POSITION)

target_xyz = [.57, .03 .87]  # (x, y, z) target (metres)
interface.init_force_mode()

while 1:
# returns a dictionary with q, dq
feedback = interface.get_feedback()
# ee position
xyz = robot_config.Tx('EE', q=q, target_pos = target_xyz)
u = ctrlr.generate(feedback['q'], feedback['dq'], target_xyz)
interface.send_forces(u, dtype='float32')

error = np.sqrt(np.sum((xyz - TARGET_XYZ[ii])**2))

if error < 0.02:
break

# switch back to position mode to move home and disconnect
interface.init_position_mode()
interface.send_target_angles(robot_config.INIT_TORQUE_POSITION)
interface.disconnect()


You can see you have the option for position control, but you can also initiate torque control mode and then start sending forces to the arm motors. To get a full feeling of what is available, we’ve got a bunch of example scripts that show off more of the functionality.

Here are some gifs feature Pawel showing the arm operating under force control. The first just shows compliance of normal operational space control (on the left) and an adaptation example (on the right). In both cases here the arm is moving to and trying to maintain a target location, and Pawel is pushing it away.

You can see that in the adaptive example the arm starts to compensate for the push, and then when Pawel lets go of the arm it overshoots the target because it’s compensating for a force that no longer exists.

So it’s our hope that this will be a useful tool for those with a Kinova Jaco$^2$ arm with torque sensors exploring force control. If you end up using the library and come across places for improvement (there are many), contributions are very appreciated!

Also a big shout out to the Kinova support team that provided many hours of support during development! It’s an unusual use of the arm, and their engineers and support staff were great in getting back to us quickly and with useful advice and insights.

## ABR Control repo public release!

https://github.com/abr/abr_control

Last August I started working on a new version of my old control repo with a resolve to make something less hacky, as part of the work for Applied Brain Research, Inc, a startup that I co-founded with others from my lab after most of my cohort graduated. Together with Pawel Jaworski, who comprises other half of ABR’s motor team, we’ve been building up a library of useful functions for modeling, simulating, interfacing, and controlling robot arms.

Today we’re making the repository public, under the same free for non-commercial use that we’ve released our Nengo neural modelling software on. You can find it here: ABR_Control

It’s all Python 3, and here’s an overview of some of the features:

• Automates generation of functions for computing the Jacobians, joint space and task space inertia matrices, centrifugal and Coriolis effects, and Jacobian derivative, provided each link’s inertia matrix and the transformation matrices
• Option to compile these functions to speedy Cython implementations
• Operational space, joint space, floating, and sliding controllers provided with PyGame and VREP example scripts
• Interfaces with VREP
• Configuration files for one, two, and three link arms, as well as the UR5 and Jaco2 arms in VREP
• Provides Python simulations of two and three link arms, with PyGame visualization
• Path planning using first and second order filtering of the target and example scripts.

Structure

The ABR Control library is divided into three sections:

1. Arm descriptions (and simulations)
2. Robotic system interfaces
3. Controllers

The big goal was to make all of these interchangeable, so that to run any permutation of them you just have to change which arm / interface / controller you’re importing.

To support a new arm, the user only needs to create a configuration file specifying the transforms and inertia matrices. Code for calculating the necessary functions of the arm will be symbolically derived using SymPy, and compiled to C using Cython for efficient run-time execution.

Interfaces provide send_forces and send_target_angles functions, to apply torques and put the arm in a target state, as well as a get_feedback function, which returns a dictionary of information about the current state of the arm (joint angles and velocities at a minimum).

Controllers provide a generate function, which take in current system state information and a target, and return a set of joint torques to apply to the robot arm.

VREP example

The easiest way to show it is with some code examples. So, once you’ve cloned and installed the repo, you can open up VREP and the jaco2.ttt model in the abr_control/arms/jaco2 folder, and to control it using an operational space controller you would run the following:

import numpy as np
from abr_control.arms import jaco2 as arm
from abr_control.controllers import OSC
from abr_control.interfaces import VREP

# initialize our robot config for the ur5
robot_config = arm.Config(use_cython=True, hand_attached=True)

# instantiate controller
ctrlr = OSC(robot_config, kp=200, vmax=0.5)

# create our VREP interface
interface = VREP(robot_config, dt=.001)
interface.connect()

target_xyz = np.array([0.2, 0.2, 0.2])
# set the target object's position in VREP
interface.set_xyz(name='target', xyz=target_xyz)

count = 0.0
while count < 1500:  # run for 1.5 simulated seconds
# get joint angle and velocity feedback
feedback = interface.get_feedback()
# calculate the control signal
u = ctrlr.generate(
q=feedback['q'],
dq=feedback['dq'],
target_pos=target_xyz)
# send forces into VREP, step the sim forward
interface.send_forces(u)

count += 1
interface.disconnect()


This is a minimal example of the examples/VREP/reaching.py code. To run it with a different arm, you can just change the from abr_control.arms import as line. The repo comes with the configuration files for the UR5 and a onelink VREP arm model as well.

PyGame example

I’ve also found the PyGame simulations of the 2 and 3 link arms very helpful for quickly testing new controllers and code, as an easy low overhead proof of concept sandbox. To run the threelink arm (which runs in Linux and Windows fine but I’ve heard has issues in Mac OS), with the operational space controller, you can run this script:

import numpy as np
from abr_control.arms import threelink as arm
from abr_control.interfaces import PyGame
from abr_control.controllers import OSC

# initialize our robot config
robot_config = arm.Config(use_cython=True)
# create our arm simulation
arm_sim = arm.ArmSim(robot_config)

# create an operational space controller
ctrlr = OSC(robot_config, kp=300, vmax=100,
use_dJ=False, use_C=True)

def on_click(self, mouse_x, mouse_y):
self.target[0] = self.mouse_x
self.target[1] = self.mouse_y

# create our interface
interface = PyGame(robot_config, arm_sim, dt=.001,
on_click=on_click)
interface.connect()

# create a target
feedback = interface.get_feedback()
target_xyz = robot_config.Tx('EE', feedback['q'])
interface.set_target(target_xyz)

try:
while 1:
# get arm feedback
feedback = interface.get_feedback()
hand_xyz = robot_config.Tx('EE', feedback['q'])

# generate an operational space control signal
u = ctrlr.generate(
q=feedback['q'],
dq=feedback['dq'],
target_pos=target_xyz)

new_target = interface.get_mousexy()
if new_target is not None:
target_xyz[0:2] = new_target
interface.set_target(target_xyz)

# apply the control signal, step the sim forward
interface.send_forces(u)

finally:
# stop and reset the simulation
interface.disconnect()


The extra bits of code just set up a hook so that when you click on the PyGame display somewhere the target moves to that point.

So! Hopefully some people find this useful for their research! It should be as easy to set up as cloning the repo, installing the requirements and running the setup file, and then trying out the examples.

If you find a bug please file an issue! If you find a way to improve it please do so and make a PR! And if you’d like to use anything in the repo commercially, please contact me.

Tagged , , , , ,

## Improving neural models by compensating for discrete rather than continuous filter dynamics when simulating on digital systems

This is going to be a pretty niche post, but there is some great work by Aaron Voelker from my old lab that has inspired me to do a post. The work is from an upcoming paper, which is all up on Aaron’s GitHub. It applies to building neural models using the Neural Engineering Framework (NEF). There’s a bunch of material on the NEF out there already, (e.g. the book How to Build a Brain by Dr. Chris Eliasmith, an online intro, and you can also check out Nengo, which is neural model development software with some good tutorials on the NEF) so I’m going to assume you already know the basics of the NEF for this post.

Additionally, this is applicable to simulating these models on digital systems, which, probably, most of you are. If you’re not, however! Then use standard NEF methods.

And then last note before starting, these methods are most relevant for systems with fast dynamics (relative to simulation time). If your system dynamics are pretty slow, you can likely get away with the continuous time solution if you resist change and learning. And we’ll see this in the example point attractor system at the end of the post! But even for slowly evolving systems, I would still recommend at least skipping to the end and seeing how to use the library shortcuts when coding your own models. The example code is also all up on my GitHub.

NEF modeling with continuous lowpass filter dynamics

Basic state space equations for linear time-invariant (LTI) systems (i.e. dynamics can be captured with a matrix and the matrices don’t change over time) are:

$\dot{\textbf{x}}(t) = \textbf{A}\textbf{x}(t) + \textbf{B}\textbf{u}(t)$

$\textbf{y}(t) = \textbf{C}\textbf{x}(t) + \textbf{D}\textbf{u}(t)$

where

• $\textbf{x}$ is the system state,
• $\textbf{y}$ is the system output,
• $\textbf{u}$ is the system input,
• $\textbf{A}$ is called the state matrix,
• $\textbf{B}$ is called the input matrix,
• $\textbf{C}$ is called the output matrix, and
• $\textbf{D}$ is called the feedthrough matrix,

and the system diagram looks like this:

and the transfer function, which is written in Laplace space and captures the system output over system input, for the system is

$\textbf{F}(s) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(s\textbf{I} - \textbf{A})^{-1} \textbf{B} + \textbf{D}$

where $s$ is the Laplace variable.

Now, because it’s a neural system we don’t have a perfect integrator in the middle, we instead have a synaptic filter, $H(s)$, giving:

So our goal is: given some synaptic filter $H(s)$, we want to generate some modified transfer function, $\textbf{F}'$, such that $\textbf{F}'(H(s))$ has the same dynamics as our desired system, $\textbf{F}(s)$. In other words, find an $\textbf{F}'$ such that

$\textbf{F}'\left(\frac{1}{H(s)}\right) = \textbf{F}(s).$

Alrighty. Let’s do that.

The transfer function for our neural system is

$\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(H(s)^{-1}\textbf{I} - \textbf{A})^{-1} \textbf{B} + \textbf{D}.$

The effect of the synapse is well captured by a lowpass filter, $H(s) = \frac{1}{\tau s + 1}$, making our equation

$\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}((\tau s + 1)\textbf{I} - \textbf{A})^{-1} \textbf{B} + \textbf{D},$

$\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(\tau s \textbf{I} + \textbf{I} - \textbf{A})^{-1} \textbf{B} + \textbf{D}.$

To get this into a form where we can start to modify the system state matrices to compensate for the filter effects, we have to isolate $s\textbf{I}$. To do that, we can do the following basic math jujitsu

$\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(\tau (s \textbf{I} + \frac{1}{\tau}(\textbf{I} - \textbf{A}))^{-1} \textbf{B} + \textbf{D}.$

$\textbf{F}(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(s \textbf{I} + \frac{1}{\tau}(\textbf{I} - \textbf{A})^{-1} \frac{1}{\tau}\textbf{B} + \textbf{D}.$

OK. Now, we can make $\textbf{F}'$ by substituting for our $\textbf{A}$ and $\textbf{B}$ matrices with

$\textbf{A}' = \tau\textbf{A} + \textbf{I}$

$\textbf{B}' = \tau\textbf{B}$

then we get

$\textbf{F}'(H(s)) = \frac{\textbf{Y}(s)}{\textbf{U}(s)} = \textbf{C}(s \textbf{I} + \frac{1}{\tau}(\textbf{I} - \textbf{A}')^{-1} \frac{1}{\tau}\textbf{B}' + \textbf{D}.$

$= \textbf{C}(s \textbf{I} + \frac{1}{\tau}(\textbf{I} - (\tau\textbf{A} + \textbf{I}))^{-1} \frac{1}{\tau}(\tau\textbf{B}) + \textbf{D}.$

$= \textbf{C}(s \textbf{I} + \frac{1}{\tau}(\tau\textbf{A}))^{-1}\textbf{B} + \textbf{D}.$

$= \textbf{C}(s \textbf{I} + \textbf{A})^{-1}\textbf{B} + \textbf{D}.$

and voila! We have created an $\textbf{F}'$ such that $\textbf{F}'(H(s)) = \textbf{F}(s)$. Said another way, we have created a system $(\textbf{A}', \textbf{B}', \textbf{C}'=\textbf{C}, \textbf{D}'=\textbf{D})$ that compensates for the synaptic filtering effects to achieve our desired system dynamics!

So, to compensate for the continuous lowpass filter, we use $\textbf{A}' = \tau \textbf{A} + \textbf{I}$ and $\textbf{B}' = \tau \textbf{B}$ when implementing our model and we’re golden.

And so that’s what we’ve been doing for a long time when building our models. Assuming a continuous lowpass filter and going along our merry way. Aaron, however, shrewdly noticed that computers are digital, and thusly that the standard NEF methods are not a fully accurate way of compensating for the filter that is actually being applied in simulation.

To convert our continuous system state equations to discrete state equations we need to make two changes: 1) the first is a variable change to denote the that we’re in discrete time, and we’ll use $z$ instead of $s$, and 2) we need to calculate the discrete version our system, i.e. transform $(\textbf{A}, \textbf{B}, \textbf{C}, \textbf{D}) \rightarrow (\textbf{A}_d, \textbf{B}_d, \textbf{C}_d, \textbf{D}_d)$.

The first step is easy, the second step more complicated. To discretize the system we’ll use the zero-order hold (ZOH) method (also referred to as discretization assuming zero-order hold).

Zero-order hold discretization

Zero-order hold (ZOH) systems simply hold their input over a specified amount of time. The use of ZOH here is that during discretization we assume the input control signal stays constant until the next sample time.

There are good write ups on the derivation of the discretization both on wikipedia and in these course notes from Purdue. I mostly followed the wikipedia derivation, but there were a few steps that get glossed over, so I thought I’d just write it out fully here and hopefully save someone some pain. Also for just a general intro I found these slides from Paul Oh at Drexel University really helpful.

OK. First we’ll solve an LTI system, and then we’ll discretize it.

So, you’ve got yourself a continuous LTI system

$\dot{\textbf{x}}(t) = \textbf{A}\textbf{x}(t) + \textbf{B}\textbf{u}(t)$

and you want to solve for $\textbf{x}(t)$. Rearranging things to put all the $\textbf{x}$ on one side gives

$\dot{\textbf{x}}(t) - \textbf{A}\textbf{x}(t) = \textbf{B}\textbf{u}(t).$

Looking through our identity library to find something that might help us here (after a long and grueling search) we come across:

$\frac{\partial}{\partial t} \textrm{e}^{\textbf{A}t} = \textbf{A} \textrm{e}^{\textbf{A}t} = \textrm{e}^{\textbf{A}t} \textbf{A}.$

We now left multiply our system by $\textrm{e}^{-\textbf{A}t}$ (note the negative in the exponent)

$\textrm{e}^{-\textbf{A}t}\dot{\textbf{x}}(t) - \textrm{e}^{-\textbf{A}t}\textbf{A}\textbf{x}(t) = \textrm{e}^{-\textbf{A}t}\textbf{B}\textbf{u}(t).$

Looking at this carefully, we identify the left-hand side as the result of a chain rule, so we can rewrite it as

$\frac{\partial}{\partial t} (\textrm{e}^{-\textbf{A}t}\textbf{x}(t)) = \textrm{e}^{-\textbf{A}t}\textbf{B}\textbf{u}(t).$

From here we integrate both sides, giving

$\textrm{e}^{-\textbf{A}t}\textbf{x}(t) - \textrm{e}^0\textbf{x}(0) = \int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau,$

$\textrm{e}^{-\textbf{A}t}\textbf{x}(t) = \int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau + \textbf{x}(0).$

To isolate the $\textbf{x}(t)$ term on the left-hand side now multiply by $\textrm{e}^{\textbf{A}t}$:

$\textrm{e}^{\textbf{A}t}\textrm{e}^{-\textbf{A}t}\textbf{x}(t) = \textrm{e}^{\textbf{A}t}\int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau + \textrm{e}^{\textbf{A}t}\textbf{x}(0),$

$\textrm{e}^{\textbf{A}t-\textbf{A}t}\textbf{x}(t) = \textrm{e}^{\textbf{A}t}\int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau + \textrm{e}^{\textbf{A}t}\textbf{x}(0),$

$\textbf{x}(t) = \textrm{e}^{\textbf{A}t}\int_0^t\textrm{e}^{-\textbf{A}\tau}\textbf{B}\textbf{u}(\tau) d \tau + \textrm{e}^{\textbf{A}t}\textbf{x}(0).$

OK! We solved for $\textbf{x}(t)$.

To discretize our solution we’re going to assume that we’re sampling the system at even intervals, i.e. each sample is at $kT$ for some time step $T$, and that the input $\textbf{u}(t)$ is constant between samples (this is where the ZOH comes in). To simplify our notation as we go, we also define

$\textbf{x}[k] = \textbf{x}(kT).$

So using our new notation, we have

$\textbf{x}[k] = \textrm{e}^{\textbf{A}kT}\textbf{x}(0) + \textrm{e}^{\textbf{A}kT}\int_0^{kT} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau.$

Now we want to get things back into the form:

$\textbf{x}[k+1] = \textbf{A}_d\textbf{x}[k] + \textbf{B}_d\textbf{u}[k].$

To start, let’s write out the equation for $\textbf{x}[k + 1]$

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}(k+1)T}\textbf{x}(0) + \textrm{e}^{\textbf{A}(k+1)T}\int_0^{(k+1)T} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau.$

We want to relate $\textbf{x}[k+1]$ to $\textbf{x}[k]$. Being incredibly clever, we see that we can left multiply $\textbf{x}[k]$ by $\textrm{e}^{\textbf{A}T}$, to get

$\textrm{e}^{\textbf{A}T}\textbf{x}[k] = \textrm{e}^{\textbf{A}(k+1)T}\textbf{x}(0) + \textrm{e}^{\textbf{A}(k+1)T}\int_0^{kT} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau,$

and can rearrange to solve for a term we saw in $\textbf{x}[k+1]$:

$\textrm{e}^{\textbf{A}(k+1)T}\textbf{x}(0) = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \textrm{e}^{\textbf{A}(k+1)T}\int_0^{kT} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau.$

Plugging this in, we get

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \textrm{e}^{\textbf{A}(k+1)T}(\int_0^{kT} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau + \int_0^{(k+1)T} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau),$

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \textrm{e}^{\textbf{A}(k+1)T}\int_{kT}^{(k+1)T} \textrm{e}^{-\textbf{A}\tau}\textbf{Bu}(\tau) d\tau.$

OK, we’re getting close.

At this point we’ve got things in the right form, but we can still clean up that second term on the right-hand side quite a bit. First, note that using our starting assumption (that $\textbf{u}(t) \in [k, kT)$ is constant), we can take $\textbf{Bu}(t)$ outside the integral.

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \textrm{e}^{\textbf{A}(k+1)T}\int_{kT}^{(k+1)T} \textrm{e}^{-\textbf{A}\tau}d\tau \textbf{Bu}[k].$

Next, bring that $\textrm{e}^{\textbf{A}(k+1)T}$ term inside the integral:

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \int_{kT}^{(k+1)T} \textrm{e}^{\textbf{A}((k+1)T - \tau)}d\tau \textbf{Bu}[k].$

And now we’re going to simplify the integral using variable substitution. Let $v = (k+1)T - \tau$, which means also that $\frac{dv}{d\tau} = -1 \rightarrow d\tau = -dv$. Evaluating the upper and lower bounds of the integral, when $\tau = (k+1)T$ then $v = 0$ and when $\tau = kT$ then $v = T$. With this, we can rewrite our equation:

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] - \int_T^0 \textrm{e}^{\textbf{A}v}dv \textbf{Bu}[k].$

The astute will notice our integral integrates from T to 0 instead of 0 to T. Fortunately for us, we know $\int_a^b x = -\int_b^a$. We can just swap the bounds and multiply by $-1$, giving:

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \int_0^T \textrm{e}^{\textbf{A}v}dv \textbf{Bu}[k].$

And finally, we can evaluate our integral by recalling that $\frac{d}{dt}\textrm{e}^{\textbf{A}t} = \textbf{A}\textrm{e}^{\textbf{A}t}$ and assuming that $\textbf{A}$ is invertible:

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \textbf{A}^{-1} \textrm{e}^{\textbf{A}v}|^T_{v=0} \textbf{Bu}[k].$

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \textbf{A}^{-1} (\textrm{e}^{\textbf{A}T} - \textrm{e}^0) \textbf{Bu}[k].$

$\textbf{x}[k+1] = \textrm{e}^{\textbf{A}T}\textbf{x}[k] + \textbf{A}^{-1} (\textrm{e}^{\textbf{A}T} - \textbf{I}) \textbf{Bu}[k].$

We did it! The state and input matrices for our digital system are:

$\textbf{A}_d = \textrm{e}^{\textbf{A}T}$

$\textbf{B}_d = \textbf{A}^{-1} (\textrm{e}^{\textbf{A}T} - \textbf{I}) \textbf{B}$

And that’s the hard part of discretization, the rest of the system is easy:
where, fortunately for us

$\textbf{C}_d = \textbf{C},$

$\textbf{D}_d = \textbf{D}.$

This then gives us our discrete system transfer function:

$\textbf{F}(z) = \frac{\textbf{Y}_d(z)}{\textbf{U}_d(z)} = \textbf{C}_d(z\textbf{I} - \textbf{A}_d)^{-1} \textbf{B}_d + \textbf{D}_d.$

NEF modeling with continuous lowpass filter dynamics

Now that we know how to discretize our system, we can look at compensating for the lowpass filter dynamics in discrete time. The equation for the discrete time lowpass filter is

$H(z) = \frac{1-a}{z-a},$

where $a = \textrm{e}^{-\frac{dt}{\tau}}.$

Plugging that in to the discrete transfer fuction gets us

$\textbf{F}(H(z)) = \textbf{C}_d(H(z)^{-1}\textbf{I} - \textbf{A}_d)^{-1} \textbf{B}_d + \textbf{D}_d.$

$\textbf{F}(H(z)) = \textbf{C}_d(\frac{z-a}{1-a}\textbf{I} - \textbf{A}_d)^{-1} \textbf{B}_d + \textbf{D}_d,$

$\textbf{F}(H(z)) = \textbf{C}_d(\frac{z\textbf{I}}{1-a} - \frac{a\textbf{I} - (1-a)\textbf{A}_d}{1-a})^{-1} \textbf{B}_d + \textbf{D}_d,$

$\textbf{F}(H(z)) = \textbf{C}_d(z\textbf{I} - a\textbf{I} - (1-a)\textbf{A}_d)^{-1} (1-a)\textbf{B}_d + \textbf{D}_d.$

and we see that if we choose

$\textbf{A}'_d = \frac{1}{1-a}(\textbf{A} + a\textbf{I}),$

$\textbf{B}'_d = \frac{1}{1-a}\textbf{B},$

then we get

$\textbf{F}'(H(z)) = \textbf{C}_d(z\textbf{I} - a\textbf{I} - (1-a)\textbf{A}_d')^{-1} (1-a)\textbf{B}_d' + \textbf{D}_d,$

$= \textbf{C}_d(z\textbf{I} - a\textbf{I} - (1-a)(\frac{1}{1-a}(\textbf{A} + a\textbf{I})))^{-1} (1-a)(\frac{1}{1-a}\textbf{B}) + \textbf{D}_d,$

$= \textbf{C}_d(z\textbf{I} - a\textbf{I} - \textbf{A} + a\textbf{I})^{-1} \textbf{B}) + \textbf{D}_d,$

$= \textbf{C}_d(z\textbf{I} - \textbf{A})^{-1} \textbf{B}) + \textbf{D}_d,$

And now congratulations are in order. Proper compensation for the discrete lowpass filter dynamics has finally been achieved!

Point attractor example

What difference does this actually make in modelling? Well, everyone likes examples, so let’s have one.

Here are the dynamics for a second-order point attractor system:

$\ddot{x} = \alpha(\beta(x^* - x) - \dot{x})$

with $x, \dot{x},$ and $\ddot{x}$ being the system position, velocity, and acceleration, respectively, $x^*$ is the target position, and $\alpha,$ and $\beta$ are gain values. So the acceleration is just going to be set such that it drives the system towards the target position, and compensates for velocity.

Converting this from a second order system to a first order system we have

$\left [ \begin{array}{c} \dot{x} \\ \ddot{x} \end{array} \right ] = \left [ \begin{array}{cc}0 & 1 \\ -\alpha\beta & -\alpha \end{array} \right] \left [ \begin{array}{c} x \\ \dot{x} \end{array} \right ] + \left [ \begin{array}{c}0 \\ \alpha\beta \end{array} \right] \left [ \begin{array}{c} 0 \\ x^* \end{array} \right ]$

which we’ll rewrite compactly as

$\dot{\textbf{x}} = \textbf{A} \textbf{x} + \textbf{B} \textbf{u}$

OK, we’ve got our state space equation of the dynamical system we want to implement.

Given a simulation time step $dt$, we’ll first calculate the discrete state matrices:

$\textbf{A}_d = \textrm{e}^{\textbf{A}dt},$

$\textbf{B}_d = \textbf{A}^{-1} (\textrm{e}^{\textbf{A}dt} - \textbf{I})\textbf{B}.$

Great! Easy. Now we can calculate the state matrices that will compensate for the discrete lowpass filter:

$\textbf{A}_d' = \frac{1}{1-a}(\textbf{A}_d + a\textbf{I}),$

$\textbf{B}_d' = \frac{1}{1-a}\textbf{B}_d,$

where $a = \textrm{e}^{-\frac{dt}{\tau}}$.

Alright! So that’s our system now, a basic point attractor implementation in Nengo 2.3 looks like this:

tau = 0.1  # synaptic time constant

# the A matrix for our point attractor
A = np.array([[0.0, 1.0],
[-alpha*beta, -alpha]])

# the B matrix for our point attractor
B = np.array([[0.0], [alpha*beta]])

# account for discrete lowpass filter
a = np.exp(-dt/tau)
if analog:
A = tau * A + np.eye(2)
B = tau * B
else:
# discretize
Bd = np.dot(np.linalg.inv(A), np.dot((Ad - np.eye(2)), B))
A = 1.0 / (1.0 - a) * (Ad - a * np.eye(2))
B = 1.0 / (1.0 - a) * Bd

net = nengo.Network(label='Point Attractor')
net.config[nengo.Connection].synapse = nengo.Lowpass(tau)

with config, net:
net.ydy = nengo.Ensemble(n_neurons=n_neurons, dimensions=2,
# set it up so neurons are tuned to one dimensions only
encoders=nengo.dists.Choice([[1, 0], [-1, 0], [0, 1], [0, -1]]))
# set up Ax part of point attractor
nengo.Connection(net.ydy, net.ydy, transform=A)

# hook up input
net.input = nengo.Node(size_in=1, size_out=1)
# set up Bu part of point attractor
nengo.Connection(net.input, net.ydy, transform=B)

# hook up output
net.output = nengo.Node(size_in=1, size_out=1)
nengo.Connection(net.ydy[0], net.output, synapse=None)


Note that for calculating Ad we’re using expm which is the matrix exp function from scipy.linalg package. The numpy.exp does an elementwise exp, which is definitely not what we want here, and you will get some confusing bugs if you’re not careful.

Code for implementing and also testing under some different gains is up on my GitHub, and generates the following plots for dt=0.001:

In the above results you can see that the when the gains are low, and thus the system dynamics are slower, that you can’t really tell a difference between the continuous and discrete filter compensation. But! As you get larger gains and faster dynamics, the resulting effects become much more visible.

If you’re building your own system, then I also recommend using the ss2sim function from Aaron’s nengolib library. It automatically handles compensation for any synapses and generates the matrices that account for discrete or continuous implementations automatically. Using the library looks like:

tau = 0.1  # synaptic time constant
synapse = nengo.Lowpass(tau)

# the A matrix for our point attractor
A = np.array([[0.0, 1.0],
[-alpha*beta, -alpha]])

# the B matrix for our point attractor
B = np.array([[0.0], [alpha*beta]])

from nengolib.synapses import ss2sim
C = np.eye(2)
D = np.zeros((2, 2))
linsys = ss2sim((A, B, C, D),
synapse=synapse,
dt=None if analog else dt)
A = linsys.A
B = linsys.B


Conclusions

So there you are! Go forward and model free of error introduced by improperly accounting for discrete simulation! If, like me, you’re doing anything with neural modelling and motor control (i.e. systems with very quickly evolving dynamics), then hopefully you’ve found all this work particularly interesting, as I did.

There’s a ton of extensions and different directions that this work can be and has already been taken, with a bunch of really neat systems developed using this more accurate accounting for synaptic filtering as a base. You can read up on this and applications to modelling time delays and time cells and lots lots more up on Aaron’s GitHub, and hisrecent papers, which are listed on his lab webpage.

## Quick calculations with SymPy and Cython

Alrighty! Last time I posted about SymPy we weren’t worrying too much about computation speed. It was worth the time saved by automating the Jacobian, inertia matrix, etc calculations and it didn’t affect simulation speed really all that much. But! When working with a real arm it suddenly becomes critical to have highly efficient calculations.

Turns out that I still don’t want to code those equations by hand. There are probably very nice options for generating optimized code using Mathematica or MapleSim or whatever pay software, but SymPy is free and already Python compatible so my goal is to stick with that.

Options

I did some internet scouring, and looked at installing various packages to help speed things up, including

1. trying out the Sympy simplify function,
2. trying out SymEngine,
3. trying out the Sympy compile to Theano function,
4. trying out the Sympy autowrap function, and
5. various combinations of the above.

The SymEngine backend and Theano functions really didn’t give any improvements for the kind of low-dimensional vector calculations performed for control here. Disclaimer, it could be the case that I gave up on these implementations too quickly, but from some basic testing it didn’t seem like they were the way to go for this kind of problem.

So down to the simplify function and compiling to the Cython backend options. First thing you’ll notice when using the simplify is that the generation time for the function can take orders of magnitude longer (up to 12ish hours for inertia matrices for the Kinova Jaco 2 arm with simplify vs about 1-2 minutes without simplify, for example). And then, as a nice kick in the teeth after that, there’s almost no difference between straight Cython of the non-simplified functions and the simplified functions. Here are some graphs:

So you can see how the addition of the simplify drastically increases the compile time in the top half of the graphs there. In the lower half, you can see that the simplify does save some execution time relative to the baseline straight lambdify function, but once you start compiling to Cython with the autowrap the difference is on the order of hundredths to thousandths of milliseconds.

Results

So! My recommendation is

• Don’t use simplify,
• do use autowrap with the Cython backend.

To compile using the Cython backend:

from sympy.utilities.autowrap import autowrap
function = autowrap(sympy_expression, backend="cython",
args=parameters)


In the last Sympy post I looked at a hard-coded calculation against the Sympy generated function, using the timeit function, which runs a given chunk of code 1,000,000 times and tells you how long it took. To save tracking down the results from that post, here are the timeit results of a Sympy function generated using just the lambdify function vs the hard-coding the function in straight Numpy:

And now using the autowrap function to compile to Cython code, compared to hard-coding the function in Numpy:

This is a pretty crazy improvement, and means that we can have the best of both worlds. We can declare our transforms in SymPy and automate the calculation of our Jacobians and inertia matrices, and still have the code execute fast enough that we can use it to control real robots.

That said, this is all from my basic experimenting with some different optimisations, which is a far from thorough exploration of the space. If you know of any other ways to speed up execution, please comment below!

You can find the code I used to generate the above plots up on my GitHub.

## Full body obstacle collision avoidance

Previously I’ve discussed how to avoid obstacles using DMPs in the end-effector trajectory. This is good when you’re controlling a single disconnected point-mass, like a mobile robot navigating around an environment. But if you want to use this system to control a robotic manipulator, then pretty quickly you run into the problem that your end-effector is not a disconnected point-mass moving around the environment. It’s attached to the rest of the arm, and moving such that the arm segments and joints also avoid the obstacle is a whole other deal.

I was doing a quick lit scan on methods for control methods for avoiding joint collision with obstacles, and was kind of surprised that there wasn’t really much in the realm of recent discussions about it. There is, however, a 1986 paper from Dr. Oussama Khatib titled Real-time obstacle avoidance for manipulators and mobile robots that pretty much solves this problem short of getting into explicit path planning methods. Which could be why there aren’t papers since then about it. All the same, I couldn’t find any implementations of it online, and there are a few tricky parts, so in this post I’m going to work through the implementation.

Note: The implementation that I’ve worked out here uses spheres to represent the obstacles. This works pretty well in general by just making the sphere large enough to cover whatever obstacle you’re avoiding. But if you want a more precise control around other shapes, you can check out the appendix of Dr. Khatib’s paper, where he works provides the math for cubes and cones as well.

Note: You can find all of the code I use here and everything you need for the VREP implementation up on my GitHub. I’ve precalculated the functions that are required, because generating them is quite computationally intensive. Hopefully the file saving doesn’t get weird between different operating systems, but if it does, try deleting all of the files in the ur5_config folder and running the code again. This will regenerate those files (on my laptop this takes ~4 hours, though, so beware).

The general idea

Since it’s from Dr. Khatib, you might expect that this approach involves task space. And indeed, your possible suspicions are correct! The algorithm is going to work by identifying forces in Cartesian coordinates that will move any point of the arm that’s too close to an obstacle away from it. The algorithm follows the form:

Setup

• Specify obstacle location and size
• Specify a threshold distance to the obstacle

While running

• Find the closest point of each arm segment to obstacles
• If within threshold of obstacle, generate force to apply to that point
• Transform this force into joint torques
• Add directly to the outgoing control signal

Something that you might notice about this is that it’s similar to the addition of the null space controller that we’ve seen before in operational space control. There’s a distinct difference here though, in that the control signal for avoiding obstacles is added directly to the outgoing control signal, and that it’s not filtered (like the null space control signal) such that there’s no guarantee that it won’t affect the movement of the end-effector. In fact, it’s very likely to affect the movement of the end-effector, but that’s also desirable, as not ramming the arm through an obstacle is as important as getting to the target.

OK, let’s walk through these steps one at a time.

Setup

I mentioned that we’re going to treat all of our obstacles as spheres. It’s actually not much harder to do these calculations for cubes too, but this is already going to be long enough so I’m only addressing sphere’s here. This algorithm assumes we have a list of every obstacle’s centre-point and radius.

We want the avoidance response of the system to be a function of the distance to the obstacle, such that the closer the arm is to the obstacle the stronger the response. The function that Dr. Khatib provides is of the following form:

$\textbf{F}_{psp} = \left\{ \begin{array}{cc}\eta (\frac{1.0}{\rho} - \frac{1}{\rho_0}) \frac{1}{\rho^2} \frac{\partial \rho}{\partial \textbf{x}} & \rho \leq \rho_0 \\ \textbf{0} & \rho > \rho_0 \end{array} \right. ,$

where $\rho$ is the distance to the target, $\rho_0$ is the threshold distance to the target at which point the avoidance function activates, $\frac{\partial \rho}{\partial \textbf{x}}$ is the partial derivative of the distance to the target with respect to a given point on the arm, and $\eta$ is a gain term.

This function looks complicated, but it’s actually pretty intuitive. The partial derivative term in the function works simply to point in the opposite direction of the obstacle, in Cartesian coordinates, i.e. tells the system how to get away from the obstacle. The rest of the term is just a gain that starts out at zero when $\rho = \rho_0$, and gets huge as the obstacle nears the object (as $\rho \to 0 \Rightarrow \frac{1}{\rho} \to \infty$). Using $\eta = .2$ and $\rho_0 = .2$ gives us a function that looks like

So you can see that very quickly a very, very strong push away from this obstacle is going to be generated once we enter the threshold distance. But how do we know exactly when we’ve entered the threshold distance?

Find the closest point

We want to avoid the obstacle with our whole body, but it turns out we can reduce the problem to only worrying about the closest point of each arm segment to the obstacle, and move that one point away from the obstacle if threshold distance is hit.

To find the closest point on a given arm segment to the obstacle we’ll do some pretty neat trig. I’ll post the code for it and then discuss it below. In this snippet, p1 and p2 are the beginning and ending $(x,y,z)$ locations of arm segment (which we are assuming is a straight line), and v is the center of the obstacle.

# the vector of our line
vec_line = p2 - p1
# the vector from the obstacle to the first line point
vec_ob_line = v - p1
# calculate the projection normalized by length of arm segment
projection = (np.dot(vec_ob_line, vec_line) /
np.dot(vec_line, vec_line))
if projection < 0:
# then closest point is the start of the segment
closest = p1
elif projection > 1:
# then closest point is the end of the segment
closest = p2
else:
closest = p1 + projection * vec_line


The first thing we do is find the arm segment line, and then line from the obstacle center to the start point of the arm segment. Once we have these, we do:

$\frac{\textbf{v}_\textrm{ob\_line} \; \cdot \; \textbf{v}_\textrm{line}}{\textbf{v}_\textrm{line} \; \cdot \; \textbf{v}_\textrm{line}},$

using the geometric definition of the dot product two vectors, we can rewrite the above as

$\frac{||\textbf{v}_\textrm{ob\_line}|| \; || \textbf{v}_\textrm{line} || \; \textrm{cos}(\theta)}{||\textbf{v}_\textrm{line}||^2} = \frac{||\textbf{v}_\textrm{ob\_line}||} {||\textbf{v}_\textrm{line}||} \textrm{cos}(\theta)$

which reads as the magnitude of vec_ob_line divided by the magnitude of vec_line (I know, these are terrible names, sorry) multiplied by the angle between the two vectors. If the angle between the vectors is < 0 (projection will also be < 0), then right off the bat we know that the start of the arm segment, p1, is the closest point. If the projection value is > 1, then we know that 1) the length from the start of the arm segment to the obstacle is longer than the length of the arm, and 2) the angle is such that the end of the arm segment, p2, is the closest point to the obstacle.

Finally, in the last case we know that the closest point is somewhere along the arm segment. To find where exactly, we do the following

$\textbf{p}_1 + \textrm{projection} \; \textbf{v}_\textrm{line},$

which can be rewritten

$\textbf{p}_1 + \frac{||\textbf{v}_\textrm{ob\_line}||} {||\textbf{v}_\textrm{line}||} \textrm{cos}(\theta) \; \textbf{v}_\textrm{line},$

I find it more intuitive if we rearrange the second term to be

$\textbf{p}_1 + \frac{\textbf{v}_\textrm{line}} {||\textbf{v}_\textrm{line}||} \; ||\textbf{v}_\textrm{ob\_line} || \; \textrm{cos}(\theta).$

So then what this is all doing is starting us off at the beginning of the arm segment, p1, and to that we add this other fun term. The first part of this fun term provides direction normalized to magnitude 1. The second part of this term provides magnitude, specifically the exact distance along vec_line we need to traverse to form reach a right angle (giving us the shortest distance) pointing at the obstacle. This handy figure from the Wikipedia page helps illustrate exactly what’s going on with the second part, where B is be vec_line and A is vec_ob_line:

Armed with this information, we understand how to find the closest point of each arm segment to the obstacle, and we are now ready to generate a force to move the arm in the opposite direction!

Check distance, generate force

To calculate the distance, all we have to do is calculate the Euclidean distance from the closest point of the arm segment to the center of the sphere, and then subtract out the radius of the sphere:

# calculate distance from obstacle vertex to the closest point
dist = np.sqrt(np.sum((v - closest)**2))
# account for size of obstacle
rho = dist - obstacle[3]


Once we have this, we can check it and generate $F_{psp}$ using the equation we defined above. The one part of that equation that wasn’t specified was exactly what $\frac{\partial \rho}{\partial \textbf{x}}$ was. Since it’s just the partial derivative of the distance to the target with respect to the closest point, we can calculate it as the normalized difference between the two points:

drhodx = (v - closest) / rho


Alright! Now we’ve found the closest point, and know the force we want to apply, from here it’s standard operational space procedure.

Transform the force into torques

As we all remember, the equation for transforming a control signal from operational space to involves two terms aside from the desired force. Namely, the Jacobian and the operational space inertia matrix:

$\textbf{u}_\textrm{psp} = \textbf{J}^T_{psp} \textbf{M}_{psp} \textbf{F}_{psp},$

where $\textbf{J}_{psp}$ is the Jacobian for the point of interest, $\textbf{M}_{psp}$ is the operational space inertia matrix for the point of interest, and $\textbf{F}_{psp}$ is the force we defined above.

Calculating the Jacobian for an unspecified point

So the first thing we need to calculate is the Jacobian for this point on the arm. There are a bunch of ways you could go about this, but the way I’m going to do it here is by building on the post where I used SymPy to automate the Jacobian derivation. The way we did that was by defining the transforms from the origin reference frame to the first link, from the first link to the second, etc, until we reached the end-effector. Then, whenever we needed a Jacobian we could string together the transforms to get the transform from the origin to that point on the arm, and take the partial derivative with respect to the joints (using SymPy’s derivative method).

As an example, say we wanted to calculate the Jacobian for the third joint, we would first calculate:

$^3_{\textrm{org}}\textbf{T} = ^0_{\textrm{org}}\textbf{T} \; ^1_0\textbf{T} \; ^2_1\textbf{T} \; ^3_2\textbf{T},$

where $^m_n\textbf{T}$ reads the transform from reference frame $m$ to reference frame $n$.

Once we have this transformation matrix, $^3_\textrm{org}\textbf{T}$, we multiply it by the point of interest in reference frame 3, which, previously, has usually been $\textbf{x} = [0, 0, 0]$. In other words, usually we’re just interested in the origin of reference frame 3. So the Jacobian is just

$\frac{\partial \; ^3_\textrm{org}\textbf{T} \textbf{x}}{\partial \textbf{q}}.$

what if we’re interested in some non-specified point along link 3, though? Well, using SymPy we set make $\textbf{x} = [x_0, x_1, x_2, 1]$ instead of $\textbf{x} = [0, 0, 0, 1]$ (recall the 1 at the end in these vectors is just to make the math work), and make the Jacobian function SymPy generates for us dependent on both $\textbf{q}$ and $\textbf{x}$, rather than just $\textbf{q}$. In code this looks like:

Torg3 = self._calc_T(name="3")
# transform x into world coordinates
Torg3x = sp.simplify(Torg3 * sp.Matrix(x))
J3_func = sp.lambdify(q + x, Torg3)


Now it’s possible to calculate the Jacobian for any point along link 3 just by changing the parameters that we pass into J3_func! Most excellent.

We are getting closer.

NOTE: This parameterization can significantly increase the build time of the function, it took my laptop about 4 hours. To decrease build time you can try commenting out the simplify calls from the code, which might slow down run-time a bit but significantly drops the generation time.

Where is the closest point in that link’s reference frame?

A sneaky problem comes up when calculating the closest point of each arm segment to the object: We’ve calculated the closest point of each arm segment in the origin’s frame of reference, and we need thew relative to each link’s own frame of reference. Fortunately, all we have to do is calculate the inverse transform for the link of interest. For example, the inverse transform of $^3_\textrm{org}\textbf{T}$ transforms a point from the origin’s frame of reference to the reference frame of the 3rd joint.

I go over how to calculate the inverse transform at the end of my post on forward transformation matrices, but to save you from having to go back and look through that, here’s the code to do it:

Torg3 = self._calc_T(name="3")
rotation_inv = Torg3[:3, :3].T
translation_inv = -rotation_inv * Torg3[:3, 3]
Torg3_inv = rotation_inv.row_join(translation_inv).col_join(
sp.Matrix([[0, 0, 0, 1]]))


And now to find the closest point in the coordinates of reference frame 3 we simply

x = np.dot(Torg3_inv, closest)


This x value is what we’re going to plug in as parameters to our J3_func above to find the Jacobian for the closest point on link 3.

Calculate the operational space inertia matrix for the closest point

OK. With the Jacobian for the point of interest we are now able to calculate the operational space inertia matrix. This code I’ve explicitly worked through before, and I’ll show it in the full code below, so I won’t go over it again here.

The whole implementation

You can run an example of all this code controlling the UR5 arm to avoid obstacles in VREP using this code up on my GitHub. The specific code added to implement obstacle avoidance looks like this:

# find the closest point of each link to the obstacle
for ii in range(robot_config.num_joints):
# get the start and end-points of the arm segment
p1 = robot_config.Tx('joint%i' % ii, q=q)
if ii == robot_config.num_joints - 1:
p2 = robot_config.Tx('EE', q=q)
else:
p2 = robot_config.Tx('joint%i' % (ii + 1), q=q)

# calculate minimum distance from arm segment to obstacle
# the vector of our line
vec_line = p2 - p1
# the vector from the obstacle to the first line point
vec_ob_line = v - p1
# calculate the projection normalized by length of arm segment
projection = (np.dot(vec_ob_line, vec_line) /
np.sum((vec_line)**2))
if projection < 0:
# then closest point is the start of the segment
closest = p1
elif projection > 1:
# then closest point is the end of the segment
closest = p2
else:
closest = p1 + projection * vec_line
# calculate distance from obstacle vertex to the closest point
dist = np.sqrt(np.sum((v - closest)**2))
# account for size of obstacle

if rho < threshold:
eta = .02
drhodx = (v - closest) / rho
Fpsp = (eta * (1.0/rho - 1.0/threshold) *
1.0/rho**2 * drhodx)

# get offset of closest point from link's reference frame
T_inv = robot_config.T_inv('link%i' % ii, q=q)
m = np.dot(T_inv, np.hstack([closest, [1]]))[:-1]
# calculate the Jacobian for this point
Jpsp = robot_config.J('link%i' % ii, x=m, q=q)[:3]

# calculate the inertia matrix for the
# point subjected to the potential space
Mxpsp_inv = np.dot(Jpsp,
np.dot(np.linalg.pinv(Mq), Jpsp.T))
svd_u, svd_s, svd_v = np.linalg.svd(Mxpsp_inv)
# cut off singular values that could cause problems
singularity_thresh = .00025
for ii in range(len(svd_s)):
svd_s[ii] = 0 if svd_s[ii] < singularity_thresh else \
1./float(svd_s[ii])
# numpy returns U,S,V.T, so have to transpose both here
Mxpsp = np.dot(svd_v.T, np.dot(np.diag(svd_s), svd_u.T))

u_psp = -np.dot(Jpsp.T, np.dot(Mxpsp, Fpsp))
if rho < .01:
u = u_psp
else:
u += u_psp


The one thing in this code I didn’t talk about is that you can see that if rho < .01 then I set u = u_psp instead of just adding u_psp to u. What this does is basically add in a fail safe take over of the robotic control saying that “if we are about to hit the obstacle forget about everything else and get out of the way!”.

Results

And that’s it! I really enjoy how this looks when it’s running, it’s a really effective algorithm. Let’s look at some samples of it in action.

First, in a 2D environment, where it’s real easy to move around the obstacle and see how it changes in response to the new obstacle position. The red circle is the target and the blue circle is the obstacle:

And in 3D in VREP, running the code example that I’ve put up on my GitHub implementing this. The example of it running without obstacle avoidance code is on the left, and running with obstacle avoidance is on the right. It’s kind of hard to see but on the left the robot moves through the far side of the obstacle (the gold sphere) on its way to the target (the red sphere):

And one last example, the arm dodging a moving obstacle on its way to the target.

The implementation is a ton of fun to play around with. It’s a really nifty algorithm, that works quite well, and I haven’t found many alternatives in papers that don’t go into path planning (if you know of some and can share that’d be great!). This post was a bit of a journey, but hopefully you found it useful! I continue to find it impressive how many different neat features like this can come about once you have the operational space control framework in place.

## Velocity limiting in operational space control

Recently, I was reading through an older paper on effective operational space control, talking specifically point to point control in operational space. The paper mentioned that even if you have a perfect model of the system, you’re going to run into trouble if you use just a basic PD formula to define your control signal in operational space:

$u_x = k_p (\textbf{x}^* - \textbf{x}) - k_v \dot{\textbf{x}},$

where $\textbf{x}$ and $\dot{\textbf{x}}$ are the system position and velocity in operational space, $\textbf{x}^*$ is the target position, and $k_p$ and $k_v$ are gains.

If you define your operational space control signal like this, and then translate this signal into joint torques (using, for example, methods discussed in other posts), you’re going to see a very non-straight trajectory emerge in larger movements as a result of “actuator saturation, and bandwidth and velocity limitations”. In the example of a standard robot, you might run into issues with your motors not being able to actually generate the torques that have been specified, the frequency of control and feedback might not be sufficient, and you could hit hard constraints on system velocity. The solution to this problem was presented in this 1987 paper by Dr. Oussama Khatib, and is pretty slick and very useful, so I thought I’d write it up here for any other unfortunate souls wandering around in ignorance. First though, here’s what it looks like to move large point to point distances without velocity limiting:

As you can see, the system isn’t moving in a straight line, which can be very aggravating if you’ve worked and reworked out the equations and double checked all your parameters, etc etc. A few things, first, when working with simulations it’s easy to forget how ridiculously fast this actually would be in real-time. Even though it takes a minute to simulate the above movement, in real-time, is happening over the course of 200ms. Taking that into account, this is pretty good. Also, there’s an obvious solution here, slow down your movement. The source of this problem is largely that all of the motors are not able to apply the torques specified and move at the required speed. Some of the motors have a lot less mass to throw around and will be able to move at the specified torques, but not all. Hence the not straight trajectory.

You can of course drop the gains on your PD signal, but that’s not really a great methodical solution. So, what can we do?

Well, if we rearrange the PD control signal specified above into

$u_x = k_v (\dot{\textbf{x}}^* - \dot{\textbf{x}}),$

where $\dot{\textbf{x}}^*$ is the desired velocity, we see that this signal can be interpreted as a pure velocity servo-control signal, with velocity gain $k_v$ and a desired velocity

$\dot{\textbf{x}}^* = \frac{k_p}{k_v}(\textbf{x}^* - \textbf{x})$.

When things are in this form, it becomes a bit more clear what we have to do: limit the desired velocity to be at most some specified maximum velocity of the end-effector, $V_\textrm{max}$. This value should be low enough that the transformation into joint torques doesn’t result in anything larger than the actuators can generate.

Taking $V_\textrm{max}$, what we want is to clip the magnitude of the control signal to be $V_\textrm{max}$ if it’s ever larger (in positive or negative directions), and to be equal to $\frac{kp}{kv}(\textbf{x}^* - \textbf{x})$ otherwise. The math for this works out such that we can accomplish this with a control signal of the form:

$\textbf{u}_\textbf{x} = -k_v (\dot{\textbf{x}} + \textrm{sat}\left(\frac{V_\textrm{max}}{\lambda |\tilde{\textbf{x}}|} \right) \lambda \tilde{\textbf{x}})$,

where $\lambda = \frac{k_p}{k_v}$ , $\tilde{\textbf{x}} = \textbf{x} - \textbf{x}^*$, and $\textrm{sat}$ is the saturation function, such that

$\textrm{sat}(y) = \left\{ \begin{array}{cc} |y| \leq 1 & \Rightarrow y \\ |y| > 1 & \Rightarrow 1 \end{array} \right.$

where $|y|$ is the absolute value of $y$, and is applied element wise to the vector $\tilde{\textbf{x}}$ in the control signal.

As a result of using this saturation function, the control signal behaves differently depending on whether or not $\dot{\textbf{x}}^* > V_\textrm{max}$:

$\textbf{u}_\textbf{x} = \left\{ \begin{array}{cc} \dot{\textbf{x}}^* \geq V_\textrm{max} & \Rightarrow -k_v (\dot{\textbf{x}} + V_\textbf{max} \textrm{sgn}(\tilde{\textbf{x}})) \\ \dot{\textbf{x}}^* < V_\textrm{max} & \Rightarrow -k_v \dot{\textbf{x}} + k_p \tilde{\textbf{x}} \end{array} \right.$

where $\textrm{sgn}(y)$ is a function that returns -1 if $y < 0$ and 1 if $y \geq 0$, and is again applied element-wise to vectors. Note that the control signal in the second condition is equivalent to our original PD control signal $k_p(\textbf{x}^* - \textbf{x}) - k_v \dot{\textbf{x}}$. If you’re wondering about negative signs make sure you note that $\tilde{\textbf{x}} = \textbf{x} - \textbf{x}^*$ and not $\textbf{x}^* - \textbf{x}$, as you might assume.

So now the control signal is behaving exactly as desired! Moves the system towards the target, but no faster than the specified maximum velocity. Now our trajectories look like this:

So those are starting to look a lot better! The first thing you’ll notice is that this is a fair bit slower of a movement. Well, actually, it’s waaaayyyy slower because the playback speed here is 4x faster than in that first animation, and this is a movement over 2s. Which has pros and cons, con: it’s slower, pro: it’s straighter, and you’re less likely to be murdered by it. When you move from simulations to real systems that latter point really moves way up the priority list.

Second thing to notice, the system seems to be minimising the error along one dimension, and then along the next, and then arrives at the target. What’s going on?  Because the error along each of the $(x,y,z)$ dimensions isn’t the same, when speed gets clipped along one of the dimensions you’re no longer going to be moving in a straight line directly to the target. To address this, we’re going to add a scaling term whenever clipping happens, such that you reduce the speed you move along each dimension by the same ratio, so that you’re still moving in a straight line.

It’s a liiiiittle bit more complicated than that, but not much. First, we’ll calculate the values being passed in to the saturation function for each $(x,y,z)$ dimension. We’ll then check to see if any of them are going to get clipped, and if there’s more than one that saturates we’ll find the one that is affected the most. After we’ve identified which dimension it is, we go through and calculate what the control signal would have been without velocity limiting, and what it will be now with velocity limiting. This scaling term tells us how much the control signal was reduced, and we can then use it to reduce the control signals of the other dimensions by the same amount. These other dimensions might still saturate, though, so we have to recalculate the saturation function for them once they’ve been scaled. Here’s what this all looks like in code:

# implement velocity limiting
lamb = kp / kv
x_tilde = xyz - target_xyz
sat = vmax / (lamb * np.abs(x_tilde))
scale = np.ones(3)
if np.any(sat < 1):
index = np.argmin(sat)
unclipped = kp * x_tilde[index]
clipped = kv * vmax * np.sign(x_tilde[index])
scale = np.ones(3) * clipped / unclipped
scale[index] = 1
u_xyz = -kv * (dx + np.clip(sat / scale, 0, 1) *
scale * lamb * x_tilde)


And now, finally, we start getting the trajectories that we’ve been wanting the whole time:

And finally we can rest easy, knowing that our robot is moving at a reasonable speed along a direct path to its goals. Wherever you’d like to use this neato ‘ish you should be able to just paste in the above code, define your vmax, kp, and kv values and be good to go!

## Using SymPy’s lambdify for generating transform matrices and Jacobians

I’ve been working in VREP with some of their different robot models and testing out force control, and one of the things that becomes pretty important for efficient workflow is to have a streamlined method for setting up the transform matrices and calculating the Jacobians for different robots. You do not want to be working these all out by hand and then writing them in yourself.

A solution that’s been working well for me (and is fully implemented in Python) is to use SymPy to set up the basic transform matrices, from each joint to the next, and then use it’s derivative function to calculate the Jacobian. Once this is calculated you can then use SymPy’s lambdify function to parameterize this, and off you go! In this post I’m going to work through an example for controlling VREP’s UR5 arm using force control. And if you’re just looking for code examples, you can find it all up on my GitHub.

Edit: Updated the code to be much nicer, added saving of calculated functions, and a null space controller.
Edit 2: Removed the simplify calls from the code, not worth the generation time increase (if execution is a great concern can generate the functions using cython).

Setting up the transform matrices

This is the part of this process that is unique to each arm. The goal is to set up a system so that you can specify your transforms for each joint (and to each centre-of-mass (COM) too, of course) and then be on your way. So here’s the UR5, cute thing that it is:

For the UR5, there are 6 joints, and we’re going to be specifying 9 transform matrices: 6 joints and the COM for three arm segments (the remaining arm segment COMs are centered at their respective joint’s reference frame). The joints are all rotary, with 0 and 4 rotating around the z-axis, and the rest all rotating around x.

So first, we’re going to create a class called robot_config. Then, to create our transform matrices in SymPy first we need to set up the variables we’re going to be using:

# set up our joint angle symbols (6th angle doesn't affect any kinematics)
self.q = [sp.Symbol('q%i'%ii) for ii in range(6)]
# segment lengths associated with each joint
self.L = np.array([0.0935, .13453, .4251, .12, .3921, .0935, .0935, .0935])


where self.q is an array storing all our joint angle symbols, and self.L is an array of all of the offsets for each joint and arm segment lengths.

Using these to create the transform matrices for the joints, we get a set up that looks like this:

# transform matrix from origin to joint 0 reference frame
self.T0org = sp.Matrix([[sp.cos(self.q[0]), -sp.sin(self.q[0]), 0, 0],
[sp.sin(self.q[0]), sp.cos(self.q[0]), 0, 0],
[0, 0, 1, self.L[0]],
[0, 0, 0, 1]])

# transform matrix from joint 0 to joint 1 reference frame
self.T10 = sp.Matrix([[1, 0, 0, -L[1]],
[0, sp.cos(-self.q[1] + sp.pi/2),
-sp.sin(-self.q[1] + sp.pi/2), 0],
[0, sp.sin(-self.q[1] + sp.pi/2),
sp.cos(-self.q[1] + sp.pi/2), 0],
[0, 0, 0, 1]])

# transform matrix from joint 1 to joint 2 reference frame
self.T21 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(-self.q[2]),
-sp.sin(-self.q[2]), self.L[2]],
[0, sp.sin(-self.q[2]),
sp.cos(-self.q[2]), 0],
[0, 0, 0, 1]])

# transform matrix from joint 2 to joint 3
self.T32 = sp.Matrix([[1, 0, 0, L[3]],
[0, sp.cos(-self.q[3] - sp.pi/2),
-sp.sin(-self.q[3] - sp.pi/2), self.L[4]],
[0, sp.sin(-self.q[3] - sp.pi/2),
sp.cos(-self.q[3] - sp.pi/2), 0],
[0, 0, 0, 1]])

# transform matrix from joint 3 to joint 4
self.T43 = sp.Matrix([[sp.sin(-self.q[4] - sp.pi/2),
sp.cos(-self.q[4] - sp.pi/2), 0, -self.L[5]],
[sp.cos(-self.q[4] - sp.pi/2),
-sp.sin(-self.q[4] - sp.pi/2), 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]])

# transform matrix from joint 4 to joint 5
self.T54 = sp.Matrix([[1, 0, 0, 0],
[0, sp.cos(self.q[5]), -sp.sin(self.q[5]), 0],
[0, sp.sin(self.q[5]), sp.cos(self.q[5]), self.L[6]],
[0, 0, 0, 1]])

# transform matrix from joint 5 to end-effector
self.TEE5 = sp.Matrix([[1, 0, 0, self.L[7]],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]])


You can see a bunch of offsetting of the joint angles by -sp.pi/2 and this is to account for the expected 0 angle (straight along the reference frame’s x-axis) at those joints being different than the 0 angle defined in the VREP simulation (at a 90 degrees from the x-axis). You can find these by either looking at and finding the joints’ 0 position in VREP or by trial-and-error empirical analysis.

Once you have your transforms, then you have to specify how to move from the origin to each reference frame of interest (the joints and COMs). For that, I’ve just set up a simple function with a switch statement:

# point of interest in the reference frame (right at the origin)
self.x = sp.Matrix([0,0,0,1])

def _calc_T(self, name, lambdify=True):
""" Uses Sympy to generate the transform for a joint or link

name string: name of the joint or link, or end-effector
lambdify boolean: if True returns a function to calculate
the transform. If False returns the Sympy
matrix
"""

# check to see if we have our transformation saved in file
if os.path.isfile('%s/%s.T' % (self.config_folder, name)):
Tx = cloudpickle.load(open('%s/%s.T' % (self.config_folder, name),
'rb'))
else:
if name == 'joint0' or name == 'link0':
T = self.T0org
elif name == 'joint1' or name == 'link1':
T = self.T0org * self.T10
elif name == 'joint2':
T = self.T0org * self.T10 * self.T21
T = self.T0org * self.T10 * self.Tl21
elif name == 'joint3':
T = self.T0org * self.T10 * self.T21 * self.T32
T = self.T0org * self.T10 * self.T21 * self.Tl32
elif name == 'joint4' or name == 'link4':
T = self.T0org * self.T10 * self.T21 * self.T32 * self.T43
elif name == 'joint5' or name == 'link5':
T = self.T0org * self.T10 * self.T21 * self.T32 * self.T43 * \
self.T54
elif name == 'link6' or name == 'EE':
T = self.T0org * self.T10 * self.T21 * self.T32 * self.T43 * \
self.T54 * self.TEE5
Tx = T * self.x  # to convert from transform matrix to (x,y,z)

# save to file
cloudpickle.dump(Tx, open('%s/%s.T' % (self.config_folder, name),
'wb'))

if lambdify is False:
return Tx
return sp.lambdify(self.q, Tx)


So the first part is pretty straight forward, create the transform matrix, and then at the end to get the (x,y,z) position we just multiply by a vector we created that represents a point at the origin of the last reference frame. Some of the transform matrices (the ones to the arm segments) I didn’t specify above just to cut down on space.

The second part is where we use this awesome lambify function, which lets us turn the matrix we’ve defined into a function, so that we can pass in joint angles and get back out the resulting (x,y,z) position. There’s also the option to get the straight up SymPy matrix return, in case you need the symbolic form (which we will!).

NOTE: You can also see that there’s a check built in to look for saved files, and to just load those saved files instead of recalculating things if they’re available. This is because calculating some of these matrices and their derivatives takes a long, long time. I used the cloudpickle module to do this because it’s able to easily handle saving a whole bunch of weird things that makes normal pickle sour.

Calculating the Jacobian

So now that we’re able to quickly generate the transform matrix for each point of interest on the UR5, we simply take the derivative of the equation for each (x,y,z) coordinate with respect to each joint angle to generate our Jacobian.

def _calc_J(self, name, lambdify=True):
""" Uses Sympy to generate the Jacobian for a joint or link

name string: name of the joint or link, or end-effector
lambdify boolean: if True returns a function to calculate
the Jacobian. If False returns the Sympy
matrix
"""

# check to see if we have our Jacobian saved in file
if os.path.isfile('%s/%s.J' % (self.config_folder, name)):
(self.config_folder, name), 'rb'))
else:
Tx = self._calc_T(name, lambdify=False)
J = []
# calculate derivative of (x,y,z) wrt to each joint
for ii in range(self.num_joints):
J.append([])
J[ii].append(Tx[0].diff(self.q[ii]))  # dx/dq[ii]
J[ii].append(Tx[1].diff(self.q[ii]))  # dy/dq[ii]
J[ii].append(Tx[2].diff(self.q[ii]))  # dz/dq[ii]


Here we retrieve the Tx vector from our _calc_T function, and then calculate the derivatives. When calculating the Jacobian for the end-effector, this is all we need! Huzzah!

But to calculate the Jacobian for transforming the inertia matrices of each arm segment into joint space we’re going to need the orientation information added to our Jacobian as well. This we know ahead of time, for each joint it’s a 3D vector with a 1 on the axis being rotated around. So we can predefine this:

# orientation part of the Jacobian (compensating for orientations)
self.J_orientation = [
[0, 0, 1], # joint 0 rotates around z axis
[1, 0, 0], # joint 1 rotates around x axis
[1, 0, 0], # joint 2 rotates around x axis
[1, 0, 0], # joint 3 rotates around x axis
[0, 0, 1], # joint 4 rotates around z axis
[1, 0, 0]] # joint 5 rotates around x axis


And then we just fill in the Jacobians for each reference frame with self.J_orientation up to the last joint, and fill in the rest of the Jacobian with zeros. So e.g. when we’re calculating the Jacobian for the arm segment just past the second joint we’ll use the first two rows of self.J_orientation and the rest of the rows will be 0.

So this leads us to the second half of the _calc_J function:

        end_point = name.strip('link').strip('joint')
if end_point != 'EE':
end_point = min(int(end_point) + 1, self.num_joints)
# add on the orientation information up to the last joint
for ii in range(end_point):
J[ii] = J[ii] + self.J_orientation[ii]
# fill in the rest of the joints orientation info with 0
for ii in range(end_point, self.num_joints):
J[ii] = J[ii] + [0, 0, 0]

# save to file
cloudpickle.dump(J, open('%s/%s.J' %
(self.config_folder, name), 'wb'))

J = sp.Matrix(J).T  # correct the orientation of J
if lambdify is False:
return J
return sp.lambdify(self.q, J)


The orientation information is added in, we save the result to file, and a function that takes in the joint angles and outputs the Jacobian is created (unless lambdify == False in which case the SymPy symbolic form is returned.)

Then finally, two wrapper functions are added in to make creating / accessing these functions easier. First, define a couple of dictionaries

# create function dictionaries
self._T = {}  # for transform calculations
self._J = {}  # for Jacobian calculations


and then our wrapper functions look like this

def T(self, name, q):
""" Calculates the transform for a joint or link
name string: name of the joint or link, or end-effector
q np.array: joint angles
"""
# check for function in dictionary
if self._T.get(name, None) is None:
print('Generating transform function for %s'%name)
self._T[name] = self.calc_T(name)
return self._T[name](*q)[:-1].flatten()

def J(self, name, q):
""" Calculates the transform for a joint or link
name string: name of the joint or link, or end-effector
q np.array: joint angles
"""
# check for function in dictionary
if self._J.get(name, None) is None:
print('Generating Jacobian function for %s'%name)
self._J[name] = self.calc_J(name)
return np.array(self._J[name](*q)).T


So how you use this class (all of this is in a class) is to call these T and J functions with the current joint angles. They’ll check to see if the functions have already be created or stored in file, if they haven’t then the T and / or J functions will be created, then our wrappers do a bit of formatting to get them into the proper form (i.e. transposing or cropping), and return you your (x,y,z) or Jacobian!

NOTE: It’s a bit of a misnomer to have the function be called T and actually return to you Tx, but hey this is a blog. Lay off.

Calculating the inertia matrix in joint-space and force of gravity
Now, since we’re here we might as well also calculate the functions for our inertia matrix in joint space and the effect of gravity. So, define a couple more placeholders in our robot_config class to help us:

self._M = []  # placeholder for (x,y,z) inertia matrices
self._Mq = None  # placeholder for joint space inertia matrix function
self._Mq_g = None  # placeholder for joint space gravity term function


and then add in our inertia matrix information (defined in each link’s centre-of-mass (COM) reference frame)

# create the inertia matrices for each link of the ur5
self._M.append(np.diag([1.0, 1.0, 1.0,
self._M.append(np.diag([2.5, 2.5, 2.5,
self._M.append(np.diag([5.7, 5.7, 5.7,
self._M.append(np.diag([3.9, 3.9, 3.9,
self._M.append(np.diag([0.7, 0.7, 0.7,


and then using our equations for calculating the system’s inertia and gravity we create our _calc_Mq and _calc_Mq_g functions

def _calc_Mq(self, lambdify=True):
""" Uses Sympy to generate the inertia matrix in
joint space for the ur5

lambdify boolean: if True returns a function to calculate
the Jacobian. If False returns the Sympy
matrix
"""

# check to see if we have our inertia matrix saved in file
if os.path.isfile('%s/Mq' % self.config_folder):
Mq = cloudpickle.load(open('%s/Mq' % self.config_folder, 'rb'))
else:
# get the Jacobians for each link's COM
J = [self._calc_J('link%s' % ii, lambdify=False)

# transform each inertia matrix into joint space
# sum together the effects of arm segments' inertia on each motor
Mq = sp.zeros(self.num_joints)
Mq += J[ii].T * self._M[ii] * J[ii]

# save to file
cloudpickle.dump(Mq, open('%s/Mq' % self.config_folder, 'wb'))

if lambdify is False:
return Mq
return sp.lambdify(self.q, Mq)

def _calc_Mq_g(self, lambdify=True):
""" Uses Sympy to generate the force of gravity in
joint space for the ur5

lambdify boolean: if True returns a function to calculate
the Jacobian. If False returns the Sympy
matrix
"""

# check to see if we have our gravity term saved in file
if os.path.isfile('%s/Mq_g' % self.config_folder):
'rb'))
else:
# get the Jacobians for each link's COM
J = [self._calc_J('link%s' % ii, lambdify=False)

# transform each inertia matrix into joint space and
# sum together the effects of arm segments' inertia on each motor
Mq_g = sp.zeros(self.num_joints, 1)
for ii in range(self.num_joints):
Mq_g += J[ii].T * self._M[ii] * self.gravity

# save to file
cloudpickle.dump(Mq_g, open('%s/Mq_g' % self.config_folder,
'wb'))

if lambdify is False:
return Mq_g
return sp.lambdify(self.q, Mq_g)


and wrapper functions

def Mq(self, q):
""" Calculates the joint space inertia matrix for the ur5

q np.array: joint angles
"""
# check for function in dictionary
if self._Mq is None:
print('Generating inertia matrix function')
self._Mq = self._calc_Mq()
return np.array(self._Mq(*q))

def Mq_g(self, q):
""" Calculates the force of gravity in joint space for the ur5

q np.array: joint angles
"""
# check for function in dictionary
if self._Mq_g is None:
print('Generating gravity effects function')
self._Mq_g = self._calc_Mq_g()
return np.array(self._Mq_g(*q)).flatten()


and we’re all set!

Putting it all together

Now we have nice clean code to generate everything we need for our controller. Using the controller developed in this post as a base, we can replace those calculations with the following nice compact code (which also includes a secondary null-space controller to keep the arm near resting joint angles):

# calculate position of the end-effector
# derived in the ur5 calc_TnJ class
xyz = robot_config.T('EE', q)
# calculate the Jacobian for the end effector
JEE = robot_config.J('EE', q)
# calculate the inertia matrix in joint space
Mq = robot_config.Mq(q)
# calculate the effect of gravity in joint space
Mq_g = robot_config.Mq_g(q)

# convert the mass compensation into end effector space
Mx_inv = np.dot(JEE, np.dot(np.linalg.inv(Mq), JEE.T))
svd_u, svd_s, svd_v = np.linalg.svd(Mx_inv)
# cut off any singular values that could cause control problems
singularity_thresh = .00025
for i in range(len(svd_s)):
svd_s[i] = 0 if svd_s[i] < singularity_thresh else \
1./float(svd_s[i])
# numpy returns U,S,V.T, so have to transpose both here
Mx = np.dot(svd_v.T, np.dot(np.diag(svd_s), svd_u.T))

kp = 100
kv = np.sqrt(kp)
# calculate desired force in (x,y,z) space
u_xyz = np.dot(Mx, target_xyz - xyz)
# transform into joint space, add vel and gravity compensation
u = (kp * np.dot(JEE.T, u_xyz) - np.dot(Mq, kv * dq) - Mq_g)

# calculate our secondary control signal
# calculated desired joint angle acceleration
q_des = (((robot_config.rest_angles - q) + np.pi) %
(np.pi*2) - np.pi)
u_null = np.dot(Mq, (kp * q_des - kv * dq))

# calculate the null space filter
Jdyn_inv = np.dot(Mx, np.dot(JEE, np.linalg.inv(Mq)))
null_filter = (np.eye(robot_config.num_joints) -
np.dot(JEE.T, Jdyn_inv))
u_null_filtered = np.dot(null_filter, u_null)

u += u_null_filtered


And there you go!

You can see all of this code up on my GitHub, along a full example controlling a UR5 VREP model though reaching to a series of targets. It looks something pretty much like this (slightly better because this gif was made before adding in the null space controller):

Overhead of using lambdify instead of hard-coding

This was a big question that I had, because when I’m running simulations the time step is on the order of a few milliseconds, with the controller code called at every time step. So I reaaaally can’t afford a crazy overhead for using lambdify.

To test this, I used the handy Python timeit, which requires a bit awkward setup, but quite nicely calls the function a whole bunch of times (1,000,000 by default) and accounts for various other things going on that could affect the execution time.

I tested two sample functions, one simpler than the other. Here’s the code for setting up and testing the first function:

import timeit
import seaborn

# Test 1 ----------------------------------------------------------------------
print('\nTest function 1: ')
time_sympy1 = timeit.timeit(
stmt = 'f(np.random.random(), np.random.random())',
setup = 'import numpy as np;\
import sympy as sp;\
q0 = sp.Symbol("q0");\
l0 = sp.Symbol("l0");\
a = sp.cos(q0) * l0;\
f = sp.lambdify((q0, l0), a, "numpy")')
print('Sympy lambdify function 1 time: ', time_sympy1)

time_hardcoded1 = timeit.timeit(
stmt = 'np.cos(np.random.random())*np.random.random()',
setup = 'import numpy as np')
print('Hard coded function 1 time: ', time_hardcoded1)


Pretty simple, a bit of a pain in the sympy setup, but other than that not bad at all. The second function I tested was just a random collection of cos and sin calls that resemble what gets computed in a Jacobian:

l1*np.sin(q0 - l0*np.sin(q1)*np.cos(q2) - l2*np.sin(q2) - l0*np.sin(q1) + q0*l0)*np.cos(q0) + l2*np.sin(q0)


And here’s the results:

So it’s slower for sure, but again this is the difference in time after 1,000,000 function calls, so until some big optimization needs to be done using the SymPy lambdify function is definitely worth the slight gain in execution time for the insane convenience.

The full code for the timing tests here are also up on my GitHub.