## Building a spiking neural model of adaptive arm control

About a year ago I published the work from my thesis in a paper called ‘A spiking neural model of adaptive arm control’. In this paper I presented the Recurrent Error-driven Adaptive Control Hierarchy (REACH) model. The goal of the model is to begin working towards reproducing behavioural level phenomena of human movement with biologically plausible spiking neural networks.

To do this, I start by using three methods from control literature (operational space control, dynamic movement primitives, and non-linear adaptive control) to create an algorithms level model of the motor control system that captures behavioural level phenomena of human movement. Then I explore how this functionality could be mapped to the primate brain and implemented in spiking neurons. Finally, I look at the data generated by this model on the behavioural level (e.g. kinematics of movement), the systems level (i.e. analysis of populations of neurons), and the single-cell level (e.g. correlating neural activity with movement parameters) and compare/contrast with experimental data.

By having a full model framework (from observable behaviour to neural spikes) is to have a more constrained computational model of the motor control system; adding lower-level biological constraints to behavioural models and higher-level behavioural constraints to neural models.

In general, once you have a model, the critical next step is to generating testable predictions that can be used to discriminate between other models with different implementations or underlying algorithms. Discriminative predictions allow us to design experiments that can gather evidence in favour or against different hypotheses of brain function, and provide clues to useful directions for further research. Which is the real contribution of computational modeling.

So that’s a quick overview of the paper; there are quite a few pages of supplementary information that describe the details of the model implementation, and I provided the code and data used to generate the data analysis figures. However, code to explicitly run the model on your own has been missing. As one of the major points of this blog is to provide code for furthering research, this is pretty embarrassing. So, to begin to remedy this, in this post I’m going to work through a REACH framework for building models to control a two-link arm through reaching in a line, tracing a circle, performing the centre-out reaching task, and adapting online to unexpected perturbations during reaching imposed by a joint-velocity based forcefield.

This post is directed towards those who have already read the paper (although not necessarily the supplementary material). To run the scripts you’ll need Nengo, Nengo GUI, and NengoLib all installed. There’s a description of the theory behind the Neural Engineering Framework, which I use extensively in my Nengo modeling, in the paper. I’m hoping that between that and code readability / my explanations below that most will be comfortable starting to play around with the code. But if you’re not, and would like more resources, you can check out the Getting Started page on the Nengo website, the tutorials from How To Build a Brain, and the examples in the Nengo GUI.

You can find all of the code up on my GitHub.

NOTE: I’m using the two-link arm (which is fully implemented in Python) instead of the three-link arm (which has compile issues for Macs) both so that everyone should be able to run the model arm code and to reduce the number of neurons that are required for control, so that hopefully you can run this on you laptop in the event that you don’t have a super power Ubuntu work station. Scaling this model up to the three-link arm is straight-forward though, and I will work on getting code up (for the three-link arm for non-Mac users) as a next project.

Implementing PMC – the trajectory generation system

I’ve talked at length about dynamic movement primitives (DMPs) in previous posts, so I won’t describe those again here. Instead I will focus on their implementation in neurons.

def generate(y_des, speed=1, alpha=1000.0):
beta = alpha / 4.0

# generate the forcing function
forces, _, goals = forcing_functions.generate(
y_des=y_des, rhythmic=False, alpha=alpha, beta=beta)

# create alpha synapse, which has point attractor dynamics
tau = np.sqrt(1.0 / (alpha*beta))
alpha_synapse = nengolib.Alpha(tau)

net = nengo.Network('PMC')
with net:
net.output = nengo.Node(size_in=2)

# create a start / stop movement signal
time_func = lambda t: min(max(
(t * speed) % 4.5 - 2.5, -1), 1)

def goal_func(t):
t = time_func(t)
if t <= -1:
return goals[0]
return goals[1]
net.goal = nengo.Node(output=goal_func, label='goal')

# -------------------- Ramp ---------------------------
ramp_node = nengo.Node(output=time_func, label='ramp')
net.ramp = nengo.Ensemble(
n_neurons=500, dimensions=1, label='ramp ens')
nengo.Connection(ramp_node, net.ramp)

# ------------------- Forcing Functions ---------------
def relay_func(t, x):
t = time_func(t)
if t <= -1:
return [0, 0]
return x
# the relay prevents forces from being sent on reset
relay = nengo.Node(output=relay_func, size_in=2)

domain = np.linspace(-1, 1, len(forces[0]))
x_func = interpolate.interp1d(domain, forces[0])
y_func = interpolate.interp1d(domain, forces[1])
transform=1.0/alpha/beta
nengo.Connection(net.ramp, relay[0], transform=transform,
function=x_func, synapse=alpha_synapse)
nengo.Connection(net.ramp, relay[1], transform=transform,
function=y_func, synapse=alpha_synapse)
nengo.Connection(relay, net.output)

nengo.Connection(net.goal[0], net.output[0],
synapse=alpha_synapse)
nengo.Connection(net.goal[1], net.output[1],
synapse=alpha_synapse)
return net


The generate method for the PMC takes in a desired trajectory, y_des, as a parameter. The first thing we do (on lines 5-6) is calculate the forcing function that will push the DMP point attractor along the desired trajectory.

The next thing (on lines 9-10) is creating an Alpha (second-order low-pass filter) synapse. By writing out the dynamics of a point attractor in Laplace space, one of the lab members, Aaron Voelker, noticed that the dynamics could be fully implemented by creating an Alpha synapse with an appropriate choice of tau. I walk through all of the math behind this in this post. Here we’ll use that more compact method and project directly to the output node, which improves performance and reduces the number of neurons.

Inside the PMC network we create a time_func node, which is the pace-setter during simulation. It will output a linear ramp from -1 to 1 every few seconds, with the pace set by the speed parameter, and then pause before restarting.

We also have a goal node, which will provide a target starting and ending point for the trajectory. Both the time_func and goal nodes are created and used as a model simulation convenience, and proposed to be generated elsewhere in the brain (the basal ganglia, why not? #igotreasons #provemewrong).

The ramp ensemble is the most important component of the trajectory generation system. It takes the output from the time_func node as input, and generates the forcing function which will guide our little system through the trajectory that was passed in. The ensemble itself is nothing special, but rather the function that it approximates on its outgoing connection. We set up this function approximation with the following code:

        domain = np.linspace(-1, 1, len(forces[0]))
x_func = interpolate.interp1d(domain, forces[0])
y_func = interpolate.interp1d(domain, forces[1])
transform=1.0/alpha/beta
nengo.Connection(net.ramp, relay[0], transform=transform,
function=x_func, synapse=alpha_synapse)
nengo.Connection(net.ramp, relay[1], transform=transform,
function=y_func, synapse=alpha_synapse)
nengo.Connection(relay, net.output)


We want the forcing function be generated as the signals represented in the ramp ensemble moves from -1 to 1. To achieve this, we create interpolation functions, x_func and y_func, which are set up to generate the forcing function values mapped to input values between -1 and 1. We pass these functions into the outgoing connections from the ramp population (one for x and one for y). So now when the ramp ensemble is representing -1, 0, and 1 the output along the two connections will be the starting, middle, and ending x and y points of the forcing function trajectory. The transform and synapse are set on each connection with the appropriate gain values and Alpha synapse, respectively, to implement point attractor dynamics.

NOTE: The above DMP implementation can generate a trajectory signal with as many dimensions as you would like, and all that’s required is adding another outgoing Connection projecting from the ramp ensemble.

The last thing in the code is hooking up the goal node to the output, which completes the point attractor implementation.

Implementing M1 – the kinematics of operational space control

In REACH, we’ve modelled the primary motor cortex (M1) as responsible for taking in a desired hand movement (i.e. target_position - current_hand_position) and calculating a set of joint torques to carry that out. Explicitly, M1 generates the kinematics component of an OSC signal:

$\textbf{u}_\textrm{M1} = \textbf{J}^T \textbf{M}_\textbf{x} (k_p (\textbf{x}^* - \textbf{x}))$

In the paper I did this using several populations of neurons, one to calculate the Jacobian, and then an EnsembleArray to perform the multiplication for the dot product of each dimension separately. Since then I’ve had the chance to rework things and it’s now done entirely in one ensemble.

Now, the set up for the M1 model that computes the above function is to have a single ensemble of neurons that takes in the joint angles, $\textbf{q}$, and control signal $\textbf{u}_\textbf{x} = k_p (\textbf{x}^* - \textbf{x})$, and outputs the function above. Sounds pretty simple, right? Simple is good.

Let’s look at the code (where I’ve stripped out the comments and some debugging code):

def generate(arm, kp=1, operational_space=True,
inertia_compensation=True, means=None, scales=None):

dim = arm.DOF + 2

means = np.zeros(dim) if means is None else means
scales = np.ones(dim) if scales is None else scales
scale_down, scale_up = generate_scaling_functions(
np.asarray(means), np.asarray(scales))

net = nengo.Network('M1')
with net:
# create / connect up M1 ------------------------------
net.M1 = nengo.Ensemble(
n_neurons=1000, dimensions=dim,
intercepts=AreaIntercepts(
dimensions=dim,
base=nengo.dists.Uniform(-1, .1)))

# expecting input in form [q, x_des]
net.input = nengo.Node(output=scale_down, size_in=dim)
net.output = nengo.Node(size_in=arm.DOF)

def M1_func(x, operational_space, inertia_compensation):
""" calculate the kinematics of the OSC signal """
x = scale_up(x)
q = x[:arm.DOF]
x_des = kp * x[arm.DOF:]

# calculate hand (dx, dy)
if operational_space:
J = arm.J(q=q)

if inertia_compensation:
# account for inertia
Mx = arm.Mx(q=q)
x_des = np.dot(Mx, x_des)
# transform to joint torques
u = np.dot(J.T, x_des)
else:
u = x_des

if inertia_compensation:
# account for mass
M = arm.M(q=q)
u = np.dot(M, x_des)

return u

# send in system feedback and target information
# don't account for synapses twice
nengo.Connection(net.input, net.M1, synapse=None)
nengo.Connection(
net.M1, net.output,
function=lambda x: M1_func(
x, operational_space, inertia_compensation),
synapse=None)

return net


The ensemble of neurons itself is created with a few interesting parameters:

        net.M1 = nengo.Ensemble(
n_neurons=1000, dimensions=dim,
intercepts=AreaIntercepts(
dimensions=dim, base=nengo.dists.Uniform(-1, .1)))


Specifically, the radius and intercepts parameters.

Setting the intercepts

First we’ll discuss setting the intercepts using the AreaIntercepts distribution. The intercepts of a neuron determine how much of state space a neuron is active in, which we’ll refer to as the ‘active proportion’. If you don’t know what kind of functions you want to approximate with your neurons, then you having the active proportions for your ensemble chosen from a uniform distribution is a good starting point. This means that you’ll have roughly the same number of neurons active across all of state space as you do neurons that are active over half of state space as you do neurons that are active over very small portions of state space.

By default, Nengo sets the intercepts such that the distribution of active proportions is uniform for lower dimensional spaces. But when you start moving into higher dimensional spaces (into a hypersphere) the default method breaks down and you get mostly neurons that are either active for all of state space or almost none of state space. The AreaIntercepts class calculates how the intercepts should be set to achieve the desire active proportion inside a hypersphere. There are a lot more details here that you can read up on in this IPython notebook by Dr. Terrence C. Stewart.

What you need to know right now is that we’re setting the intercepts of the neurons such that the percentage of state space for which any given neuron is active is chosen from a uniform distribution between 0% and 55%. In other words, a neuron will maximally be active in 55% of state space, no more. This will let us model more nonlinear functions (such as the kinematics of the OSC signal) with fewer neurons. If this description is clear as mud, I really recommend checking out the IPython notebook linked above to get an intuition for what I’m talking about.

Scaling the input signal

The other parameter we set on the M1 ensemble is the radius. The radius scales the range of input values that the ensemble can represent, which is by default anything inside the unit hypersphere (i.e. hypersphere with radius=1). If the radius is left at this default value, the neural activity will saturate for vectors with magnitude greater than 1, leading to inaccurate vector representation and function approximation for input vectors with magnitude > 1. For lower dimensions this isn’t a terrible problem, but as the dimensions of the state space you’re representing grow it becomes more common for input vectors to have a norm greater than 1. Typically, we’d like to be able to, at a minimum, represent vectors with any number of dimensions where any element can be anywhere between -1 and 1. To do this, all we have to do is calculate the norm of the unit vector size dim, which is np.sqrt(dim) (the magnitude of a vector size dim with all elements set to one).

Now that we’re able to represent vectors where the input values are all between -1 and 1, the last part of this sub-network is scaling the input to be between -1 and 1. We use two scaling functions, scale_down and scale_up. The scale_down function subtracts a mean value and scales the input signal to be between -1 and 1. The scale_up function reverts the vector back to it’s original values so that calculations can be carried out normally on the decoding. In choosing mean and scaling values, there are two ways we can set these functions up:

1. Set them generally, based on the upper and lower bounds of the input signal. For M1, the input is $[\textbf{q}, \textbf{u}_\textbf{x}]$ where $\textbf{u}_\textbf{x}$ is the control signal in end-effector space, we know that the joint angles are always in the range 0 to $\pi$ (because that’s how the arm simulation is programmed), so we can set the means and scales to be $\frac{\pi}{2}$ for $\textbf{q}$. For $\textbf{u}$ a mean of zero is reasonable, and we can choose (arbitrarily, empirically, or analytically) the largest task space control signal we want to represent accurately.
2. Or, if we know the model will be performing a specific task, we can look at the range of input values encountered during that task and set the means and scales terms appropriately. For the task of reaching in a straight line, the arm moves in a very limited state space and we can set the mean and we can tune these parameter to be very specific:
                             means=[0.6, 2.2, 0, 0],
scales=[.25, .25, .25, .25]


The benefit of the second method, although one can argue it’s parameter tuning and makes things less biologically plausible, is that it lets us run simulations with fewer neurons. The first method works for all of state space, given enough neurons, but seeing as we don’t always want to be simulating 100k+ neurons we’re using the second method here. By tuning the scaling functions more specifically we’re able to run our model using 1k neurons (and could probably get away with fewer). It’s important to keep in mind though that if the arm moves outside the expected range the control will become unstable.

Implementing CB – the dynamics of operational space control

The cerebellum (CB) sub-network has two components to it: dynamics compensation and dynamics adaptation. First we’ll discuss the dynamics compensation. By which I mean the $k_v \textbf{M} (\textbf{q}^* - \textbf{q})$ term from the OSC signal.

Much like the calculating the kinematics term of the OSC signal in M1, we can calculate the entire dynamics compensation term using a single ensemble with an appropriate radius, scaled inputs, and well chosen intercepts.

def generate(arm, kv=1, learning_rate=None, learned_weights=None,
means=None, scales=None):
dim = arm.DOF * 2

means = np.zeros(dim) if means is None else means
scales = np.ones(dim) if scales is None else scales
scale_down, scale_up = generate_scaling_functions(
np.asarray(means), np.asarray(scales))

net = nengo.Network('CB')
with net:
# create / connect up CB --------------------------------
net.CB = nengo.Ensemble(
n_neurons=1000, dimensions=dim,
intercepts=AreaIntercepts(
dimensions=dim,
base=nengo.dists.Uniform(-1, .1)))
# expecting input in form [q, dq, u]
net.input = nengo.Node(output=scale_down,
size_in=dim+arm.DOF+2)
cb_input = nengo.Node(size_in=dim, label='CB input')
nengo.Connection(net.input[:dim], cb_input)
net.output = nengo.Node(size_in=arm.DOF*2)

def CB_func(x):
""" calculate the dynamic component of OSC signal """
x = scale_up(x)
q = x[:arm.DOF]
dq = x[arm.DOF:arm.DOF*2]

# calculate inertia matrix
M = arm.M(q=q)
return -np.dot(M, kv * dq)
# connect up the input and output
nengo.Connection(net.input[:dim], net.CB)
nengo.Connection(net.CB, net.output[:arm.DOF],
function=CB_func, synapse=None)


I don’t think there’s anything noteworthy going on here, most of the relevant details have already been discussed…so we’ll move on to the adaptation!

Implementing CB – non-linear dynamics adaptation

The final part of the model is the non-linear dynamics adaptation, modelled as a separate ensemble in the cerebellar sub-network (a separate ensemble so that it’s more modular, the learning connection could also come off of the other CB population). I work through the details and proof of the learning rule in the paper, so I’m not going to discuss that here. But I will restate the learning rule here:

$\dot{\textbf{d}} = \textbf{L}_d \textbf{A} \otimes \textbf{u},$

where $\textbf{d}$ are the neural decoders, $\textbf{L}_d$ is the learning rate, $\textbf{A}$ is the neural activity of the ensemble, and $\textbf{u}$ is the joint space control signal sent to the arm. This is a basic delta learning rule, where the decoders of the active neurons are modified to push the decoded function in a direction that reduces the error.

The adaptive ensemble can be initialized either using saved weights (passed in with the learned_weights paramater) or as all zeros. It is important to note that setting decoders to all zeros means that does not mean having zero neural activity, so learning will not be affected by this initialization.

        # dynamics adaptation------------------------------------
if learning_rate is not None:
n_neurons=1000, dimensions=arm.DOF*2,
# enforce spiking neurons
neuron_type=nengo.LIF(),
intercepts=AreaIntercepts(
dimensions=arm.DOF,
base=nengo.dists.Uniform(-.5, .2)))

net.learn_encoders = nengo.Connection(

# if no saved weights were passed in start from zero
weights = (
learned_weights if learned_weights is not None
net.learn_conn = nengo.Connection(
# connect directly to arm so that adaptive signal
# is not included in the training signal
transform=weights,
learning_rule_type=nengo.PES(
learning_rate=learning_rate),
synapse=None)

nengo.Connection(net.input[dim:dim+2],
net.learn_conn.learning_rule,
transform=-1, synapse=.01)
return net


We’re able to implement that learning rule using Nengo’s prescribed error-sensitivity (PES) learning on our connection from the adaptive ensemble to the output. With this set up the system will be able to learn to adapt to perturbations that are functions of the input (set here to be $[\textbf{q}, \dot{\textbf{q}}]$).

The intercepts in this population are set to values I found worked well for adapting to a few different forces, but it’s definitely a parameter to play with in your own scripts if you’re finding that there’s too much or not enough generalization of the decoded function output across the state space.

One other thing to mention is that we need to have a relay node to amalgamate the control signals output from M1 and the dynamics compensation ensemble in the CB. This signal is used to train the adaptive ensemble, and it’s important that the adaptive ensemble’s output is not included in the training signal, or else the system quickly goes off to positive or negative infinity.

Implementing S1 – a placeholder

The last sub-network in the REACH model is a placeholder for a primary sensory cortex (S1) model. This is just a set of ensembles that represents the feedback from the arm and relay it on to the rest of the model.

def generate(arm, direct_mode=False, means=None, scales=None):
dim = arm.DOF*2 + 2  # represents [q, dq, hand_xy]

means = np.zeros(dim) if means is None else means
scales = np.ones(dim) if scales is None else scales
scale_down, scale_up = generate_scaling_functions(
np.asarray(means), np.asarray(scales))

net = nengo.Network('S1')
with net:
# create / connect up S1 --------------------------------
net.S1 = nengo.networks.EnsembleArray(
n_neurons=50, n_ensembles=dim)

# expecting input in form [q, x_des]
net.input = nengo.Node(output=scale_down, size_in=dim)
net.output = nengo.Node(
lambda t, x: scale_up(x), size_in=dim)

# send in system feedback and target information
# don't account for synapses twice
nengo.Connection(net.input, net.S1.input, synapse=None)
nengo.Connection(net.S1.output, net.output, synapse=None)

return net


Since there’s no function that we’re decoding off of the represented variables we can use separate ensembles to represent each dimension with an EnsembleArray. If we were going to decode some function of, for example, q0 and dq0, then we would need an ensemble that represents both variables. But since we’re just decoding out f(x) = x, using an EnsembleArray is a convenient way to decrease the number of neurons needed to accurately represent the input.

Creating a model using the framework

The REACH model has been set up to be as much of a plug and play system as possible. To generate a model you first create the M1, PMC, CB, and S1 networks, and then they’re all hooked up to each other using the framework.py file. Here’s an example script that controls the arm to trace a circle:

def generate():
kp = 200
kv = np.sqrt(kp) * 1.5

center = np.array([0, 1.25])
# set the initial position of the arm
arm_sim.init_q = arm_sim.inv_kinematics(center)
arm_sim.reset()

net = nengo.Network(seed=0)
with net:
net.dim = arm_sim.DOF
net.arm_node = arm_sim.create_nengo_node()
net.error = nengo.Ensemble(1000, 2)
net.xy = nengo.Node(size_in=2)

# create an M1 model -------------------------------------
net.M1 = M1.generate(arm_sim, kp=kp,
operational_space=True,
inertia_compensation=True,
means=[0.6, 2.2, 0, 0],
scales=[.5, .5, .25, .25])

# create an S1 model -------------------------------------
net.S1 = S1.generate(arm_sim,
means=[.6, 2.2, -.5, 0, 0, 1.25],
scales=[.5, .5, 1.7, 1.5, .75, .75])

# subtract current position to get task space direction
nengo.Connection(net.S1.output[net.dim*2:], net.error,
transform=-1)

# create a trajectory for the hand to follow -------------
x = np.linspace(0.0, 2.0*np.pi, 100)
PMC_trajectory = np.vstack([np.cos(x) * .5,
np.sin(x) * .5])
PMC_trajectory += center[:, None]
# create / connect up PMC --------------------------------
net.PMC = PMC.generate(PMC_trajectory, speed=1)
# send target for calculating control signal
nengo.Connection(net.PMC.output, net.error)
# send target (x,y) for plotting
nengo.Connection(net.PMC.output, net.xy)

net.CB = CB.generate(arm_sim, kv=kv,
means=[0.6, 2.2, -.5, 0],
scales=[.5, .5, 1.6, 1.5])

model = framework.generate(net=net, probes_on=True)
return model


In line 50 you can see the call to the framework code, which will hook up the most common connections that don’t vary between the different scripts.

The REACH model has assigned functionality to each area / sub-network, and you can see the expected input / output in the comments at the top of each sub-network file, but the implementations are open. You can create your own M1, PMC, CB, or S1 sub-networks and try them out in the context of a full model that generates high-level movement behaviour.

Running the model

To run the model you’ll need Nengo, Nengo GUI, and NengoLib all installed. You can then pull open Nengo GUI and load any of the a# scripts. In all of these scripts the S1 model is just an ensemble that represents the output from the arm_node. Here’s what each of the scripts does:

1. a01 has a spiking M1 and CB, dynamics adaptation turned off. The model guides the arm in reaching in a straight line to a single target and back.
2. a02 has a spiking M1, PMC, and CB, dynamics adaptation turned off. The PMC generates a path for the hand to follow that traces out a circle.
3. a03 has a spiking M1, PMC, and CB, dynamics adaptation turned off. The PMC generates a path for the joints to follow, which moves the hand in a straight line to a target and back.
4. a04 has a spiking M1 and CB, dynamics adaptation turned off. The model performs the centre-out reaching task, starting at a central point and reaching to 8 points around a circle.
5. a05 has a spiking M1 and CB, and dynamics adaptation turned on. The model performs the centre-out reaching task, starting at a central point and reaching to 8 points around a circle. As the model reaches, a forcefield is applied based on the joint velocities that pushes the arm as it tries to reach the target. After 100-150 seconds of simulation the arm has adapted and learned to reach in a straight line again.

Here’s what it looks like when you pull open a02 in Nengo GUI:

I’m not going to win any awards for arm animation, but! It’s still a useful visualization, and if anyone is proficient in javascript and want’s to improve it, please do! You can see the network architecture in the top left, the spikes generated by M1 and CB to the right of that, the arm in the bottom left, and the path traced out by the hand just to the right of that. On the top right you can see the a02 script code, and below that the Nengo console.

Conclusions

One of the most immediate extensions (aside from any sort of model of S1) that comes to mind here is implementing a more detailed cerebellar model, as there are several anatomically detailed models which have the same supervised learner functionality (for example (Yamazaki and Nagao, 2012)).

Hopefully people find this post and the code useful for their own work, or at least interesting! In the ideal scenario this would be a community project, where researchers add models of different brain areas and we end up with a large library of implementations to build larger models with in a Mr. Potato Head kind of fashion.

You can find all of the code up on my GitHub. And again, this code all should have been publicly available along with the publication. Hopefully the code proves useful! If you have any questions about it please don’t hesitate to make a comment here or contact me through email, and I’ll get back to you as soon as I can.

## Full body obstacle collision avoidance

Previously I’ve discussed how to avoid obstacles using DMPs in the end-effector trajectory. This is good when you’re controlling a single disconnected point-mass, like a mobile robot navigating around an environment. But if you want to use this system to control a robotic manipulator, then pretty quickly you run into the problem that your end-effector is not a disconnected point-mass moving around the environment. It’s attached to the rest of the arm, and moving such that the arm segments and joints also avoid the obstacle is a whole other deal.

I was doing a quick lit scan on methods for control methods for avoiding joint collision with obstacles, and was kind of surprised that there wasn’t really much in the realm of recent discussions about it. There is, however, a 1986 paper from Dr. Oussama Khatib titled Real-time obstacle avoidance for manipulators and mobile robots that pretty much solves this problem short of getting into explicit path planning methods. Which could be why there aren’t papers since then about it. All the same, I couldn’t find any implementations of it online, and there are a few tricky parts, so in this post I’m going to work through the implementation.

Note: The implementation that I’ve worked out here uses spheres to represent the obstacles. This works pretty well in general by just making the sphere large enough to cover whatever obstacle you’re avoiding. But if you want a more precise control around other shapes, you can check out the appendix of Dr. Khatib’s paper, where he works provides the math for cubes and cones as well.

Note: You can find all of the code I use here and everything you need for the VREP implementation up on my GitHub. I’ve precalculated the functions that are required, because generating them is quite computationally intensive. Hopefully the file saving doesn’t get weird between different operating systems, but if it does, try deleting all of the files in the ur5_config folder and running the code again. This will regenerate those files (on my laptop this takes ~4 hours, though, so beware).

The general idea

Since it’s from Dr. Khatib, you might expect that this approach involves task space. And indeed, your possible suspicions are correct! The algorithm is going to work by identifying forces in Cartesian coordinates that will move any point of the arm that’s too close to an obstacle away from it. The algorithm follows the form:

Setup

• Specify obstacle location and size
• Specify a threshold distance to the obstacle

While running

• Find the closest point of each arm segment to obstacles
• If within threshold of obstacle, generate force to apply to that point
• Transform this force into joint torques
• Add directly to the outgoing control signal

Something that you might notice about this is that it’s similar to the addition of the null space controller that we’ve seen before in operational space control. There’s a distinct difference here though, in that the control signal for avoiding obstacles is added directly to the outgoing control signal, and that it’s not filtered (like the null space control signal) such that there’s no guarantee that it won’t affect the movement of the end-effector. In fact, it’s very likely to affect the movement of the end-effector, but that’s also desirable, as not ramming the arm through an obstacle is as important as getting to the target.

OK, let’s walk through these steps one at a time.

Setup

I mentioned that we’re going to treat all of our obstacles as spheres. It’s actually not much harder to do these calculations for cubes too, but this is already going to be long enough so I’m only addressing sphere’s here. This algorithm assumes we have a list of every obstacle’s centre-point and radius.

We want the avoidance response of the system to be a function of the distance to the obstacle, such that the closer the arm is to the obstacle the stronger the response. The function that Dr. Khatib provides is of the following form:

$\textbf{F}_{psp} = \left\{ \begin{array}{cc}\eta (\frac{1.0}{\rho} - \frac{1}{\rho_0}) \frac{1}{\rho^2} \frac{\partial \rho}{\partial \textbf{x}} & \rho \leq \rho_0 \\ \textbf{0} & \rho > \rho_0 \end{array} \right. ,$

where $\rho$ is the distance to the target, $\rho_0$ is the threshold distance to the target at which point the avoidance function activates, $\frac{\partial \rho}{\partial \textbf{x}}$ is the partial derivative of the distance to the target with respect to a given point on the arm, and $\eta$ is a gain term.

This function looks complicated, but it’s actually pretty intuitive. The partial derivative term in the function works simply to point in the opposite direction of the obstacle, in Cartesian coordinates, i.e. tells the system how to get away from the obstacle. The rest of the term is just a gain that starts out at zero when $\rho = \rho_0$, and gets huge as the obstacle nears the object (as $\rho \to 0 \Rightarrow \frac{1}{\rho} \to \infty$). Using $\eta = .2$ and $\rho_0 = .2$ gives us a function that looks like

So you can see that very quickly a very, very strong push away from this obstacle is going to be generated once we enter the threshold distance. But how do we know exactly when we’ve entered the threshold distance?

Find the closest point

We want to avoid the obstacle with our whole body, but it turns out we can reduce the problem to only worrying about the closest point of each arm segment to the obstacle, and move that one point away from the obstacle if threshold distance is hit.

To find the closest point on a given arm segment to the obstacle we’ll do some pretty neat trig. I’ll post the code for it and then discuss it below. In this snippet, p1 and p2 are the beginning and ending $(x,y,z)$ locations of arm segment (which we are assuming is a straight line), and v is the center of the obstacle.

# the vector of our line
vec_line = p2 - p1
# the vector from the obstacle to the first line point
vec_ob_line = v - p1
# calculate the projection normalized by length of arm segment
projection = (np.dot(vec_ob_line, vec_line) /
np.dot(vec_line, vec_line))
if projection < 0:
# then closest point is the start of the segment
closest = p1
elif projection > 1:
# then closest point is the end of the segment
closest = p2
else:
closest = p1 + projection * vec_line


The first thing we do is find the arm segment line, and then line from the obstacle center to the start point of the arm segment. Once we have these, we do:

$\frac{\textbf{v}_\textrm{ob\_line} \; \cdot \; \textbf{v}_\textrm{line}}{\textbf{v}_\textrm{line} \; \cdot \; \textbf{v}_\textrm{line}},$

using the geometric definition of the dot product two vectors, we can rewrite the above as

$\frac{||\textbf{v}_\textrm{ob\_line}|| \; || \textbf{v}_\textrm{line} || \; \textrm{cos}(\theta)}{||\textbf{v}_\textrm{line}||^2} = \frac{||\textbf{v}_\textrm{ob\_line}||} {||\textbf{v}_\textrm{line}||} \textrm{cos}(\theta)$

which reads as the magnitude of vec_ob_line divided by the magnitude of vec_line (I know, these are terrible names, sorry) multiplied by the angle between the two vectors. If the angle between the vectors is < 0 (projection will also be < 0), then right off the bat we know that the start of the arm segment, p1, is the closest point. If the projection value is > 1, then we know that 1) the length from the start of the arm segment to the obstacle is longer than the length of the arm, and 2) the angle is such that the end of the arm segment, p2, is the closest point to the obstacle.

Finally, in the last case we know that the closest point is somewhere along the arm segment. To find where exactly, we do the following

$\textbf{p}_1 + \textrm{projection} \; \textbf{v}_\textrm{line},$

which can be rewritten

$\textbf{p}_1 + \frac{||\textbf{v}_\textrm{ob\_line}||} {||\textbf{v}_\textrm{line}||} \textrm{cos}(\theta) \; \textbf{v}_\textrm{line},$

I find it more intuitive if we rearrange the second term to be

$\textbf{p}_1 + \frac{\textbf{v}_\textrm{line}} {||\textbf{v}_\textrm{line}||} \; ||\textbf{v}_\textrm{ob\_line} || \; \textrm{cos}(\theta).$

So then what this is all doing is starting us off at the beginning of the arm segment, p1, and to that we add this other fun term. The first part of this fun term provides direction normalized to magnitude 1. The second part of this term provides magnitude, specifically the exact distance along vec_line we need to traverse to form reach a right angle (giving us the shortest distance) pointing at the obstacle. This handy figure from the Wikipedia page helps illustrate exactly what’s going on with the second part, where B is be vec_line and A is vec_ob_line:

Armed with this information, we understand how to find the closest point of each arm segment to the obstacle, and we are now ready to generate a force to move the arm in the opposite direction!

Check distance, generate force

To calculate the distance, all we have to do is calculate the Euclidean distance from the closest point of the arm segment to the center of the sphere, and then subtract out the radius of the sphere:

# calculate distance from obstacle vertex to the closest point
dist = np.sqrt(np.sum((v - closest)**2))
# account for size of obstacle
rho = dist - obstacle[3]


Once we have this, we can check it and generate $F_{psp}$ using the equation we defined above. The one part of that equation that wasn’t specified was exactly what $\frac{\partial \rho}{\partial \textbf{x}}$ was. Since it’s just the partial derivative of the distance to the target with respect to the closest point, we can calculate it as the normalized difference between the two points:

drhodx = (v - closest) / rho


Alright! Now we’ve found the closest point, and know the force we want to apply, from here it’s standard operational space procedure.

Transform the force into torques

As we all remember, the equation for transforming a control signal from operational space to involves two terms aside from the desired force. Namely, the Jacobian and the operational space inertia matrix:

$\textbf{u}_\textrm{psp} = \textbf{J}^T_{psp} \textbf{M}_{psp} \textbf{F}_{psp},$

where $\textbf{J}_{psp}$ is the Jacobian for the point of interest, $\textbf{M}_{psp}$ is the operational space inertia matrix for the point of interest, and $\textbf{F}_{psp}$ is the force we defined above.

Calculating the Jacobian for an unspecified point

So the first thing we need to calculate is the Jacobian for this point on the arm. There are a bunch of ways you could go about this, but the way I’m going to do it here is by building on the post where I used SymPy to automate the Jacobian derivation. The way we did that was by defining the transforms from the origin reference frame to the first link, from the first link to the second, etc, until we reached the end-effector. Then, whenever we needed a Jacobian we could string together the transforms to get the transform from the origin to that point on the arm, and take the partial derivative with respect to the joints (using SymPy’s derivative method).

As an example, say we wanted to calculate the Jacobian for the third joint, we would first calculate:

$^3_{\textrm{org}}\textbf{T} = ^0_{\textrm{org}}\textbf{T} \; ^1_0\textbf{T} \; ^2_1\textbf{T} \; ^3_2\textbf{T},$

where $^m_n\textbf{T}$ reads the transform from reference frame $m$ to reference frame $n$.

Once we have this transformation matrix, $^3_\textrm{org}\textbf{T}$, we multiply it by the point of interest in reference frame 3, which, previously, has usually been $\textbf{x} = [0, 0, 0]$. In other words, usually we’re just interested in the origin of reference frame 3. So the Jacobian is just

$\frac{\partial \; ^3_\textrm{org}\textbf{T} \textbf{x}}{\partial \textbf{q}}.$

what if we’re interested in some non-specified point along link 3, though? Well, using SymPy we set make $\textbf{x} = [x_0, x_1, x_2, 1]$ instead of $\textbf{x} = [0, 0, 0, 1]$ (recall the 1 at the end in these vectors is just to make the math work), and make the Jacobian function SymPy generates for us dependent on both $\textbf{q}$ and $\textbf{x}$, rather than just $\textbf{q}$. In code this looks like:

Torg3 = self._calc_T(name="3")
# transform x into world coordinates
Torg3x = sp.simplify(Torg3 * sp.Matrix(x))
J3_func = sp.lambdify(q + x, Torg3)


Now it’s possible to calculate the Jacobian for any point along link 3 just by changing the parameters that we pass into J3_func! Most excellent.

We are getting closer.

NOTE: This parameterization can significantly increase the build time of the function, it took my laptop about 4 hours. To decrease build time you can try commenting out the simplify calls from the code, which might slow down run-time a bit but significantly drops the generation time.

Where is the closest point in that link’s reference frame?

A sneaky problem comes up when calculating the closest point of each arm segment to the object: We’ve calculated the closest point of each arm segment in the origin’s frame of reference, and we need thew relative to each link’s own frame of reference. Fortunately, all we have to do is calculate the inverse transform for the link of interest. For example, the inverse transform of $^3_\textrm{org}\textbf{T}$ transforms a point from the origin’s frame of reference to the reference frame of the 3rd joint.

I go over how to calculate the inverse transform at the end of my post on forward transformation matrices, but to save you from having to go back and look through that, here’s the code to do it:

Torg3 = self._calc_T(name="3")
rotation_inv = Torg3[:3, :3].T
translation_inv = -rotation_inv * Torg3[:3, 3]
Torg3_inv = rotation_inv.row_join(translation_inv).col_join(
sp.Matrix([[0, 0, 0, 1]]))


And now to find the closest point in the coordinates of reference frame 3 we simply

x = np.dot(Torg3_inv, closest)


This x value is what we’re going to plug in as parameters to our J3_func above to find the Jacobian for the closest point on link 3.

Calculate the operational space inertia matrix for the closest point

OK. With the Jacobian for the point of interest we are now able to calculate the operational space inertia matrix. This code I’ve explicitly worked through before, and I’ll show it in the full code below, so I won’t go over it again here.

The whole implementation

You can run an example of all this code controlling the UR5 arm to avoid obstacles in VREP using this code up on my GitHub. The specific code added to implement obstacle avoidance looks like this:

# find the closest point of each link to the obstacle
for ii in range(robot_config.num_joints):
# get the start and end-points of the arm segment
p1 = robot_config.Tx('joint%i' % ii, q=q)
if ii == robot_config.num_joints - 1:
p2 = robot_config.Tx('EE', q=q)
else:
p2 = robot_config.Tx('joint%i' % (ii + 1), q=q)

# calculate minimum distance from arm segment to obstacle
# the vector of our line
vec_line = p2 - p1
# the vector from the obstacle to the first line point
vec_ob_line = v - p1
# calculate the projection normalized by length of arm segment
projection = (np.dot(vec_ob_line, vec_line) /
np.sum((vec_line)**2))
if projection < 0:
# then closest point is the start of the segment
closest = p1
elif projection > 1:
# then closest point is the end of the segment
closest = p2
else:
closest = p1 + projection * vec_line
# calculate distance from obstacle vertex to the closest point
dist = np.sqrt(np.sum((v - closest)**2))
# account for size of obstacle

if rho < threshold:
eta = .02
drhodx = (v - closest) / rho
Fpsp = (eta * (1.0/rho - 1.0/threshold) *
1.0/rho**2 * drhodx)

# get offset of closest point from link's reference frame
T_inv = robot_config.T_inv('link%i' % ii, q=q)
m = np.dot(T_inv, np.hstack([closest, [1]]))[:-1]
# calculate the Jacobian for this point
Jpsp = robot_config.J('link%i' % ii, x=m, q=q)[:3]

# calculate the inertia matrix for the
# point subjected to the potential space
Mxpsp_inv = np.dot(Jpsp,
np.dot(np.linalg.pinv(Mq), Jpsp.T))
svd_u, svd_s, svd_v = np.linalg.svd(Mxpsp_inv)
# cut off singular values that could cause problems
singularity_thresh = .00025
for ii in range(len(svd_s)):
svd_s[ii] = 0 if svd_s[ii] < singularity_thresh else \
1./float(svd_s[ii])
# numpy returns U,S,V.T, so have to transpose both here
Mxpsp = np.dot(svd_v.T, np.dot(np.diag(svd_s), svd_u.T))

u_psp = -np.dot(Jpsp.T, np.dot(Mxpsp, Fpsp))
if rho < .01:
u = u_psp
else:
u += u_psp


The one thing in this code I didn’t talk about is that you can see that if rho < .01 then I set u = u_psp instead of just adding u_psp to u. What this does is basically add in a fail safe take over of the robotic control saying that “if we are about to hit the obstacle forget about everything else and get out of the way!”.

Results

And that’s it! I really enjoy how this looks when it’s running, it’s a really effective algorithm. Let’s look at some samples of it in action.

First, in a 2D environment, where it’s real easy to move around the obstacle and see how it changes in response to the new obstacle position. The red circle is the target and the blue circle is the obstacle:

And in 3D in VREP, running the code example that I’ve put up on my GitHub implementing this. The example of it running without obstacle avoidance code is on the left, and running with obstacle avoidance is on the right. It’s kind of hard to see but on the left the robot moves through the far side of the obstacle (the gold sphere) on its way to the target (the red sphere):

And one last example, the arm dodging a moving obstacle on its way to the target.

The implementation is a ton of fun to play around with. It’s a really nifty algorithm, that works quite well, and I haven’t found many alternatives in papers that don’t go into path planning (if you know of some and can share that’d be great!). This post was a bit of a journey, but hopefully you found it useful! I continue to find it impressive how many different neat features like this can come about once you have the operational space control framework in place.

## Dynamic movement primitives part 4: Avoiding obstacles – update

Edit: Previously I posted this blog post on incorporating obstacle avoidance, but after a recent comment on the code I started going through it again and found a whole bunch of issues. Enough so that I’ve recoded things and I’m going to repost it now with working examples and a description of the changes I made to get it going. The edited sections will be separated out with these nice horizontal lines. If you’re just looking for the example code, you can find it up on my pydmps repo, here.

Alright. Previously I’d mentioned in one of these posts that DMPs are awesome for generalization and extension, and one of the ways that they can be extended is by incorporating obstacle avoidance dynamics. Recently I wanted to implement these dynamics, and after a bit of finagling I got it working, and so that’s going to be the subject of this post.

There are a few papers that talk about this, but the one we’re going to use is Biologically-inspired dynamical systems for movement generation: automatic real-time goal adaptation and obstacle avoidance by Hoffmann and others from Stefan Schaal’s lab. This is actually the second paper talking about obstacle avoidance and DMPs, and this is a good chance to stress one of the most important rules of implementing algorithms discussed in papers: collect at least 2-3 papers detailing the algorithm (if possible) before attempting to implement it. There are several reasons for this, the first and most important is that there are likely some typos in the equations of one paper, by comparing across a few papers it’s easier to identify trickier parts, after which thinking through what the correct form should be is usually straightforward. Secondly, often equations are updated with simplified notation or dynamics in later papers, and you can save yourself a lot of headaches in trying to understand them just by reading a later iteration. I recklessly disregarded this advice and started implementation using a single, earlier paper which had a few key typos in the equations and spent a lot of time tracking down the problem. This is just a peril inherent in any paper that doesn’t provide tested code, which is almost all, sadface.

OK, now on to the dynamics. Fortunately, I can just reference the previous posts on DMPs here and don’t have to spend any time discussing how we arrive at the DMP dynamics (for a 2D system):

$\ddot{\textbf{y}} = \alpha_y (\beta_y( \textbf{g} - \textbf{y}) - \dot{\textbf{y}}) + \textbf{f},$

where $\alpha_y$ and $\beta_y$ are gain terms, $\textbf{g}$ is the goal state, $\textbf{y}$ is the system state, $\dot{\textbf{y}}$ is the system velocity, and $\textbf{f}$ is the forcing function.
As mentioned, DMPs are awesome because now to add obstacle avoidance all we have to do is add another term

$\ddot{\textbf{y}} = \alpha_y (\beta_y( \textbf{g} - \textbf{y}) - \dot{\textbf{y}}) + \textbf{f} + \textbf{p}(\textbf{y}, \dot{\textbf{y}}),$

where $\textbf{p}(\textbf{y}, \dot{\textbf{y}})$ implements the obstacle avoidance dynamics, and is a function of the DMP state and velocity. Now then, the question is what are these dynamics exactly?

Obstacle avoidance dynamics

It turns out that there is a paper by Fajen and Warren that details an algorithm that mimics human obstacle avoidance. The idea is that you calculate the angle between your current velocity and the direction to the obstacle, and then turn away from the obstacle. The angle between current velocity and direction to the obstacle is referred to as the steering angle, denoted $\varphi$, here’s a picture of it:

So, given some $\varphi$ value, we want to specify how much to change our steering direction, $\dot{\varphi}$, as in the figure below:

If we’re on track to hit the object (i.e. $\varphi$ is near 0) then we steer away hard, and then make your change in direction less and less as the angle between your heading (velocity) and the object is larger and larger. Formally, define $\dot{\varphi}$ as

$\dot{\varphi} = \gamma \;\varphi \;\textrm{exp}(-\beta | \varphi |),$

where $\gamma$ and $\beta$ are constants, which are specified as $1000$ and $\frac{20}{\pi}$ in the paper, respectively.

Edit: OK this all sounds great, but quickly you run into issues here. The first is how do we calculate $\varphi$? In the paper I was going off of they used a dot product between the $\dot{\textbf{y}}$ vector and the $\textbf{o} - \textbf{y}$ vector to get the cosine of the angle, and then use np.arccos to get the angle from that. There is a big problem with this here, however, that’s kind of subtle: You will never get a negative angle when you calculate this, which, of course. That’s not how np.arccos works, but it is still what we need so we will be able to tell if we should be turning left or right to avoid this object!

So we need to use a different way of calculating the angle, one that tells us if the object is in a + or – angle relative to the way we’re headed. To calculate an angle that will give us + or -, we’re going to use the np.arctan2 function.

We want to center things around the way we’re headed, so first we calculate the angle of the direction vector, $\dot{\textbf{y}}$. Then we create a rotation vector, R_dy to rotate the vector describing the direction of the object. This shifts things around so that if we’re moving straight towards the object it’s at 0 degrees, if we’re headed directly away from it, it’s at 180 degrees, etc. Once we have that vector, nooowwww we can find the angle of the direction of the object and that’s what we’re going to use for phi. Huzzah!

# get the angle we're heading in
phi_dy = -np.arctan2(dy[1], dy[0])
R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)],
[np.sin(phi_dy), np.cos(phi_dy)]])
# calculate vector to object relative to body
obj_vec = obstacle - y
# rotate it by the direction we're going
obj_vec = np.dot(R_dy, obj_vec)
# calculate the angle of obj relative to the direction we're going
phi = np.arctan2(obj_vec[1], obj_vec[0])


This $\dot{\varphi}$ term can be thought of as a weighting, telling us how much we need to rotate based on how close we are to running into the object. To calculate how we should rotate we’re going to calculate the angle orthonormal to our current velocity, then weight it by the distance between the object and our current state on each axis. Formally, this is written:

$\textbf{R} \; \dot{\textbf{y}},$

where $\textbf{R}$ is the axis $(\textbf{o} - \textbf{y}) \times \dot{\textbf{y}}$ rotated 90 degrees (the $\times$ denoting outer product here). The way I’ve been thinking about this is basically taking your velocity vector, $\dot{\textbf{y}}$, and rotating it 90 degrees. Then we use this rotated vector as a row vector, and weight the top row by the distance between the object and the system along the $x$ axis, and the bottom row by the difference along the $\textbf{y}$ axis. So in the end we’re calculating the angle that rotates our velocity vector 90 degrees, weighted by distance to the object along each axis.

So that whole thing takes into account absolute distance to object along each axis, but that’s not quite enough. We also need to throw in $\dot{\varphi}$, which looks at the current angle. What this does is basically look at ‘hey are we going to hit this object?’, if you are on course then make a big turn and if you’re not then turn less or not at all. Phew.

OK so all in all this whole term is written out

$\textbf{p}(\textbf{y}, \dot{\textbf{y}}) = \textbf{R} \; \dot{\textbf{y}} \; \dot{\varphi},$

and that’s what we add in to the system acceleration. And now our DMP can avoid obstacles! How cool is that?

Super compact, straight-forward to add, here’s the code:

Edit: OK, so not suuuper compact. I’ve also added in another couple checks. The big one is “Is this obstacle between us and the target or not?”. So I calculate the Euclidean distance to the goal and the obstacle, and if the obstacle is further away then the goal, set the avoidance signal to zero (performed at the end of the if)! This took care of a few weird errors where you would get a big deviation in the trajectory if it saw an obstacle past the goal, because it was planning on avoiding it, but then was pulled back in to the goal before the obstacle anyways so it was a pointless exercise. The other check added in is just to make sure you only add in obstacle avoidance if the system is actually moving. Finally, I also changed the $\gamma = 100$ instead of $1000$.

def avoid_obstacles(y, dy, goal):
p = np.zeros(2)

for obstacle in obstacles:
# based on (Hoffmann, 2009)

# if we're moving
if np.linalg.norm(dy) > 1e-5:

# get the angle we're heading in
phi_dy = -np.arctan2(dy[1], dy[0])
R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)],
[np.sin(phi_dy), np.cos(phi_dy)]])
# calculate vector to object relative to body
obj_vec = obstacle - y
# rotate it by the direction we're going
obj_vec = np.dot(R_dy, obj_vec)
# calculate the angle of obj relative to the direction we're going
phi = np.arctan2(obj_vec[1], obj_vec[0])

dphi = gamma * phi * np.exp(-beta * abs(phi))
R = np.dot(R_halfpi, np.outer(obstacle - y, dy))
pval = -np.nan_to_num(np.dot(R, dy) * dphi)

# check to see if the distance to the obstacle is further than
# the distance to the target, if it is, ignore the obstacle
if np.linalg.norm(obj_vec) > np.linalg.norm(goal - y):
pval = 0

p += pval
return p


And that’s it! Just add this method in to your DMP system and call avoid_obstacles at every timestep, and add it in to your DMP acceleration.

You hopefully noticed in the code that this is set up for multiple obstacles, and that all that that entailed was simply adding the p value generated by each individual obstacle. It’s super easy! Here’s a very basic graphic showing how the DMP system can avoid obstacles:

So here there’s just a basic attractor system (DMP without a forcing function) trying to move from the center position to 8 targets around the unit circle (which are highlighted in red), and there are 4 obstacles that I’ve thrown onto the field (black x’s). As you can see, the system successfully steers way clear of the obstacles while moving towards the target!

We must all use this power wisely.

Edit: Making the power yours is now easier than ever! You can check out this code at my pydmps GitHub repo. Clone the repo and after you python setup.py develop, change directories into the examples folder and run the avoid_obstacles.py file. It will randomly generate 5 targets in the environment and perform 20 movements, giving you something looking like this:

## Dynamic movement primitives part 3: Rhythmic movements

So far we’ve looked at using DMPs for discrete movements, but as I briefly mentioned it’s also possible to use them to perform rhythmic movements. In this post we’re going to look at the implementation of rhythmic DMPs and see exactly how it’s done. It’s actually pretty straightforward, which is always nich. We’ll go through everything that has to change and then see a couple of different applications of rhythmic DMPs. Again, all the code for the DMPs and arm control implementations can be found up on my Github.

Obtaining consistent and repeatable basis function activation

We’ll start with the canonical system. In the discrete case our canonical system (which drives the activation of the basis functions) decayed from $1$ to $0$ and then we were all done. In the rhythmic case we want our system to repeat indefinitely, so we need a reliable way of continuously activating the basis functions in the same order. One function that may come to mind that reliably repeats is the cosine function. But how to use the cosine function, exactly?

The idea is that we’ll lay out our basis functions in the range from $0$ to $2\pi$, and we’ll set the canonical system to be ever increasing linearly. Then we’ll use the difference between the canonical system state and each of the center points as the value we pass in to our cosine function. Because the cosine function repeats at $2\pi$ we’ll get a nice repeating spread of basis function activations. We’ll then throw these values through our Gaussian equation (with the gain and bias terms) and we’ll be done! Here’s the equation:

$\psi = \textrm{exp}(h * \textrm{cos}(x - c) - 1),$
where $x$ is the state of the canonical system, $c$ is the basis function center point, and $h$ is a gain term controlling the variance of the Gaussian. And here’s a picture of the activations of three basis functions with centers spread out evenly between $0$ and $2\pi$:

Here’s a picture of the basis function activations on a longer time scale:

As long as our canonical system increases at a steady pace the basis functions will continue to activate in a reliable and repeatable way. As mentioned, to get the canonical system to increase at a steady pace we simply give it linear dynamics:

$\dot{x} = 1$
The placement of the center points of the rhythmic system is a lot easier than in the discrete case because the rhythmic canonical system dynamics are linear.

Other differences between discrete and rhythmic

In the actual implementation of the rhythmic system there are a couple of other differences that you need to be aware of. The first is that there is no diminishing term in the rhythmic system, whereas there is in the discrete case.

The second is how to go about establishing goal states when imitating an path. We’re going to assume that whatever path is passed in to us is going to be a rhythmic pattern, and then goal state is going to be set as the center point of the desired trajectory. So for rhythmic systems the ‘goal’ is actually more like a center point for the trajectory.

And that’s pretty much it! If you look through the code that I have up for the rhythmic DMP subclass you’ll see that only a handful of functions needed to be redefined, the rest is the same! Well that’s great, let’s look at some applications.

Example

I was thinking of different examples that would be interesting to see of the rhythmic DMPs, and after a little consideration the most obvious application of rhythmic DMPs is really to a system that does something like walking. In the spinal cord of animals there are circuits which are known as central pattern generators (CPGs), and when stimulated these have been shown to generate rhythmic, repeated movements (as described in Principles of rhythmic motor pattern generation and other papers). Let’s go ahead and build our own CPG here using DMPs. I don’t have a simulation of a set of legs, however, so instead we’ll look at getting a single 3-link arm to reproduce a pattern of behaviour that you’d expect in a leg that is walking.

To do this, I’m going to need to specify the trajectories for the three joints of the arm, and I’ll be controlling the system in joint space, using the generalized coordinates controller, rather than the operational space controller used in the discrete DMP examples for controlling the end-effector position. So that’s one thing. Another thing is that I’m going to need the kinematics of leg joints during motion. After a quick search of the web I found this presentation from Simon Fraser University which has the following images:

Which is some very useful information! I just need to transfer this into a Numpy vector that I can feed into the system and convert degrees to radians and we should be good to go! To get the information out of the graph, use your favorite data stripper, I used http://arohatgi.info/WebPlotDigitizer/app/.

I also had to do a bit of a shuffle with the trajectories provided to make it look right on the arm. The hip to shoulder joint maps fine, but for the knee to elbow the flexion and extension are reversed, and for the wrist to foot the $0$ angle for the foot is straight out. Once those have been accounted for, though, everything works great. Here is an animation of a single leg walking along, if you use your imagination you can just picture a happy-go-lucky torso attached to this leg whistling while he gets his very human-like gait on:

And there you go! It’s neat how straightforward that was, once you have the data of the system you want to model. It shows the power of DMPs for coordinating multiple degrees of freedom quickly and easily. Also something that’s worth pointing out once again is that this is a force based controller. We were able to go from a kinematic description of the movement we wanted to a reliable system generating torques. It’s another good example of the dexterity of dynamic movement primitives.

You can see the code for everything here up online on my github control repo on the walking_demo branch. To run the code you should run:

python run.py


## Dynamic movement primitives part 2: Controlling end-effector trajectories

The dynamic movement primitive (DMP) framework was designed for trajectory control. It so happens that in previous posts we’ve built up to having several arm simulations that are ripe for throwing a trajectory controller on top, and that’s what we’ll do in this post. The system that we will be controlling here is the 3 link arm model with an operational space controller (OSC) that translates end-effector forces into joint torques. The DMPs here will be controlling the $(x,y)$ trajectory of the hand, and the OSC will take care of turning the desired hand forces into torques that can be applied to the arm. All of the code used to generate the animations throughout this post can of course be found up on my github (to run play around with variants of python run.py arm3 dmp write).

Controlling a 3 link arm with DMPs
We have our 3 link arm and its OSC controller; this whole setup we’ll collectively refer to as ‘the plant’ throughout this post. We are going to pass in some $(x,y)$ force signal to the plant, and the plant will carry it out. Additionally we’ll get a feedback signal with the $(x,y)$ position of the hand. At the same time, we also have a DMP system that’s doing its own thing, tracing out a desired trajectory in $(x,y)$ space. We have to tie these two systems together.

To do this is easy, we’ll generate the control signal for the plant from our DMP system simply by measuring the difference between the state of our DMP system and the plant state, use that to drive the plant to the state of the DMP system. Formally,

$u = k_p(y_{\textrm{DMP}} - y)$

where $y_{\textrm{DMP}}$ is the state of the DMP system, $y$ is the state of the plant, and $k_p$ and is the position error gain term.

Once we have this, we just go ahead and step our DMP system forward and make sure the gain values on the control signal are high enough that the plant follows the DMP’s trajectory. And that’s pretty much it, just run the DMP system to the end of the trajectory and then stop your simulation.

To give a demonstration of DMP control I’ve set up the DMP system to follow the same number trajectories that the SPAUN arm followed. As you can see the combination of DMPs and operational space control is much more effective than my previous implementation.

Incorporating system feedback

One of the issues in implementing the control above is that we have to be careful about how quickly the DMP trajectory moves, because while the DMP system isn’t constrained by any physical dynamics, the plant is. Depending on the size of the movement the DMP trajectory may be moving a foot a second or an inch a second. You can see above that the arm doesn’t fully draw out the desired trajectories in places where the DMP system moved too quickly in and out and sharp corners. The only way to remedy this without feedback is to have the DMP system move more slowly throughout the entire trajectory. What would be nice, instead, would be to just say ‘go as fast as you can, as long as the plant state is within some threshold distance of you’, and this is where system feedback comes in.

It’s actually very straightforward to implement this using system feedback: If the plant state drifts away from the state of the DMPs, slow down the execution speed of the DMP to allow the plant time to catch up. The do this we just have to multiply the DMP timestep $dt$ by a new term:

$1 / (1 + \alpha_{\textrm{err}}(y_{\textrm{plant}} - y_{\textrm{DMP}}))$.

All this new term does is slow down the canonical system when there’s an error, you can think of it as a scaling on the time step. Additionally, the sensitivity of this term can be modulated the scaling term $\alpha_{\textrm{err}}$ on the difference between the plant and DMP states.

We can get an idea of how this affects the system by looking at the dynamics of the canonical system when an error term is introduced mid-run:

When the error is introduced the dynamics of the system slow down, great! Lets look at an example comparing execution with and without this feedback term. Here’s the system drawing the number 3 without any feedback incorporation:

and here’s the system drawing the number 3 with the feedback term included:

These two examples are a pretty good case for including the feedback term into your DMP system. You can still see in the second case that the specified trajectory isn’t traced out exactly, but if that’s what you’re shooting for you can just crank up the $\alpha_{\textrm{err}}$ to make the DMP timestep really slow down whenever the DMP gets ahead of the plant at all.

Interpolating trajectories for imitation

When imitating trajectories there can be some issues with having enough sample points and how to fit them to the canonical system’s timeframe, they’re not difficult to get around but I thought I would go over what I did here. The approach I took was to always run the canonical system for 1 second, and whenever a trajectory is passed in that should be imitated to scale the x-axis of the trajectory such that it’s between 0 and 1. Then I shove it into an interpolator and use the resulting function to generate an abundance of nicely spaced sample points for the DMP imitator to match. Here’s the code for that:

# generate function to interpolate the desired trajectory
import scipy.interpolate

path = np.zeros((self.dmps, self.timesteps))
x = np.linspace(0, self.cs.run_time, y_des.shape[1])

for d in range(self.dmps):

# create interpolation function
path_gen = scipy.interpolate.interp1d(x, y_des[d])

# create evenly spaced sample points of desired trajectory
for t in range(self.timesteps):
path[d, t] = path_gen(t * self.dt)

y_des = path


Direct trajectory control vs DMP based control

Now, using the above described interpolation function we can just directly use its output to guide our system. And, in fact, when we do this we get very precise control of the end-effector, more precise than the DMP control, as it happens. The reason for this is because our DMP system is approximating the desired trajectory and with a set of basis functions, and some accuracy is being lost in this approximation.

So if we instead use the interpolation function to drive the plant we can get exactly the points that we specified. The times when this comes up especially are when the trajectories that you’re trying to imitate are especially complicated. There are ways to address this with DMPs by placing your basis functions more appropriately, but if you’re just looking for the exact replication of an input trajectory (as often people are) this is a simpler way to go. You can see the execution of this in the trace.py code up on my github. Here’s a comparison of a single word drawn using the interpolation function:

and here’s the same word drawn using a DMP system with 1,000 basis function per DOF:

We can see that just using the interpolation function here gives us the exact path that we specified, where using DMPs we have some error, and this error increases with the size of the desired trajectory. But! That’s for exactly following a given trajectory, which is often not the case. The strength of the DMP framework is that the trajectory is a dynamical system. This lets us do simple things to get really neat performance, like scale the trajectory spatially on the fly simply by changing the goal, rather than rescaling the entire trajectory:

Conclusions

Some basic examples of using DMPs to control the end-effector trajectory of an arm with operational space control were gone over here, and you can see that they work really nicely together. I like when things build like this. We also saw that power of DMPs in this situation is in their generalizability, and not in exact reproduction of a given path. If I have a single complex trajectory that I only want the end-effector to follow once then I’m going to be better off just interpolating that trajectory and feeding the coordinates into the arm controller rather than go through the whole process of setting up the DMPs.

Drawing words, though, is just one basic example of using the DMP framework. It’s a very simple application and really doesn’t do justice to the flexibility and power of DMPs. Other example applications include things like playing ping pong. This is done by creating a desired trajectory showing the robot how to swing a ping pong paddle, and then using a vision system to track the current location of the incoming ping pong ball and changing the target of the movement to compensate dynamically. There’s also some really awesome stuff with object avoidance, that is implemented by adding another term with some simple dynamics to the DMP. Discussed here, basically you just have another system that moves you away from the object with a strength relative to your distance from the object. You can also use DMPs to control gain terms on your PD control signal, which is useful for things like object manipulation.

And of course I haven’t touched on rhythmic DMPs or learning with DMPs at all, and those are both also really interesting topics! But this serves as a decent introduction to the whole area, which has been developed in the Schaal lab over the last decade or so. I recommend further reading with some of these papers if you’re interested, there are a ton of neat ways to apply the DMP framework! And, again, the code for everything here is up on my github in the control and pydmps repositories.

## Dynamic movement primitives part 1: The basics

Dynamic movement primitives (DMPs) are a method of trajectory control / planning from Stefan Schaal’s lab. They were presented way back in 2002 in this paper, and then updated in 2013 by Auke Ijspeert in this paper. This work was motivated by the desire to find a way to represent complex motor actions that can be flexibly adjusted without manual parameter tuning or having to worry about instability.

Complex movements have long been thought to be composed of sets of primitive action ‘building blocks’ executed in sequence and \ or in parallel, and DMPs are a proposed mathematical formalization of these primitives. The difference between DMPs and previously proposed building blocks is that each DMP is a nonlinear dynamical system. The basic idea is that you take a dynamical system with well specified, stable behaviour and add another term that makes it follow some interesting trajectory as it goes about its business. There are two kinds of DMPs: discrete and rhythmic. For discrete movements the base system is a point attractor, and for rhythmic movements a limit cycle is used. In this post we’re only going to worry about discrete DMPs, because by the time we get through all the basics this will already be a long post.

Imagine that you have two systems: An imaginary system where you plan trajectories, and a real system where you carry them out. When you use a DMP what you’re doing is planning a trajectory for your real system to follow. A DMP has its own set of dynamics, and by setting up your DMP properly you can get the control signal for your actual system to follow. If our DMP system is planing a path for the hand to follow, then what gets sent to the real system is the set of forces that need to be applied to the hand. It’s up to the real system to take these hand forces and apply them, by converting them down to joint torques or muscle activations (through something like the operation space control framework) or whatever. That’s pretty much all I’ll say here about the real system, what we’re going to focus on here is the DMP system. But keep in mind that the whole DMP framework is for generating a trajectory \ control signal to guide the real system.

I’ve got the code for the basic discrete DMP setup and examples I work through in this post up on my github, so if you want to jump straight to that, there’s the link! You can run test code for each class just by executing that file.

Discrete DMPs

Let’s start out with point attractor dynamics:

$\ddot{y} = \alpha_y ( \beta_y (g - y) - \dot{y})$,

where $y$ is our system state, $g$ is the goal, and $\alpha$ and $\beta$ are gain terms. This should look very familiar, it’s a PD control signal, all this is going to do is draw our system to the target. Now what we’ll do is add on a forcing term that will let us modify this trajectory:

$\ddot{y} = \alpha_y ( \beta_y (g - y) - \dot{y}) + f$.

How to define a nonlinear function $f$ such that you get the desire behaviour is a non-trivial question. The crux of the DMP framework is an additional nonlinear system used to define the forcing function $f$ over time, giving the problem a well defined structure that can be solved in a straight-forward way and easily generalizes. The introduced system is called the canonical dynamical system, is denoted $x$, and has very simple dynamics:

$\dot{x} = -\alpha_x x$.

The forcing function $f$ is defined as a function of the canonical system:

$f(x,g) = \frac{\Sigma_{i=1}^N \psi_i w_i}{\Sigma_{i=1}^N \psi_i} x(g - y_0)$,

where $y_0$ is the initial position of the system,

$\psi_i = \textrm{exp}\left( -h_i \left( x - c_i\right)^2 \right)$,

and $w_i$ is a weighting for a given basis function $\psi_i$. You may recognize that the $\psi_i$ equation above defines a Gaussian centered at $c_i$, where $h_i$ is the variance. So our forcing function is a set of Gaussians that are ‘activated’ as the canonical system $x$ converges to its target. Their weighted summation is normalized, and then multiplied by the $x (g - y_0)$ term, which is both a ‘diminishing’ and spatial scaling term.

Let’s break this down a bit. The canonical system starts at some arbitrary value, throughout this post $x_0 = 1$, and goes to 0 as time goes to infinity. For right now, let’s pretend that $x$ decays linearly to $0$. The first main point is that there are some basis functions which are activated as a function of $x$, this is displayed in the top figure below. As the value of $x$ decreases from 1 to 0, each of the Gaussians are activated (or centered) around different $x$ values. The second thing is that each of these basis functions are also assigned a weight, $w_i$. These weights are displayed in the lower figure in the bar plot. The output of the forcing function $f$ is then the summation of the activations of these basis functions multiplied by their weight, also displayed in the lower figure below.

The diminishing term
Incorporating the $x$ term into the forcing function guarantees that the contribution of the forcing term goes to zero over time, as the canonical system does. This means that we can sleep easy at night knowing that our system can trace out some crazy path, and regardless will eventually return to its simpler point attractor dynamics and converge to the target.

Spatial scaling
Spatial scaling means that once we’ve set up the system to follow a desired trajectory to a specific goal we would like to be able to move that goal farther away or closer in and get a scaled version of our trajectory. This is what the $(g - y_0)$ term of the forcing function handles, by scaling the activation of each of these basis functions to be relative to the distance to the target, causing the system to cover more or less distance. For example, let’s say that we have a set of discrete DMPs set up to follow a given trajectory:

The goals in this case are 1 and .5, which you can see is where the DMPs end up. Now, we’ve specified everything in this case for these particular goals (1 and .5), but let’s say we’d like to now generalize and get a scaled up version of this trajectory for moving by DMPs to a goal of 2. If we don’t appropriately scale our forcing function, with the $(g - y_0)$ term, then we end up with this:

Basically what’s happened is that for these new goals the same weightings of the basis functions were too weak to get the system to follow or desired trajectory. Once the $(g - y_0)$ term included in the forcing function, however, we get:

which is exactly what we want! Our movements now scale spatially. Awesome.

Alright, now, unfortunately for us, our canonical system does not converge linearly to the target, as we assumed above. Here’s a comparison of a linear decay vs the exponential decay of actual system:

This is an issue because our basis functions activate dependent on $x$. If the system was linear then we would be fine and the basis function activations would be well spread out as the system converged to the target. But, with the actual dynamics, $x$ is not a linear function of time. When we plot the basis function activations as a function of time, we see that the majority are activated immediately as $x$ moves quickly at the beginning, and then the activations stretch out as the $x$ slows down at the end:

In the interest of having the basis functions spaced out more evenly through time (so that our forcing function can still move the system along interesting paths as it nears the target, we need to choose our Gaussian center points more shrewdly. If we look at the values of $x$ over time, we can choose the times that we want the Gaussians to be activated, and then work backwards to find the corresponding $x$ value that will give us activation at that time. So, let’s look at a picture:

The red dots are the times we’d like the Gaussians to be centered around, and the blue line is our canonical system $x$. Following the dotted lines up to the corresponding $x$ values we see what values of $x$ the Gaussians need to be centered around. Additionally, we need to worry a bit about the width of each of the Gaussians, because those activated later will be activated for longer periods of time. To even it out the later basis function widths should be smaller. Through the very nonanalytical method of trial and error I’ve come to calculate the variance as

$h_i = \frac{\#BFs}{c_i}.$

Which reads the variance of basis function $i$ is equal to the number of basis functions divided by the center of that basis function. When we do this, we can now generate centers for our basis functions that are well spaced out:

Temporal scaling

Again, generalizability is one of the really important things that we want out of this system. There are two obvious kinds, temporal and spatial. Spatial scaling we discussed above, in the temporal case we’d like to be able to follow this same trajectory at different speeds. Sometimes quick, sometimes slow, but always tracing out the same path. To do that we’re going to add another term to our system dynamics, $\tau$, our temporal scaling term. Given that our system dynamics are:

$\ddot{y} = \alpha_y ( \beta_y (g - y) - \dot{y}) + f$,
$\dot{x} = -\alpha_x x$,

to give us temporal flexibility we can add the $\tau$ term:

$\ddot{y} = \tau^2 (\alpha_y ( \beta_y (g - y) - \dot{y}) + f)$,
$\dot{x} = \tau(-\alpha_x x)$,

where we use $\tau^2$ for $\ddot{y}$ because it’s the second derivative, and that’s all we have to do! Now to slow down the system you set $\tau$ between 0 and 1, and to speed it up you set $\tau$ greater than 1.

Imitating a desired path

Alright, great. We have a forcing term that can make the system take a weird path as it converges to a target point, and temporal and spatial scalability. How do we set up the system to follow a path that we specify? That would be ideal, to show the system the path to follow, and have it be able to work backwards and figure out the forces and then be able to generate that trajectory whenever we want. This ends up being a pretty straight forward process.

We have control over the forcing term, which affects the system acceleration. So we first need to take our desired trajectory, $\textbf{y}_d$ (where bold denotes a vector, in this case the time series of desired points in the trajectory), and differentiate it twice to get the accelerations:

$\ddot{\textbf{y}}_d = \frac{\partial}{\partial t} \dot{\textbf{y}}_d = \frac{\partial}{\partial t} \frac{\partial}{\partial t} \textbf{y}_d$.

Once we have the desired acceleration trajectory, we need to remove the effect of the base point attractor system. We have the equation above for exactly what the acceleration induced by the point attractor system at each point in time is:

$\ddot{y} = \alpha_y ( \beta_y (g - y) - \dot{y})$,

so then to calculate what the forcing term needs to be generate this trajectory we have:

$\textbf{f}_d = \ddot{\textbf{y}}_d - \alpha_y ( \beta_y (g - \textbf{y}) - \dot{\textbf{y}})$.

From here we know that the forcing term is comprised of a weighted summation of basis functions which are activated through time, so we can use an optimization technique like locally weighted regression to choose the weights over our basis functions such that the forcing function matches the desired trajectory $\textbf{f}_d$. In locally weighted regression sets up to minimize:

$\Sigma_t \psi_i(t)(f_d(t) - w_i (x(t) (g - y_0)))^2$

and the solution (which I won’t derive here, but is worked through in Schaal’s 1998 paper) is

$w_i = \frac{\textbf{s}^T \pmb{\psi}_i \textbf{f}_d}{\textbf{s}^T \pmb{\psi}_i \textbf{s}}$,

where

$\textbf{s} = \left( \begin{array}{c}x_{t_0}(g - y_0) \\ \vdots \\ x_{t_N}(g - y_0) \end{array} \right), \;\;\; \pmb{\psi}_i = \left( \begin{array}{ccc} \psi_i(t_0) & \dots & 0 \\ 0 & \ddots & 0 \\ 0 & \dots & \psi_i(t_n) \end{array} \right)$

Great! Now we have everything we need to start making some basic discrete DMPs!

Different numbers of basis functions

One of the things you’ll notice right off the bat when imitating paths, is that as the complexity of the trajectory increases, so does the required number of basis functions. For example, below, the system is trying to follow a sine wave and a highly nonlinear piecewise function:

We can see in the second case that although the DMP is never able to exactly reproduce the desired trajectory, the approximation continues to get better as the number of basis functions increases. This kind of slow improvement in certain nonlinear areas is to be expected from how the basis functions are being placed. An even spreading of the centers of the basis functions through time was used, but for imitation there is another method out of Dr. Schaal’s lab that places the basis functions more strategically. Need is determined by the function complexity is in that region, and basis function centers and widths are defined accordingly. In highly nonlinear areas we would expect there to be many narrow basis functions, and in linear areas we would expect fewer basis functions, but ones that are wider. The method is called locally weighted projection regression, which I plan on writing about and applying in a future post!

Conclusions \ thoughts

There’s really a lot of power in this framework, and there are a ton of expansions on this basic setup, including things like incorporating system feedback, spatio-temporal coupling of DMPs, using DMPs for gain control as well as trajectory control, incorporating a cost function and reinforcement learning, identifying action types, and other really exciting stuff.

I deviated from the terminology used in the papers here in a couple of places. First, I didn’t see a reason to reduce the second order systems to two first order systems. When working through it I found it more confusing than helpful, so I left the dynamics as a second order systems. Second, I also moved the $\tau$ term to the right hand side, and that’s just so that it matches the code, it doesn’t really matter. Neither of these were big changes, but in case you’re reading the papers and wondering.

Something that I kind of skirted above is planning along multiple dimensions. It’s actually very simple; the DMP framework simply assigns one DMP per degree of freedom being controlled. But, it’s definitely worth explicitly stating at some point.

I also mentioned this above, but this is a great trajectory control system to throw on top of the previously discussed operational space control framework. With the DMP framework on top to plan robust, generalizable movements, and the OSCs underneath to carry out those commands we can start to get some really neat applications. For use on real systems the incorporation of feedback and spatio-temporal coupling terms is going to be important, so the next post will likely be working through those and then we can start looking at some exciting implementations!

Speaking of implementations, there’s a DMP and canonical system code up on my github, please feel free to explore it, run it, send me questions about it. Whatever. I should also mention that there’s this and a lot more all up and implemented on Stefan Schaal’s lab website.