Edit: Previously I posted this blog post on incorporating obstacle avoidance, but after a recent comment on the code I started going through it again and found a whole bunch of issues. Enough so that I’ve recoded things and I’m going to repost it now with working examples and a description of the changes I made to get it going. The edited sections will be separated out with these nice horizontal lines. If you’re just looking for the example code, you can find it up on my `pydmps`

repo, here.

Alright. Previously I’d mentioned in one of these posts that DMPs are awesome for generalization and extension, and one of the ways that they can be extended is by incorporating obstacle avoidance dynamics. Recently I wanted to implement these dynamics, and after a bit of finagling I got it working, and so that’s going to be the subject of this post.

There are a few papers that talk about this, but the one we’re going to use is Biologically-inspired dynamical systems for movement generation: automatic real-time goal adaptation and obstacle avoidance by Hoffmann and others from Stefan Schaal’s lab. This is actually the second paper talking about obstacle avoidance and DMPs, and this is a good chance to stress one of the most important rules of implementing algorithms discussed in papers: collect at least 2-3 papers detailing the algorithm (if possible) before attempting to implement it. There are several reasons for this, the first and most important is that there are likely some typos in the equations of one paper, by comparing across a few papers it’s easier to identify trickier parts, after which thinking through what the correct form should be is usually straightforward. Secondly, often equations are updated with simplified notation or dynamics in later papers, and you can save yourself a lot of headaches in trying to understand them just by reading a later iteration. I recklessly disregarded this advice and started implementation using a single, earlier paper which had a few key typos in the equations and spent a lot of time tracking down the problem. This is just a peril inherent in any paper that doesn’t provide tested code, which is almost all, sadface.

OK, now on to the dynamics. Fortunately, I can just reference the previous posts on DMPs here and don’t have to spend any time discussing how we arrive at the DMP dynamics (for a 2D system):

where and are gain terms, is the goal state, is the system state, is the system velocity, and is the forcing function.

As mentioned, DMPs are awesome because now to add obstacle avoidance all we have to do is add another term

where implements the obstacle avoidance dynamics, and is a function of the DMP state and velocity. Now then, the question is what are these dynamics exactly?

**Obstacle avoidance dynamics**

It turns out that there is a paper by Fajen and Warren that details an algorithm that mimics human obstacle avoidance. The idea is that you calculate the angle between your current velocity and the direction to the obstacle, and then turn away from the obstacle. The angle between current velocity and direction to the obstacle is referred to as the steering angle, denoted , here’s a picture of it:

So, given some value, we want to specify how much to change our steering direction, , as in the figure below:

If we’re on track to hit the object (i.e. is near 0) then we steer away hard, and then make your change in direction less and less as the angle between your heading (velocity) and the object is larger and larger. Formally, define as

where and are constants, which are specified as and in the paper, respectively.

Edit: OK this all sounds great, but quickly you run into issues here. The first is how do we calculate ? In the paper I was going off of they used a dot product between the vector and the vector to get the cosine of the angle, and then use `np.arccos`

to get the angle from that. There is a big problem with this here, however, that’s kind of subtle: You will never get a negative angle when you calculate this, which, of course. That’s not how `np.arccos`

works, but it is still what we need so we will be able to tell if we should be turning left or right to avoid this object!

So we need to use a different way of calculating the angle, one that tells us if the object is in a + or – angle relative to the way we’re headed. To calculate an angle that will give us + or -, we’re going to use the `np.arctan2`

function.

We want to center things around the way we’re headed, so first we calculate the angle of the direction vector, . Then we create a rotation vector, `R_dy`

to rotate the vector describing the direction of the object. This shifts things around so that if we’re moving straight towards the object it’s at 0 degrees, if we’re headed directly away from it, it’s at 180 degrees, etc. Once we have that vector, nooowwww we can find the angle of the direction of the object and that’s what we’re going to use for phi. Huzzah!

# get the angle we're heading in phi_dy = -np.arctan2(dy[1], dy[0]) R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)], [np.sin(phi_dy), np.cos(phi_dy)]]) # calculate vector to object relative to body obj_vec = obstacle - y # rotate it by the direction we're going obj_vec = np.dot(R_dy, obj_vec) # calculate the angle of obj relative to the direction we're going phi = np.arctan2(obj_vec[1], obj_vec[0])

This term can be thought of as a weighting, telling us how much we need to rotate based on how close we are to running into the object. To calculate *how* we should rotate we’re going to calculate the angle orthonormal to our current velocity, then weight it by the distance between the object and our current state on each axis. Formally, this is written:

where is the axis rotated 90 degrees (the denoting outer product here). The way I’ve been thinking about this is basically taking your velocity vector, , and rotating it 90 degrees. Then we use this rotated vector as a row vector, and weight the top row by the distance between the object and the system along the axis, and the bottom row by the difference along the axis. So in the end we’re calculating the angle that rotates our velocity vector 90 degrees, weighted by distance to the object along each axis.

So *that* whole thing takes into account absolute distance to object along each axis, but that’s not quite enough. We also need to throw in , which looks at the current angle. What this does is basically look at ‘hey are we going to hit this object?’, if you are on course then make a big turn and if you’re not then turn less or not at all. Phew.

OK so all in all this whole term is written out

and that’s what we add in to the system acceleration. And now our DMP can avoid obstacles! How cool is that?

Super compact, straight-forward to add, here’s the code:

Edit: OK, so not suuuper compact. I’ve also added in another couple checks. The big one is “Is this obstacle between us and the target or not?”. So I calculate the Euclidean distance to the goal and the obstacle, and if the obstacle is further away then the goal, set the avoidance signal to zero (performed at the end of the `if`

)! This took care of a few weird errors where you would get a big deviation in the trajectory if it saw an obstacle past the goal, because it was planning on avoiding it, but then was pulled back in to the goal before the obstacle anyways so it was a pointless exercise. The other check added in is just to make sure you only add in obstacle avoidance if the system is actually moving. Finally, I also changed the instead of .

def avoid_obstacles(y, dy, goal): p = np.zeros(2) for obstacle in obstacles: # based on (Hoffmann, 2009) # if we're moving if np.linalg.norm(dy) > 1e-5: # get the angle we're heading in phi_dy = -np.arctan2(dy[1], dy[0]) R_dy = np.array([[np.cos(phi_dy), -np.sin(phi_dy)], [np.sin(phi_dy), np.cos(phi_dy)]]) # calculate vector to object relative to body obj_vec = obstacle - y # rotate it by the direction we're going obj_vec = np.dot(R_dy, obj_vec) # calculate the angle of obj relative to the direction we're going phi = np.arctan2(obj_vec[1], obj_vec[0]) dphi = gamma * phi * np.exp(-beta * abs(phi)) R = np.dot(R_halfpi, np.outer(obstacle - y, dy)) pval = -np.nan_to_num(np.dot(R, dy) * dphi) # check to see if the distance to the obstacle is further than # the distance to the target, if it is, ignore the obstacle if np.linalg.norm(obj_vec) > np.linalg.norm(goal - y): pval = 0 p += pval return p

And that’s it! Just add this method in to your DMP system and call `avoid_obstacles`

at every timestep, and add it in to your DMP acceleration.

You hopefully noticed in the code that this is set up for *multiple* obstacles, and that all that that entailed was simply adding the `p`

value generated by each individual obstacle. It’s super easy! Here’s a very basic graphic showing how the DMP system can avoid obstacles:

So here there’s just a basic attractor system (DMP without a forcing function) trying to move from the center position to 8 targets around the unit circle (which are highlighted in red), and there are 4 obstacles that I’ve thrown onto the field (black x’s). As you can see, the system successfully steers way clear of the obstacles while moving towards the target!

We must all use this power wisely.

Edit: Making the power yours is now easier than ever! You can check out this code at my `pydmps`

GitHub repo. Clone the repo and after you `python setup.py develop`

, change directories into the examples folder and run the `avoid_obstacles.py`

file. It will randomly generate 5 targets in the environment and perform 20 movements, giving you something looking like this:

[…] different frequency that the arm needs to avoid, using an obstacle avoidance algorithm like the one in this post, add in another dimension of the task involving the gripper, implement a null-space controller to […]

Hi Travis,

thanks for your posts on DMPs, I really enjoyed reading them!

First a minor comment: In the post, you denote the steering angle by \varphi in the plots, but then you use \psi in the text. Probably it would be better to stick to \varphi since you were previously using \psi for the basis functions.

Second, a theoretical question: if I understand correctly the obstacle avoidance presented here is purely reactive. Is the resulting DMP prone to getting stuck in local minima when the target forcing term and the obstacle avoidance term contradict each other, and possibly cancel each other out?

Cheers,

Sebastian

Hi, thanks for the suggestion! It looks much better now.

Glad you’ve enjoyed the posts!

And yes, you’re correct it’s super possible to get stuck because of opposing forces from different obstacles. It’s definitely not a bullet-proof solution!

Hello Travis!

I am very grateful for all the content, it has been really helpful to understand the possibilities and inner details of DMPs. I am trying to implement obstacle avoidance with DMPs, but I have a major doubt. Since the vectors (o-x) and (v) are in the same plane, the resulting R vector is perpendicular to both (o-x) and (v). Therefore when calculating R v phi_dot, R and v are perpendicular and their product is zero, so the forcing term is always zero. I know I am missing something, but I just cannot figure out what. Thanks a lot in advance!!

Hi Sara, I had the chance to fix a few pretty significant issues with the post, and put up links to the code. If you could check out the code examples and updates, then let me know if that helps to clear things up that would be great! Thanks!

Hello,

It was really a best tutorial for DMP system. I have one question, can we do a DMP system for exoskeleton robotics in compliance with the Human user. If yes, how to modify the code with lower limb exoskeleton to find it’s path planning or trajectories.

Hi Saravana,

thank you! Glad that you found it useful! It’s hard to say without knowing more details about your project; if you send me an email I’d be happy to talk about it more and see if we can figure something out!

Hello,

I’m electrical engineering student trying to understand and work in robotics, I’ve been working with C# programming on a Kuka Manipulator (KR16) and my plan is to implement Dmps, so I’ll understand your code, then translate, but it’s my first time on Python. I’m trying to install your code here, but I got a problem when I run Avoid_obstacles:

import pydmps.dmp_discrete

File “C:\Users\joaov.DESKTOP-FH0M6OC\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pydmps-0.1-py3.5.egg\pydmps\dmp_discrete.py”, line 18, in

from dmp import DMPs

ImportError: No module named ‘dmp’

I installed all the modules, don’t know what’s happening.

Another question: what’s the configuration of Obstacle position, Y position and Dy position? I mean is it a Colum like [Dy.AxisX Dy.AxisY] or [Dy.AxisX(k) Dy.AxisX(k-1)]?

Thanks, your tutorial is really awesome, congratulations from Brazil

Hello Joao! Sounds like a neat project!

It looks like you haven’t installed the pydmps module itself, did you ‘pip install pydmps’? Or if you’ve cloned the repo you can go into the main folder and run ‘python setup.py develop’ should do the trick!

I’m afraid I don’t understand your second question, could you rephrase it for me?

Cheers!

I am reading your posts about these robot control things.

Thanks for your excellent tutorials.

Recently, I am interested exactly in this avoid-obstacle robotics.

I am trying to make some demostrations of obstacle avoidance using the GPS(Guided Policy Search, which maybe you mentioned in a tutorial), by just adding some cost fuction according to the distance between robot hand and the obstacle.

Often, it does make sence, and I can get some wonderful intended motion for avoiding obstacles, of the GPS PR2 7 DOF robot arm model, or some other 6 DOF robot arm, in a MUJOCO simulator.

But I still doubt if these methods can really find the best solution(trajectory) for a certain task.

I am not familar with optimal control theory and those terms, so i don’t know my understanding is correct or wrong.

Can I say:

These methods(GPS using iLQG, or DMP) are based on optimal control theory(computation), not based on something like reinforcement learning(searching).

So they use the gradient(or partial differential, or something like the jacobian), to just decide a direction to optimize the cost. If the costs are not well designed, these methods may be trapped in local minima. A better way is to try some reinforcement learning, but usually more time is needed.

Hi Liu,

the DMP obstacle avoidance method that I’m using here will definitely get stuck in local minima, it’s just a heuristic approach to the problem that works well enough in most cases. If you’re using iLQG, though, you’ll get much more robust solutions. It’s still possible to fall into local minima, but it will actually search the solution space for a good control signal, whereas the DMP method does not. There are learning methods like PIPI for DMPs that do incorporate learning and cost functions for DMPs. RL is subject to the same kinds of problems, all of these solutions are sensitive to the initial trajectory when you move to high-dimensional state spaces.

Also, you can say that iLQG is based on classic optimal control theory, and that the neural network training part of GPS, DMP learning, and RL are all performing optimization (using some variation of gradient descent to minimize the cost function) which can be grouped under the very broad ‘optimal control theory’ umbrella.

Hope that helps!

I would like to add something to what Travis has said. The convergence of RL also depends on the when you get the reward signal. Tasks like manipulation and grasping (where you have a delayed reward signal), it is very hard to avoid an RL system from not getting stuck in a local minimum. So in these cases, an approach used is to bootstrap your system from a demonstration.

The methods like GPS uses iLQR or PI2 to find policy distributions which could be later trained on a neural network using supervised learning approach. When you have several policies with varied starting conditions, sampling from them and training the neural network helps to make a more robust stable global policy approximation. So in a way, we could approximate RL using trajectory optimisation + supervised learning. Trajectory optimisation is essentially a greedy search performed based on the cost function given.

Ah that’s a good point, you can be more assured of not being stuck in a local minima by varying the initial conditions of the algorithm during search. Thanks for the comment!

For clarification, by ‘approximating RL’ are you referring to approximating a system that’s found the global minimum?

Oh yes definitely, though that is a slight abuse of the phrase. Even this training from multiple start locations is limited to a certain region of the state-space (but still it is better than single trajectory optimisation) as you can see in the recent papers. That is why I said approximating RL. If it was proper RL, in my understanding this wouldn’t have happened.. right? since you know what do from any part of state-space to reach your goal.

Ah I see! I believe that while there can be a system trained well with RL that would be able to move optimally from any part of the state space, that’s specific to the RL algorithm that’s used (e.g. Q-learning, SARSA, etc), and isn’t a guarantee of RL techniques in general. Being classified as RL really is more about the problem type (feedback timing, non-specific reward, minimal knowledge of environment) than any comment on the algorithm used in solving things. Does that sound fair?

Yes. I do agree.

😀

Hi Travis,

As usual, great tutorial!. I do have a doubt. When you multiply [R_half_pi (O-Y) x dY ]dY, we get is this:

[[0, -1],[1, 0]] * [[(Ox-Yx) dYx, (Ox-Yx) dYy],[(Oy-Yy) dYx, (Oy-Yy) dYy]]*[[dYx],[dYy]]

=

[[-(Oy-Yy)*dYx^2], [-(Ox-Yx)*dYy^2] ]

So that means we are weighting using the velocities along each direction rather than distance. To weight w.r.t to the distance then shouldn’t we do this?

[R_half_pi (O-Y) x dY ]*(O-Y)?

best,

mike

Hi Mike,

Interesting point! This was definitely me working to find an intuition behind things, and not the underlying motivation of the algorithm the authors discussed. I think that R can still be thought of as a 90 degree rotation from the current direction scaled by distance to the obstacle, but then it does become something else when we multiply by dY again. I believe you’ll still want to use dY rather than (O-Y) in the second part because you want to alter the movement based on where the obstacle is going rather than where it happens to be (applying more force if it’s heading straight at the object, less if it’s already moving away).

Thanks for the comment!

[…] discussed how to avoid obstacles using DMPs in the end-effector trajectory. This is good when you’re controlling a single disconnected […]

Hello Travis!

Thank you for your tutorials, they are awesome and very helpful. But I had some problems in understanding the R matrix.

According to your code, the way you computed the R matrix is:

1. Outer Product:

[(O_x – X_x) ; (O_y – X_y)] X [V_x; V_y] = [(O_x – X_x)V_x, (O_x – X_x)V_y; (O_y – X_y)*V_x, (O_y – X_y)*V_y]

2. Rotated 90 degree

[0, -1; 1, 0][result from 1] = [-(O_y – X_y)*V_y, (O_x – X_x)V_x; -(O_y – X_y)*V_y, (O_y – X_y)*V_x]

This just don’t look like what you explain in the post. I know I might misunderstanding something, but I just don’t know what. Please help me with that. Many thanks in advanced!

Hello! Thanks, glad they’ve been helpful to you! 🙂

I mention (kind of hidden in a paragraph) that R is (O_x – Y_x) x dY rotated 90 degrees. And now that i’m looking at it I have (o – x) instead of (o – y), i’ll fix that! Does that help? If you could highlight the part that’s unclear I can try to rephrase it better!