So, first, we have to be careful with x and q. x and dx are the operational space variables, and each is 3×1 (assuming we’re controlling in 3D Cartesian space; note that the examples above are controlling in 2D space, though). So if we’re controlling a 7DOF arm in 3D space, then q and dq are both 7×1, Mq is the joint-space inertia matrix, so it’ll be 7×7, the torque command u must also be 7×1, and the 3×7 Jacobian is used to transform between the 3D operational space and the 7D joint space. Does that help? It looks like you might have confused the dimensionality of the operational space and the Jacobian.
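To make the dimensions concrete, here’s a minimal NumPy sketch of the shapes involved, assuming the 7DOF-arm-in-3D setup described above (the variable values are placeholders, only the shapes matter):

```python
import numpy as np

n_joints = 7  # joint space dimension for a 7DOF arm
n_task = 3    # operational space dimension (3D Cartesian: x, y, z)

q = np.zeros(n_joints)             # joint angles, 7x1
dq = np.zeros(n_joints)            # joint velocities, 7x1
Mq = np.eye(n_joints)              # joint-space inertia matrix, 7x7
J = np.zeros((n_task, n_joints))   # Jacobian, 3x7

# A force in operational space (3x1) maps to joint torques (7x1)
# through the transpose of the Jacobian:
F_x = np.zeros(n_task)
u = J.T @ F_x                      # torque command, 7x1

assert u.shape == (n_joints,)
```

The key point is that J.T (7×3) is what carries the 3D operational space command into the 7D joint space.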

arm.dq is the angular velocity of each of the arm’s joints. The target velocity in this point-to-point control system is always zero, so rather than writing `+ kv * (target_dq - arm.dq)` we just write `- kv * arm.dq`.
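Here’s a tiny sketch of why the two forms are equivalent when the target velocity is zero (the gain and velocity values are made up; a plain array stands in for `arm.dq`):

```python
import numpy as np

kv = 20.0                           # example damping gain (assumed value)
dq = np.array([0.1, -0.2, 0.05])    # current joint velocities (stand-in for arm.dq)

# Full velocity term with an explicit target velocity:
target_dq = np.zeros(3)             # point-to-point: desired velocity is zero
full_term = kv * (target_dq - dq)

# Simplified damping term used in the code:
short_term = -kv * dq

assert np.allclose(full_term, short_term)
```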

Does that clear things up?

For clarification, by ‘approximating RL’ are you referring to approximating a system that’s found the global minimum?

Methods like GPS use iLQR or PI2 to find policy distributions, which can then be used to train a neural network with supervised learning. When you have several policies with varied starting conditions, sampling from them and training the neural network on those samples helps produce a more robust, stable global policy approximation. So in a way, we can approximate RL using trajectory optimisation + supervised learning. Trajectory optimisation is essentially a greedy search based on the given cost function.
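The distill-local-policies-into-one-global-policy idea can be sketched in a few lines. Everything here is hypothetical: random linear controllers stand in for the local policies that iLQR/PI2 would actually produce, and a least-squares fit stands in for neural-network training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the local policies found by trajectory optimisation,
# one per starting condition (hypothetical random linear controllers):
local_gains = [rng.standard_normal((2, 4)) for _ in range(3)]

# Sample (state, action) pairs from each local policy
X, Y = [], []
for K in local_gains:
    states = rng.standard_normal((100, 4))  # states visited under this policy
    X.append(states)
    Y.append(states @ K.T)                  # actions the local policy takes
X, Y = np.vstack(X), np.vstack(Y)

# "Supervised learning" step: fit one global policy to all the samples
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def global_policy(s):
    return s @ W  # one policy approximating all the local ones at once

assert global_policy(np.ones(4)).shape == (2,)
```

The real GPS algorithm also alternates between re-optimising the local policies and re-fitting the global one, but the supervised-learning step at its core looks like this.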

The DMP obstacle avoidance method that I’m using here will definitely get stuck in local minima; it’s just a heuristic approach to the problem that works well enough in most cases. If you’re using iLQG, though, you’ll get much more robust solutions. It’s still possible to fall into local minima, but iLQG actually searches the solution space for a good control signal, whereas the DMP method does not. There are also learning methods like PI2 for DMPs that do incorporate learning and cost functions. RL is subject to the same kinds of problems; all of these solutions are sensitive to the initial trajectory when you move to high-dimensional state spaces.

Also, you can say that iLQG is based on classic optimal control theory, and that the neural network training part of GPS, DMP learning, and RL are all performing optimization (using some variation of gradient descent to minimize a cost function), which can be grouped under the very broad ‘optimal control theory’ umbrella.

Hope that helps!

to calculate the error between current and desired orientation and off you go!

Thanks!
