Linear-Quadratic Regulation for non-linear systems using finite differences

One of the standard controllers in basic control theory is the linear-quadratic regulator (LQR). There is a finite-horizon case (where you have a limited amount of time), and an infinite-horizon case (where you don’t); in this post, for simplicity, we’re only going to be dealing with the infinite-horizon case.

The LQR is designed to handle a very specific kind of problem. First, it assumes you are controlling a system with linear dynamics, which means you can express them as

\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u},

where \textbf{x} and \dot{\textbf{x}} are the state and its time derivative, \textbf{u} is the input, and \textbf{A} and \textbf{B} capture the effects of the state and input on the derivative. And second, it assumes that the cost function, denoted J, is a quadratic of the form

J = \int_0^{\infty} \left( (\textbf{x} - \textbf{x}^*)^T \textbf{Q} (\textbf{x} - \textbf{x}^*) + \textbf{u}^T \textbf{R} \textbf{u} \right) dt

where \textbf{x}^* is the target state, and \textbf{Q} = \textbf{Q}^T \geq 0 and \textbf{R} = \textbf{R}^T \geq 0 are weights on the cost of not being at the target state and applying a control signal. The higher \textbf{Q} is, the more important it is to get to the target state asap, the higher \textbf{R} is, the more important it is to keep the control signal small as you go to the target state.

The goal of the LQR is to calculate a feedback gain matrix \textbf{K} such that

\textbf{u} = -\textbf{K} \textbf{x},

drives the system to the target. When the system is a linear system with a quadratic cost function, this can be done optimally. There is lots of discussion elsewhere about LQRs and their derivation, so I’m not going to go into that with this post. Instead, I’m going to talk about applying LQRs to non-linear systems, and using finite differences to do it, which works when you have a readily accessible simulation of the system on hand. The fun part is that by using finite differences you can get this to work without working out the dynamics equations yourself.

Using LQRs on non-linear systems

As you may have noticed, non-linear systems violate the first assumption of a linear quadratic regulator; that the system is linear. That doesn’t mean that we can’t apply it, it just means that it’s not going to be optimal. How poorly the LQR will perform depends on a few things, two important factors being how non-linear the system dynamics actually are, and how often you’re able to update the feedback gain matrix \textbf{K}. To apply LQR to non-linear systems we’re just going to close our eyes and pretend that the system dynamics are linear, i.e. they fit the form

\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u}.

We’ll do this by approximating the actual dynamics of the system linearly. We’ll then solve for our gain value \textbf{K}, generate our control signal for this timestep, and then re-approximate the dynamics again at the next time step and solve for \textbf{K} from the new state. The more non-linear the system dynamics are, the less appropriate \textbf{K} will be for generating our control signal \textbf{u} as we move away from the state \textbf{K} was calculated in; this is why update time of the LQR can become an important factor.

Using finite-differences to approximate system dynamics

An important question, then, is how do we find this system approximation? How can we calculate the \textbf{A} and \textbf{B} matrices that we then use to solve for \textbf{K}? If we know the dynamics of the system to be

\dot{\textbf{x}} = f(\textbf{x}, \textbf{u}),

then we can calculate

\textbf{A} = \frac{\partial f(\textbf{x}, \textbf{u})}{\partial \textbf{x}}, \;\;\;\; \textbf{B} = \frac{\partial f(\textbf{x}, \textbf{u})}{\partial \textbf{u}}.

If you’re going to try this for the 3-link arm, though, get out Mathematica. Do not try this by hand. If you disregard my warning and foolhardily attempt such a derivation you will regret, repent, and then appeal to Wolfram Alpha for salvation. These equations quickly become terrible and long even for seemingly not-so-complicated systems.

There are a few ways to skirt this. Here we’re going to assume that the system under control is a simulation, or that we at least have access to an accurate model, and use the finite differences method to compute these values. The idea behind finite differences is to approximate the rate of change of the function f at the point x by sampling f near x and using the difference to calculate \dot{f}(x). Here’s a picture for a 1D system:

finitedifferences
So here, our current state x is the blue dot, and the red dots represent the sample points x + \Delta x and x - \Delta x. We can then calculate

\dot{f}(x) \approx \frac{f(x+\Delta x) - f(x-\Delta x)}{2\Delta x},

and you can see the actual rate of change of f at x plotted in the blue dashed line, and the approximated rate of change calculated using finite differences plotted in the red dashed line. We can also see that the approximated derivative is only accurate near x (the blue dot).

Back in our multi-dimensional system, to use finite differences to calculate the derivative with respect to the state and the input we’re going to vary each of the dimensions of the state and input by some small amount one at a time, calculating the effects of each one by one. Here’s a chunk of pseudo-code to hopefully clarify this idea:

eps = 1e-5
A = np.zeros((len(current_state), len(current_state))
for ii in range(len(current_state)):
    x = current_state.copy()
    x[ii] += eps
    x_inc = simulate_system(state=x, input=control_signal)
    x = current_state.copy()
    x[ii] -= eps
    x_dec = simulate_system(state=x, input=control_signal)
    A[:,ii] = (x_inc - x_dec) / (2 * eps)

B = np.zeros((len(current_state), len(control_signal))
for ii in range(len(control_signal)):
    u = control_signal.copy()
    u[ii] += eps
    x_inc = simulate_system(state=current_state, input=u)
    u = control_signal.copy()
    u[ii] -= eps
    x_dec = simulate_system(state=current_state, input=u)
    B[:,ii] = (x_inc - x_dec) / (2 * eps)

Now we’re able to generate our \textbf{A} and \textbf{B} matrices we have everything we need to solve for our feedback gain matrix \textbf{K}! Which is great.

Note on using finite differences in continuous vs discrete setup

Something that’s important to straighten out too is what exactly is returned by the simulate_system function in the code above. In the continuous case, your system is captured as

\dot{\textbf{x}} = \textbf{A}\textbf{x} + \textbf{B}\textbf{u},,

where in the discrete case your system is defined

\textbf{x}(t+1) = \textbf{A}\textbf{x}(t) + \textbf{B}\textbf{u}(t).

If you are calculating your feedback gain matrix \textbf{K} using the continuous solution to the algebraic Riccati equation, then you need to be returning \dot{\textbf{x}}(t). If you’re solving for \textbf{K} using the discrete solution to the algebraic Riccati equation you need to return \textbf{x}(t+1). This was just something that I came across as I was coding and so I wanted to mention it here in case anyone else stumbled across it!

Applying LQR to 2 and 3 link arm control

Alright! Let’s have a look at how the LQR does controlling non-linear systems. Below we have the control of a 2-link arm compared to a 3-link arm, and you can see the control of the 2-link arm is better. This is a direct result of the dynamics of a 3-link arm being significantly more complex.

2linkarm 3linkarm

Note on controlling at different timesteps

When I was first testing the LQR controller I expected the effects of different control update times to be a lot more significant than it was. As it turns out, for controlling a 3-link arm, there’s not really a visible difference in a controller that is updating every .01 seconds vs every .001 seconds vs every .0001 seconds. Let’s have a look:

3link.01 3link.001 3link.0001

Can’t even tell, eh? Fun fact, the simulation took 1 minute 30 seconds at .01 seconds time step and 45 minutes at .0001 seconds time step. The left-most animation is the .01 seconds and the right-most the .0001 seconds. But why is there seemingly so little difference? Well, this boils down to the dynamics of the 3-link arm changing actually pretty slowly. Below I’ve plotted just a few of the elements from the \textbf{A}, \textbf{B}, and \textbf{K} matrices over .5 seconds of simulation time:

A matrixB matrixK matrix

So, there are some obvious points where sampling the dynamics at a .01 time step is noticeably less accurate, but all in all there’s not a huuuggge difference between sampling at .01 and .0001 seconds. If you’re just watching the end-effector path it’s really not very noticeable. You can see how the elements of \textbf{A} and \textbf{B} are changing fairly slowly; this means that \textbf{K} is going to be an effective feedback gain for a fair chunk of time. And the computational savings you get by sampling the dynamics and regenerating \textbf{K} every .01 seconds instead of every .0001 seconds are pretty big. This was just another thing that I came across when playing around with the LQR, the take away being don’t just assume you need to update your system crazy often. You might get very comparable performance for much less computational cost.

Conclusions

All in all, the LQR controller is pretty neat! It’s really simple to set up, and generic. We don’t need any specific information about the system dynamics, like we do for effective operational space control (OSC). When we estimate the dynamics with finite differences, all need is a decent system model that we can sample. Again, the more non-linear the system, of course, the less effective a LQR will be. If you’re interested in playing around with one, or generating the figures that I show above, the code is all up and running on my Github for you to explore.

Advertisements
Tagged , , , , , , , ,

9 thoughts on “Linear-Quadratic Regulation for non-linear systems using finite differences

  1. […] few months ago I posted on Linear Quadratic Regulators (LQRs) for control of non-linear systems using finite-differences. The gist of it was at every time step linearize the dynamics, quadratize (it could be a word) the […]

  2. […] This paper, by Drs. Jan Peters and Stepfan Schaal, is where I first stumbled across the above formulation of gradient approximation and read about SPSA (side note: I really recommend reading instead the Scholarpedia article on policy gradients, because it has fixes to a couple notation mistakes from the paper). Reading about this I thought, oh interesting, an alternative to FDSA for gradient approximation, let’s see how it well it does replacing FDSA in a linear quadratic regulator (LQR). […]

  3. Wonchul Kim says:

    First, thank you for your kind explanation!!!
    However, there is nothing in the link to your github….

    Could you please upload them…?

    Thank you
    Wonchul Kim

  4. puneetp says:

    thanks for wonderful explanation ! You made it simple and effective.

  5. Kendrick says:

    Thanks for your kind and detailed explanation!
    That helps me a lot to understand and implement LQR to simulation.

    I have two questions for you.
    1. How can “dynamics are complex” be defined ?
    I mean, the difference between two links robot and three links robot is that the number of linkage. So number of links can explain “complex”.?

    2. How can “the more non-linear the system” be defined?
    I think the system can be classified into only two things, linear and non-linear.
    But, the more non-linear means the system behaves unexpectedly?
    Can you give me any useful example of it?

    Thanks.

    • travisdewolf says:

      Hello! Glad you found it useful!

      So, I’m speaking pretty loosely when saying that the ‘dynamics are complex’ or ‘more non-linear’, these aren’t intended as formal statements.
      I’d note too, that the equations of motion for a 3 link system vs a 2 link system are significantly more difficult to calculate, so that’s kind of intuitively what I mean by more complex.
      By ‘more nonlinear’ I’m referring to the size of the region of accuracy in state-space where a given linear approximation of the dynamics is useful. Does this help?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: