Monthly Archives: January 2013

Likelihood calculus paper series review part 1 – Controlling variability

Dr. Terry Sanger has a series of papers that have come out in the last few years describing what he has named ‘likelihood calculus’. The goal of these papers is to develop ‘a theory of optimal control for variable, uncertain, and noisy systems that nevertheless accomplish real-world tasks reliably.’ The idea is that successful performance can be thought of as modulating the variance of movement: allocating resources to tightly control motions when required, and allowing variability in task-irrelevant dimensions. To perform variability modulation, we first need a means of capturing mathematically how the features of a controller operating under uncertainty affect variability in system movement. Defining terms quickly: the features of a controller are the different components that produce signals resulting in movement; variability is taken here to be the trial-to-trial variation in movements; and uncertainty means that the available sensory feedback does not uniquely determine the true state of the world. Uncertainty can arise from noise on sensory feedback signals, unmodeled dynamics, and/or quantization of sensory feedback. To capture all this uncertainty and variability, probability theory will naturally be employed. In this post I will review the paper ‘Controlling variability’ (2010) by Dr. Sanger, which sets up the framework for describing the time course of uncertainty during movement.

Using probability in system representations

So, here’s a picture of the system our controller (the brain) is in:

There’s the input initial state x, and the output change in state, \dot{x}, which is generated as a combination of the unforced dynamics of the world and the control dynamics effected by the brain. But since we’re dealing with uncertainty and variability, we’re going to rewrite this such that given an initial state x, we get a probability distribution over potential changes in state, p(\dot{x}|x), which specifies the likelihood of each change in state \dot{x}, given our initial state probability distribution p(x). So in our system diagram, the world and the brain both define probability distributions over the possible changes in state, p_0(\dot{x}|x) and p_1(\dot{x}|x), respectively, which then combine to create the overall system dynamics p(\dot{x}|x). Redrawing our picture to incorporate the probabilities, we get:

One may ask: how do these probabilities combine? Good question! What we’d like to be able to do is combine them through simple linear operators, because they afford us massive simplifications in our calculations, but the combination of p_0(\dot{x}|x) and p_1(\dot{x}|x) isn’t as simple as summing and normalizing. The reason why can be a little tricky to tease out if you’re unfamiliar with probability, but will become clear with some thought. Consider what it means to go about combining the probabilities in this way. Basically, if you sum and normalize, then the result is saying that there is a 50% chance of doing what the brain says to do, and a 50% chance of doing what the world says to do. That doesn’t capture what is actually going to happen, which is an interaction between the effects of the brain and the world. A good example for thinking about this is rolling dice. If you roll each die individually, you have an equal chance of the result being each number 1-6, but if you roll two dice and add them, the overall system probability of rolling numbers changes in a highly nonlinear fashion:

Suddenly there is a 0% chance of the result being a 1, and the probability of rolling a number increases in likelihood until you get to 7, at which point the likelihood decreases with ascending numbers; there is a nonlinear interaction at play that can’t be captured by summing the probabilities of rolling a number on each die individually and normalizing.
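The dice argument can be checked directly. The sketch below is only an illustration (the function names are my own, not from the paper): it compares the naive sum-and-normalize combination of two fair dice with the actual distribution of their sum, which is a convolution of the two individual distributions.

```python
from fractions import Fraction

# One fair six-sided die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def sum_and_normalize(p, q):
    """Naive combination: add the two distributions and renormalize."""
    keys = set(p) | set(q)
    total = sum(p.values()) + sum(q.values())
    return {k: (p.get(k, 0) + q.get(k, 0)) / total for k in sorted(keys)}

def distribution_of_sum(p, q):
    """What actually happens when both dice are rolled: a convolution."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * qb
    return dict(sorted(out.items()))

naive = sum_and_normalize(die, die)
two_dice = distribution_of_sum(die, die)

# Columns: outcome, naive probability, true probability of the two-dice total.
for total in range(1, 13):
    print(total, naive.get(total, Fraction(0)), two_dice.get(total, Fraction(0)))
```

The naive column stays flat at 1/6 for outcomes 1-6 and says nothing about 7-12, while the convolution gives the peaked distribution over 2-12 described above.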

So summing the probability distributions over the possible changes in state, \dot{x}, isn’t going to work, but is there another way to combine these through linear operators? The answer is yes, but it’s going to require us to undergo a bit of a paradigm shift and expand our minds. What if, instead of capturing the change in the system by looking at p(\dot{x}|x), the probability distribution over possible changes in state given the current state, we instead capture the dynamics of the system by defining \dot{p}(x), the change in the probability of states through time? Instead of describing how the system evolves through the likelihood of different state changes, the dynamics are captured by defining the change in likelihood of different states; we are capturing the effect of the brain and the world on the temporal evolution of state probability. Does that freak you out?

Reworking the problem

Hopefully you’re not too freaked out. Now that you’ve worked your head around this concept, let’s look at \dot{p}(x) a little more closely. Specifically, let’s look at the Kramers-Moyal expansion (for a one-dimensional system):

\frac{\partial p(x,t)}{\partial t} = \sum_{k=1}^{\infty} \left( - \frac{\partial}{\partial x} \right)^k \{a_k(x) p(x)\} / k!,

a_k(x) = \int \dot{x}^k p(\dot{x}|x) d\dot{x}.

As Dr. Sanger notes, this is a daunting equation, but it can be understood relatively easily. The left side, \frac{\partial p(x,t)}{\partial t} = \dot{p}_t(x), is the rate of change of probability at each point x at time t. The right side is just the Taylor series expansion. If we take the first two terms of the Taylor series expansion, we get:

\frac{\partial p(x,t)}{\partial t} = - a_1 \frac{\partial}{\partial x} p(x) + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} p(x),

where the first term describes how the probability drifts (or shifts / translates), a_1 being the average value of \dot{x} for each value of x. The second term describes the rate of diffusion, a_2 being the second moment of the speed \dot{x}, describing the amount of spread in different possible speeds, where greater variability in speed leads to an increased spread of the probability. This is the Fokker-Planck equation, which describes the evolution of a physical process with constant drift and diffusion. At this point we make the assumption that our probability distributions are all going to be in the form of Gaussians (for which the Fokker-Planck equation exactly describes evolution of the system through time, and can arguably act as a good approximation to neural control systems where movement is based on the average activity of populations of neurons).

As an example of this, think of a 1-dimensional system, and a Gaussian probability distribution describing what state the system is likely to be in. The first term, a_1, is the average rate of change \dot{x} across the states x the system could be in. The probability shifts through state space as specified by a_1. Intuitively, if you have a distribution with mean around position 1 and the system velocity is 4, then after one unit of time the mean of your probability distribution, p(x), should have shifted to position 5. The second term, a_2, is a measure of how wide the range of different possible speeds \dot{x} is. The larger the range of possible values, the less certain we become about the location of the system as it moves forward in time; that is, the greater the range of possible states the system might end up in at the next time step. This is reflected by the rate of diffusion of the probability distribution. If we know for sure the speed the system moved at (i.e. all possible states will move with a specific \dot{x}), then we simply translate the mean of the probability distribution. If however there’s uncertainty in the speed at which the system is moving, then the correct location (reflecting the actual system position) for the mean of the probability distribution could be one of a number of values. This is captured by increasing the width of (diffusing) the Gaussian.
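To make the drift and diffusion picture concrete, here is a minimal numerical sketch of the two-term equation using explicit finite differences. The grid, the step sizes, and the values a_1 = 4 and a_2 = 1 are my own choices for illustration, and a_1 and a_2 are assumed constant. Under that assumption the mean should move at rate a_1 and the variance should grow at rate a_2.

```python
import math

a1, a2 = 4.0, 1.0            # drift and diffusion coefficients (assumed constant)
dx, dt, T = 0.1, 0.002, 1.0  # grid spacing, time step, total time (illustrative)
xs = [-10.0 + i * dx for i in range(301)]   # grid from -10 to 20

# Initial Gaussian with mean 1 and variance 1, normalized on the grid.
p = [math.exp(-0.5 * (x - 1.0) ** 2) for x in xs]
norm = sum(p) * dx
p = [v / norm for v in p]

def moments(p):
    """Mean and variance of a density sampled on the grid."""
    mean = sum(x * v for x, v in zip(xs, p)) * dx
    var = sum((x - mean) ** 2 * v for x, v in zip(xs, p)) * dx
    return mean, var

def step(p):
    """One explicit Euler step of dp/dt = -a1 dp/dx + (a2/2) d2p/dx2."""
    new = p[:]  # the two boundary values are left untouched (they are ~0)
    for i in range(1, len(p) - 1):
        dpdx = (p[i + 1] - p[i - 1]) / (2 * dx)
        d2pdx2 = (p[i + 1] - 2 * p[i] + p[i - 1]) / dx ** 2
        new[i] = p[i] + dt * (-a1 * dpdx + 0.5 * a2 * d2pdx2)
    return new

mean0, var0 = moments(p)
for _ in range(round(T / dt)):
    p = step(p)
mean1, var1 = moments(p)

print("start: mean", mean0, "variance", var0)
print("end:   mean", mean1, "variance", var1)
```

Starting from mean 1 and variance 1, the final mean should land close to 5 and the final variance close to 2, up to discretization error. Setting a2 to zero in this sketch would leave only the translation.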

Linear operators

Importantly, the equations above are linear in terms of p(x). This means we can rearrange the above equation:

\frac{\partial p(x,t)}{\partial t} = \left( -a_1 \frac{\partial}{\partial x} + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} \right) p(x),

letting \mathcal{L} = \left( -a_1 \frac{\partial}{\partial x} + \frac{a_2}{2} \frac{\partial^2}{\partial x^2} \right), we have

\frac{\partial p(x,t)}{\partial t} = \mathcal{L} p(x).

Now we can redraw our system above as



\mathcal{L} =  \mathcal{L}_0 +  \mathcal{L}_1,

which is the straightforward combination of the different contributions of each of the brain and the world to the overall system state probability. How cool is that?

Alright, calm down. Time to look at using these operators. Let’s assume that the overall system dynamics \mathcal{L} hold constant for some period of time. (Take particular care to note that ‘constant dynamics’ does not mean that a single constant output is produced irrespective of the input state, but rather that a given input state x always produces the same result while the dynamics are held constant.) If we have also discretized our representation of x (to be a range of some values, e.g. -100 to 100), then we can find the state probability distribution at time T by calculating

p(x, T) = A^T p(x,0),

where A^T = e^{T\mathcal{L}}.
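Once x is discretized, \mathcal{L} becomes a matrix and p(x) a vector, so the formula above is plain linear algebra. The following is a rough sketch under my own assumptions: a small grid, constant drift and diffusion, made-up coefficients for the world (\mathcal{L}_0) and brain (\mathcal{L}_1) contributions, and the exponential approximated as (I + (T/2^k)\mathcal{L})^{2^k} by repeated squaring rather than computed exactly.

```python
import math

n, dx = 81, 0.2
xs = [-5.0 + i * dx for i in range(n)]   # grid from -5 to 11

def operator(a1, a2):
    """Finite-difference matrix for -a1 d/dx + (a2/2) d2/dx2 (interior rows)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        L[i][i - 1] = a1 / (2 * dx) + 0.5 * a2 / dx ** 2
        L[i][i] = -a2 / dx ** 2
        L[i][i + 1] = -a1 / (2 * dx) + 0.5 * a2 / dx ** 2
    return L

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def propagator(L, T, squarings=10):
    """Approximate A^T = exp(T L) as (I + (T / 2^k) L)^(2^k)."""
    h = T / 2 ** squarings
    A = [[(1.0 if i == j else 0.0) + h * L[i][j] for j in range(n)]
         for i in range(n)]
    for _ in range(squarings):
        A = matmul(A, A)
    return A

def apply(A, p):
    return [sum(a * v for a, v in zip(row, p)) for row in A]

L0 = operator(a1=1.0, a2=0.6)   # hypothetical contribution of the world
L1 = operator(a1=3.0, a2=0.4)   # hypothetical contribution of the brain
L = add(L0, L1)                 # acting at the same time: operators just sum

# Initial Gaussian with mean 1 and variance 1, normalized on the grid.
p0 = [math.exp(-0.5 * (x - 1.0) ** 2) for x in xs]
norm = sum(p0) * dx
p0 = [v / norm for v in p0]

pT = apply(propagator(L, T=1.0), p0)   # p(x, T) = A^T p(x, 0)
mean = sum(x * v for x, v in zip(xs, pT)) * dx
var = sum((x - mean) ** 2 * v for x, v in zip(xs, pT)) * dx
print("mean", mean, "variance", var)
```

With these made-up numbers the summed operator has total drift 4 and total diffusion 1, so the mean should end up near 5 and the variance near 2. In practice a library matrix exponential would replace the repeated-squaring approximation; it is written out here only to keep the sketch dependency-free.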

When combining these \mathcal{L} operators, if we sum them, i.e. overall system dynamics

\mathcal{L} = \mathcal{L}_0 + \mathcal{L}_1,

and then apply them to the probability

\frac{\partial p(x,t)}{\partial t} = \mathcal{L}p(x)

this is saying that the dynamics provided by \mathcal{L}_0 and \mathcal{L}_1 are applied at the same time. But if we multiply the component dynamics operators,

\mathcal{L} = \mathcal{L}_1 \mathcal{L}_0

then when we apply them we have

\frac{\partial p(x,t)}{\partial t} = \mathcal{L}p(x) = \mathcal{L}_1 \mathcal{L}_0 p(x),

which is interpreted as applying \mathcal{L}_0 to the system, and then applying \mathcal{L}_1. Just basic algebra, but it allows us to apply simultaneously and sequentially the dynamics generated by our contributing system components (i.e. the brain and the world).

Capturing the effects of control

So now we have a representation of the system dynamics operating with variability and under uncertainty. We’re talking about building a tool to use for controlling these systems, though, so where does the control signal u fit in? The \mathcal{L} operator is made to be a function of the control signal, describing the probabilistic effect of the control signal given the possible initial states in the current state probability distribution p(x). Thus the change in state probability is now written

\frac{\partial p(x,t)}{\partial t} = \mathcal{L}(u)p(x).

Suppose that we drive a system with constant dynamics \mathcal{L}(u_1) for a period of time T_1, at which point we change the control signal and drive the system with constant dynamics \mathcal{L}(u_2) for another period of time T_2. The state of the system can now be calculated as

p(x, T_1 + T_2) = e^{T_2\mathcal{L}(u_2)}e^{T_1\mathcal{L}(u_1)}p(x,0) = A_{u_2}^{T_2}A_{u_1}^{T_1}p(x,0)

using the sequential application of dynamics operators discussed above.
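For the Gaussian, constant-coefficient case, the piecewise-constant control formula can be sketched without any matrices: applying e^{T\mathcal{L}(u)} just shifts the mean by a_1 T and adds a_2 T to the variance. The mapping from u to (a_1, a_2) below is entirely hypothetical (drift equal to u, diffusion growing with u squared), chosen only so that the two control segments do different things.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    mean: float
    var: float

def dynamics(u):
    """Hypothetical mapping from control signal u to (drift a1, diffusion a2).

    Assumption for illustration only: the drift equals u and the diffusion
    grows with u squared, so larger commands are noisier.
    """
    return u, 0.1 + 0.05 * u ** 2

def propagate(p, u, T):
    """Apply exp(T L(u)) to a Gaussian under constant drift and diffusion."""
    a1, a2 = dynamics(u)
    return Gaussian(p.mean + a1 * T, p.var + a2 * T)

p0 = Gaussian(mean=0.0, var=0.01)
p1 = propagate(p0, u=2.0, T=0.5)     # first segment: L(u_1) for T_1
p2 = propagate(p1, u=-1.0, T=0.25)   # second segment: L(u_2) for T_2

print(p1)
print(p2)
```

Because the drift and diffusion in this toy case do not depend on x, the two segments commute and their order does not change the final distribution. With state-dependent dynamics the operators generally do not commute, which is why the ordering in the product of exponentials matters.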


And that is the essence of the paper ‘Controlling variability’. There is an additional discussion about the relationship to Bayes’ rule, which I will save for another post, and an example, but this is plenty for this post.

The main point from this paper is that we shouldn’t be focusing on the values of states as the object of control, but rather the probability densities of states. By doing this, we can capture the uncertainty in systems and work towards devising an effective means of control. So, although the paper is called ‘Controlling variability’, the discussion of how to actually control variability is saved for later papers. All the same, I thought this was a very interesting paper, enjoyed working through it, and am looking forward to the rest of the series.

Sanger TD (2010). Controlling variability. Journal of motor behavior, 42 (6), 401-7 PMID: 21184358


Dynamic primitives of motor behavior – paper review

‘Dynamic primitives of motor behavior’ is a recent paper (2012) out by Neville Hogan and Dagmar Sternad.

This paper starts out professing the need for a theory of motor control that extends beyond a single task and situation, something near and dear to my heart. As they state, one of the problems with developing an encompassing theory is that most proposed theories are seen by their authors as competing, slowing assimilation of ideas and development of an overarching structure. As laid out here, two of the most important features for any encompassing theory are that it accounts for broad classes of actions and that it addresses the major limitations of the human neuromuscular system, the highlighted limitation being the slow speed of efferent and afferent signals.

Synergies and dynamic primitives

The authors propose that human motor control is encoded solely in terms of primitive dynamic actions. This, as they point out, is definitely not a novel proposal, but they suggest that it and its implications haven’t been fully investigated. When people think of combining primitive actions, the idea of a synergy most often comes to mind, which refers to ‘stereotyped patterns of simultaneous motion of multiple joints or simultaneous activation of multiple muscles that may simplify control’. The driving idea is that synergies could provide a means of dimensionality reduction for the controller: instead of having to fret over issuing every muscle activation signal, the controller modulates a set of larger actions, stitching them together simultaneously or sequentially. This makes performing complex actions significantly simpler.

This definition of a synergy, the authors say, is however insufficient for generating the complexity of behavior seen in humans: ‘this account of synergies constitutes an algebraic constraint, not a dynamic object. Even time-varying synergies are not dynamic objects, but constitute a kinematic constraint with time included as one of the variables related by the constraint’. That is to say, I believe, that this definition of a synergy is a strictly feedforward (open-loop), simplistic thing. It’s just a set of muscle activations that execute in a given order, without accounting for starting point, perturbations, or environment. In this sense they’re static, and do not adapt to dynamic environments.

So, something more is needed. ‘Reducing the dimension of commands alone is not sufficient to account for how humans control complex dynamic objects. For that, the primitives of control should themselves be dynamic objects.’ At this point they bring in the discussion of dynamic primitives presented by the Schaal lab way back in the early 2000s, also known as the pre-Katy Perry era. The basic idea behind dynamic primitives is that there is a target point in state space that the system is drawn to (according to spring dynamics), and there is a time-driven function that activates a forcing function which can move the system in interesting ways along its path to the target. Dynamic primitives can generate both discrete and rhythmic movements, i.e. they can act as point attractors or limit cycles. In the discrete case, the time-driven function goes to zero, reducing the effect of the forcing function to zero, letting the default spring dynamics take over and pull the system to the target, guaranteeing convergence. In the rhythmic case, the time-driven function goes to infinity and the cosine is taken to activate the forcing function in a rhythmic manner, where its movements are centered around the target in state space. There are a ton of really interesting and awesome things you can do with dynamic primitives, but that should be sufficient information for the discussion of this paper.
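As a rough illustration of the discrete case, here is a minimal sketch of a one-dimensional dynamic primitive in the general style of the Schaal lab formulation: a critically damped spring pulls y toward the goal, and a forcing term gated by a decaying phase variable x shapes the path along the way. The gains, the basis-function widths, and the weights are assumptions picked for the example, not values from either paper.

```python
import math

def run_primitive(y0, goal, weights, tau=1.0, dt=0.001, duration=3.0):
    """Integrate a discrete dynamic primitive with simple Euler steps."""
    alpha_z, beta_z, alpha_x = 25.0, 25.0 / 4.0, 3.0   # assumed gains
    width = 20.0                                        # assumed basis width
    n = len(weights)
    # Basis-function centers spread along the decaying phase variable x.
    centers = [math.exp(-alpha_x * i / (n - 1)) for i in range(n)]

    y, z, x = y0, 0.0, 1.0
    path = [y]
    for _ in range(round(duration / dt)):
        psi = [math.exp(-width * (x - c) ** 2) for c in centers]
        # Forcing term: weighted basis functions, gated by x so it fades out.
        forcing = (sum(w * b for w, b in zip(weights, psi)) / sum(psi)
                   * x * (goal - y0))
        dz = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau      # the time-driven function decays to zero
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        path.append(y)
    return path

plain = run_primitive(0.0, 1.0, weights=[0.0] * 5)
shaped = run_primitive(0.0, 1.0, weights=[150.0, -100.0, 50.0, 0.0, 0.0])

print("final position without forcing:", plain[-1])
print("final position with forcing:   ", shaped[-1])
```

Both runs should end up at the goal, since the forcing term vanishes as x decays, but the shaped run takes a different route to get there. The rhythmic case is not covered by this sketch.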

Taking dynamic primitives as our definition of synergies, we can generate significantly more robust behavior than with the alternate definition, because dynamic primitives are more than a sequence of muscle activations: they are attractors with dynamics that can guide the system in the face of perturbations and other noise. The authors term this property ‘”temporary permanence” (permanence due to robustness to perturbation; temporary because dynamic primitives, like phonemes of verbal communication, may have limited duration)’. Discrete and rhythmic dynamic primitives are then rewritten and termed ‘submovements’ and ‘oscillators’, respectively, that have an explicit mathematical summation and a speed profile with a single peak.

Virtual trajectories and mechanical impedance

The idea is then for the system to create a virtual trajectory to follow by combining these submovements and oscillators, creating a ‘trajectory attractor’. Although combinations of submovements and oscillators can account for a vast repertoire of movements in unconstrained environments, they’re not sufficient for describing any movement involving interactions with the environment. To do this, another aspect, mechanical impedance, is introduced as a feature to be accounted for in the construction of virtual trajectories. Mechanical impedance determines the force evoked by a displacement of a part of the system throughout the movement. In humans, mechanical impedance can be controlled by modulating the co-contraction of antagonist groups of muscles, holding a limb rigidly in place or letting it sway freely in response to applied outside force. The mechanical impedance for a given task, then, is a function that describes how to respond to outside forces throughout the time-course of a movement. Just like submovements and oscillations, mechanical impedances for different tasks can be combined through linear superposition to generate a novel function.
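A toy sketch of what modulating impedance buys you, assuming the simplest possible model: a point mass tied to a fixed virtual position by a linear spring and damper (my simplification, not the paper’s model), pushed by a constant outside force. High stiffness holds the limb nearly in place; low stiffness lets it give way.

```python
def settle(stiffness, damping, force, mass=1.0, dt=0.001, duration=5.0):
    """Displacement of a point mass held at virtual position 0 under a push.

    Assumed model: mass * acceleration = stiffness * (0 - x)
                                         + damping * (0 - v) + force
    """
    x, v = 0.0, 0.0
    for _ in range(round(duration / dt)):
        a = (stiffness * (0.0 - x) + damping * (0.0 - v) + force) / mass
        v += a * dt    # semi-implicit Euler: update velocity first,
        x += v * dt    # then position with the new velocity
    return x

# Same outside force, two impedance settings (both critically damped).
stiff = settle(stiffness=400.0, damping=40.0, force=2.0)
compliant = settle(stiffness=25.0, damping=10.0, force=2.0)

print("displacement with high impedance:", stiff)
print("displacement with low impedance: ", compliant)
```

In this linear model the resting displacement is simply force divided by stiffness, so the compliant setting yields sixteen times as far as the stiff one. Two such spring-dampers acting in parallel add their stiffness and damping terms, which is one simple reading of the linear superposition of impedances mentioned above.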

The incorporation of mechanical impedance into the specification of a movement has some really neat effects. The authors present several cases. One is the case of locomotion: instead of having two different primitive movements for different types of locomotion (such as walking normally and walking on the balls of your feet, for example), these can both arise from the same composition of submovements and oscillators by varying the joint stiffness (mechanical impedance) profile throughout the movement. Another case is reaching out to manipulate a door handle. One way to go about this is to have a precise model detailing the path to follow for the hand to appropriately close around the round object, and to adapt this over time to appropriately apply torque and open the door. A simpler means is to have a crude model of the location of the door handle and its shape, and perform the movement with a low mechanical impedance, letting the hand form around the object appropriately as contact is made (the effectiveness of the latter method is also shown in simulation in the paper). Another point is that controlling mechanical impedance allows for feedforward specification, allowing appropriate reactions in situations where object contact is too fast for feedback-based control to respond appropriately.

So, assuming we have sets of our three classes of primitives (submovements, oscillators, and mechanical impedances), a virtual trajectory can be generated using the available primitives as basis functions which can be combined through weighted summation. And the authors propose ‘that what is learned, encoded, and retrieved are the parameters of dynamic primitives, rather than any details of behavior’.
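Here is a small sketch of the weighted-summation idea for submovements only. I am assuming a minimum-jerk shape for each submovement, which gives a speed profile with a single peak; the onsets, durations, and amplitudes are made up. What gets stored is just the parameter triple for each primitive, not the trajectory itself.

```python
def submovement_position(t, onset, duration, amplitude):
    """Position contribution of one submovement (assumed minimum-jerk shape)."""
    s = min(max((t - onset) / duration, 0.0), 1.0)   # normalized time in [0, 1]
    return amplitude * (10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5)

def virtual_trajectory(t, primitives):
    """Sum of submovements; each primitive is (onset, duration, amplitude)."""
    return sum(submovement_position(t, *prim) for prim in primitives)

# Three overlapping submovements: two forward, one small correction back.
primitives = [(0.0, 0.6, 0.10), (0.4, 0.5, 0.05), (0.8, 0.4, -0.03)]

times = [i * 0.01 for i in range(151)]   # 0 to 1.5 seconds
trajectory = [virtual_trajectory(t, primitives) for t in times]

print("start:", trajectory[0], "end:", trajectory[-1])
```

Once every submovement has finished, the virtual trajectory rests at the sum of the amplitudes, here 0.12. Oscillators and impedance profiles would enter the same sum as additional basis functions, which this sketch leaves out.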


There were a number of interesting ideas in this paper. Being already familiar with dynamical primitives and compositionality of movement (and already using both in my own work), the part I found most interesting was the incorporation of mechanical impedance as a movement feature for modulation. The door handle example is a particularly compelling demonstration of the robustness and power this incorporation adds to movements. And a great bonus to using dynamical primitives as a basis for a motor control system is that some great work has been done incorporating learning into dynamical primitives (albeit not designed with any intent of neural plausibility), specifically the path integral policy improvement work done by Evangelos Theodorou during his time in the Schaal lab.

The terms ‘virtual trajectories’ and ‘trajectory attractor’ strike me as a bit dangerous in how easily they could be misinterpreted, where they could seem more like a kinematic path specification method than the result of combining a number of dynamical attractor systems.

I assume that the transmission speed problem mentioned at the beginning of the paper is addressed in two ways: by the reduced bandwidth required for modulating a set of primitives rather than specifying a full control signal, and by the ability of these modulatory terms to specify appropriate responses to any encountered external force. Learning the modulatory signals of a given set of primitives is another feature that appeals to me, as opposed to specifying the explicit muscle activations, because the former has the potential to take advantage of whatever built-in circuitry is going on in the spinal cord, that very often ignored crazy complex structure that motor signals are routed through.

Finally, a recurring theme of the paper is also making the case for an overarching falsifiable theory that can be built up and revised incrementally, and isn’t thrown away when some experiment provides contradicting data. This is another call in the field lately, and the plan of attack I’ve been following myself. Maybe I could try directing them towards the Neural Optimal Control Hierarchy framework I’ve been building…
Hogan N, & Sternad D (2012). Dynamic primitives of motor behavior. Biological cybernetics, 106 (11-12), 727-39 PMID: 23124919
