BM: An iterative algorithm to learn stable non-linear dynamical systems with Gaussian mixture models.

Khansari-Zadeh, S. M. and Billard, A.
in: Proc. IEEE Int Robotics and Automation (ICRA) Conf, pp. 2381–2388, 2010


We model the dynamics of non-linear point-to-point robot motions as a time-independent system described by an autonomous dynamical system (DS). We propose an iterative algorithm to estimate the form of the DS through a mixture of Gaussian distributions. We prove that the resulting model is asymptotically stable at the target. We validate the accuracy of the model on a library of 2D human motions and to learn a control policy through human demonstrations for two multi-degrees of freedom robots. We show the real-time adaptation to perturbations of the learned model when controlling the two kinematically-driven robots.


The authors describe a system for learning nonlinear, multivariate dynamical systems based on Gaussian mixture regression (GMR). The difference from previous approaches using GMR (e.g. Gribovskaya2010) is that the GMR is obtained by pruning a Gaussian mixture model, which initially has a Gaussian at each time point, such that accuracy and stability criteria are met. Pruning here actually means that two neighbouring Gaussians are merged. Consequently, the main contribution of the paper is the derivation and proof of the corresponding stability criteria, something that I haven’t checked properly.
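To make the merging step concrete, here is a minimal sketch of collapsing two neighbouring Gaussians into one by moment matching. This is an assumption on my part: moment matching is a standard way to merge two mixture components, and the paper's actual merge rule, together with its accuracy and stability checks, is not reproduced here. The function name merge_gaussians and the 1D setting are purely illustrative.

```python
def merge_gaussians(w1: float, mu1: float, var1: float,
                    w2: float, mu2: float, var2: float) -> tuple[float, float, float]:
    """Merge two weighted 1D Gaussians into one by moment matching.

    Illustrative only: the merged component keeps the total weight, the
    weighted mean, and the variance of the two-component mixture.
    """
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # Mixture variance = weighted within-component variance
    # plus weighted squared distance of each mean from the merged mean.
    var = (w1 * (var1 + (mu1 - mu) ** 2) + w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var


# Two equally weighted unit-variance components centred at 0 and 2.
weight, mean, variance = merge_gaussians(0.5, 0.0, 1.0, 0.5, 2.0, 1.0)
```

In a pruning loop one would accept such a merge only if the merged model still satisfies the accuracy and stability criteria; those checks are exactly the part I have left out.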

They make a quantitative comparison between their binary merging approach, original EM learning of GMR, using LWPR to learn the dynamics and using DMPs. However, they do not describe the precise procedures. I am particularly surprised by the very low accuracy of the DMPs compared to the other approaches. Unless they have done something special (such as introducing large temporal deviations as done for Fig. 2) I don’t see why the accuracy for DMPs should be so low.

They argue that the main advantages of their approach are that a minimal number of Gaussians is determined automatically while the resulting dynamics is stable at all times, that the multivariate Gaussians can capture correlations between dimensions (in contrast to DMPs), and that the computations are less costly than Gaussian process regression. The disadvantages are that the number of parameters increases quadratically with the dimensionality (curse of dimensionality; not so crucial for their 2, 4 or 6D examples, but beyond that?) and, in particular, that the pruning procedure is highly susceptible to local minima and that results depend on the order in which Gaussians are merged. In the extreme case, imagine that, because of noise, none of the initial Gaussians can be merged without violating the accuracy constraint. Again, this might not be a problem for their very smooth data, but it will become problematic for noisier data. Similar problems lead to the dependency on the order of merges (which are selected randomly). To overcome the order dependency they suggest restarting the algorithm several times and then selecting the result with the smallest number of Gaussians. Note that this compromises their computational advantage over GPs. While computing a GP mapping is cubic in the number of data points, merging the Gaussians is quadratic in the number of time points. However, if different merge orders need to be checked, there are 2 to the power of the number of time points possible merge sequences, so the computational cost can increase exponentially in the worst case, when really the best solution is supposed to be found (if you optimise the hyperparameters in GPs you’re in a similar situation in a continuous space, though).

Efficient Reductions for Imitation Learning.

Ross, S. and Bagnell, D.
in: JMLR W&CP 9: AISTATS 2010, pp. 661–668, 2010


Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d. This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.


The authors note that previous approaches to learning a policy from an example policy are limited in that they only see successful examples generated by the desired policy. They will therefore exhibit a larger error than expected from supervised learning on independent samples, because an error can propagate through the series of decisions if the policy hasn’t learnt to recover to the desired policy after an error has occurred. They then show that a lower error can be expected when a Forward Algorithm is used for training, which learns a non-stationary policy successively for each time step. The idea probably is (I’m not too sure) that the data at the time step that is currently learnt contains the errors (that lead to different states) you would usually expect from the learnt policies, because for every time step new data is sampled based on the already learnt policies. They transfer this idea to learning a stationary policy and propose SMILe (stochastic mixing iterative learning). In this algorithm the stationary policy is a linear combination of policies learnt in previous iterations, where the initial policy is the desired one. The influence of the desired policy decreases exponentially with the number of iterations. The weights of policies learnt later also decrease exponentially, but stay fixed in subsequent iterations, i.e. the policies learnt first will eventually have the largest weights. This makes sense, because they will most probably be closest to the desired policy (seeing mostly samples produced from the desired policy).
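The weighting scheme is easier to see with numbers. The following sketch computes the mixture weights as I read the description above: after n iterations the desired (expert) policy has weight (1 - alpha)^n and the policy learnt in iteration j has the fixed weight alpha * (1 - alpha)^(j - 1). The function name smile_weights and the parameter name alpha are my own labels, and the code only illustrates the weights, not the training loop.

```python
def smile_weights(n_iterations: int, alpha: float) -> list[float]:
    """Return [w_expert, w_1, ..., w_n] for a SMILe-style policy mixture.

    Assumed form (my reading of the scheme, not copied from the paper):
      w_expert = (1 - alpha) ** n
      w_j      = alpha * (1 - alpha) ** (j - 1)   for j = 1..n
    """
    expert = (1.0 - alpha) ** n_iterations
    learned = [alpha * (1.0 - alpha) ** (j - 1) for j in range(1, n_iterations + 1)]
    return [expert] + learned


weights = smile_weights(n_iterations=5, alpha=0.3)
# The weights form a probability distribution over policies, the expert's
# share shrinks with every iteration, and earlier learnt policies outweigh
# later ones.
```

Under this form the weight of each learnt policy never changes once it has been added; only the expert's share is reduced to make room for the newest policy.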

The aim is to make the learnt policy more robust without using too many samples from the desired policy. I really wonder whether you could achieve exactly the same performance by simply additionally sampling the desired policy from randomly perturbed states and adding these as training points to learning of a single policy. Depending on how expensive your learning algorithm is this may be much faster in total (as you only have to learn once on a larger data set). Of course, you then may not have the theoretical guarantees provided in the paper. Another drawback of the approach presented in the paper is that it needs to be possible to sample from the desired policy interactively during the learning. I can’t imagine a scenario where this is practical (a human in the loop?).

I was interested in this, because in an extended abstract to a workshop (see attached files) the authors referred to this approach and also mentioned Langford2009 as a similar learning approach based on local updates. Also you can see the policy as a differential equation, i.e. the results of the paper may also apply to learning of dynamical systems without control inputs. The problems are certainly very similar.

They use a neural network to learn policies in the particular application they consider.

Generating coherent patterns of activity from chaotic neural networks.

Sussillo, D. and Abbott, L. F.
Neuron, 63:544–557, 2009


Neural circuits display complex activity patterns both spontaneously and when responding to a stimulus or generating a motor output. How are these two forms of activity related? We develop a procedure called FORCE learning for modifying synaptic strengths either external to or within a model neural network to change chaotic spontaneous activity into a wide variety of desired activity patterns. FORCE learning works even though the networks we train are spontaneously chaotic and we leave feedback loops intact and unclamped during learning. Using this approach, we construct networks that produce a wide variety of complex output patterns, input-output transformations that require memory, multiple outputs that can be switched by control inputs, and motor patterns matching human motion capture data. Our results reproduce data on premovement activity in motor and premotor cortex, and suggest that synaptic plasticity may be a more rapid and powerful modulator of network activity than generally appreciated.


The authors present a new way of reservoir computing. The setup apparently (haven’t read the paper) is very similar to the echo state networks of Herbert Jaeger (Jaeger and Haas, Science, 2004); the difference being the signal that is fed back to the reservoir from the output. While Jaeger fed back the target value f(t), they feed back the error between f(t) and the prediction given the current weights and reservoir activity. Key to their approach then is that they use a weight update rule which almost instantaneously provides weights that minimise the error. While this obviously leads to a very high variability of the weights across time steps at the start of learning, they argue that this variability diminishes during learning and weights eventually stabilise such that, when learning is switched off, the target dynamics is reproduced. They present a workaround which may make it possible to also learn non-periodic functions, but it’s clearly better suited for periodic functions.
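The weight update used in FORCE learning is recursive least squares (RLS). The sketch below shows only that update rule on a toy problem: the "reservoir activity" is replaced by three fixed features (sin t, cos t, 1) and there is no feedback loop or chaotic network, so this is an illustration of the update, not of FORCE as a whole. The names rls_step and fit_demo and the regulariser alpha are assumptions for this example.

```python
import math


def rls_step(w, P, r, target):
    """One recursive-least-squares update of readout weights w.

    P is the running estimate of the inverse correlation matrix of r.
    Returns the updated (w, P) and the error measured before the update.
    """
    n = len(r)
    Pr = [sum(P[i][j] * r[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(r[i] * Pr[i] for i in range(n))
    # P <- P - (P r)(P r)^T / (1 + r^T P r); P is symmetric.
    P = [[P[i][j] - Pr[i] * Pr[j] / denom for j in range(n)] for i in range(n)]
    err = sum(w[i] * r[i] for i in range(n)) - target
    # The gain Pr / denom equals (updated P) applied to r.
    w = [w[i] - err * Pr[i] / denom for i in range(n)]
    return w, P, err


def fit_demo(n_steps=200, alpha=1e-3):
    """Fit f(t) = 2 sin t - cos t + 0.5 from stand-in features (toy data)."""
    w = [0.0, 0.0, 0.0]
    P = [[(1.0 / alpha if i == j else 0.0) for j in range(3)] for i in range(3)]
    for k in range(n_steps):
        t = 0.1 * k
        r = [math.sin(t), math.cos(t), 1.0]
        target = 2.0 * math.sin(t) - math.cos(t) + 0.5
        w, P, _ = rls_step(w, P, r, target)
    return w


learned_weights = fit_demo()
```

With a small alpha the initial P is large, so the first few updates already cancel most of the output error; this is the sense in which the weights are adjusted "almost instantaneously".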

I wonder how the learning is divided between feedback mechanism and weight adaptation (network model of Fig. 1A). In particular, it could well be that the feedback mechanism is solely responsible for successful learning while the weights just settle to a more or less arbitrary setting once the dynamics is stabilised through the feedback (making weights uninterpretable). The authors also report how the synapses within the reservoir can be adapted to reproduce the target dynamics when no feedback signal is given from the network output (structure in Fig. 1C). Curiously, the credit assignment problem is solved by ignoring it: for the adaptation of reservoir synapses the same network-level output error is used as for the adaptation of output weights.

It’s interesting that it works, but it would be good to know why and how it works. The authors’ main argument for why their proposal is better than echo state networks is that it is more stable. They present corresponding results in Fig. 4, but they never tell us what they mean by stable. So how stable are the dynamics learnt by FORCE? How much can you perturb the network dynamics before it stops being able to reproduce the target dynamics? In other words, how far off the desired dynamics can you initialise the network state?

They have an interesting principal components analysis of network activity suggesting that the dynamics converges to the same values for the first principal components for different starting states, but I haven’t understood it well enough during this first read to comment further on that.

Modeling discrete and rhythmic movements through motor primitives: a review.

Degallier, S. and Ijspeert, A.
Biol Cybern, 103:319–338, 2010


Rhythmic and discrete movements are frequently considered separately in motor control, probably because different techniques are commonly used to study and model them. Yet the increasing interest in finding a comprehensive model for movement generation requires bridging the different perspectives arising from the study of those two types of movements. In this article, we consider discrete and rhythmic movements within the framework of motor primitives, i.e., of modular generation of movements. In this way we hope to gain an insight into the functional relationships between discrete and rhythmic movements and thus into a suitable representation for both of them. Within this framework we can define four possible categories of modeling for discrete and rhythmic movements depending on the required command signals and on the spinal processes involved in the generation of the movements. These categories are first discussed in terms of biological concepts such as force fields and central pattern generators and then illustrated by several mathematical models based on dynamical system theory. A discussion on the plausibility of these models concludes the work.


In the first part, the paper reviews experimental evidence for the existence of a motor primitive system located on the level of the spinal cord. In particular, the discussion is centred on the existence of central pattern generators and force fields (also: muscle synergies) defined in the spinal cord. Results showing the independence of these from cortical signals exist for animals up to the cat, or so. “In humans, the activity of the isolated spinal cord is not observable, […]: influences from higher cortical areas and from sensory pathways can hardly be excluded.”

The remainder of the article reviews dynamical systems that have been proposed as models for movement primitives. The models are roughly characterised according to their assumptions about the relationship between discrete and rhythmic movements. The authors define 4 categories: two/two, one/two, one/one and two/one, where a two means separate systems for discrete and rhythmic movements and a one means a common system. The number before the slash corresponds to the planning process (signals potentially generated as motor commands from cortex) and the number after the slash corresponds to the execution system where the movement primitives are defined.

You would think that the aim of this exercise is to work out advantages and disadvantages of the models, but the authors mainly restrict themselves to describing the models. The main conclusion then is that discrete and rhythmic movements can be generated from movement primitives in the spinal cord while cortex may only provide simple, non-patterned commands. The proposed categorisation may help to discern models experimentally, but apparently there is currently no conclusive evidence favouring any of the categories (authors repeatedly cite two conflicting studies).

Winnerless competition between sensory neurons generates chaos: A possible mechanism for molluscan hunting behavior.

Varona, P., Rabinovich, M. I., Selverston, A. I., and Arshavsky, Y. I.
Chaos: An Interdisciplinary Journal of Nonlinear Science, 12:672–677, 2002


In the presence of prey, the marine mollusk Clione limacina exhibits search behavior, i.e., circular motions whose plane and radius change in a chaotic-like manner. We have formulated a dynamical model of the chaotic hunting behavior of Clione based on physiological in vivo and in vitro experiments. The model includes a description of the action of the cerebral hunting interneuron on the receptor neurons of the gravity sensory organ, the statocyst. A network of six receptor model neurons with Lotka-Volterra-type dynamics and nonsymmetric inhibitory interactions has no simple static attractors that correspond to winner take all phenomena. Instead, the winnerless competition induced by the hunting neuron displays hyperchaos with two positive Lyapunov exponents. The origin of the chaos is related to the interaction of two clusters of receptor neurons that are described with two heteroclinic loops in phase space. We hypothesize that the chaotic activity of the receptor neurons can drive the complex behavior of Clione observed during hunting.


See Levi2005 for a short summary in context.

My biggest concern with this paper is that the changes in direction of the mollusc may also result from feedback from the body and especially the statocysts during its accelerated swimming. The question is, are these direction changes a result of chaotic, but deterministic dynamics in the sensory network as suggested by the model, or are they a result of essentially random processes which may be influenced by feedback from other networks? The authors note that in their model “The neurons keep the sequence of activation but the interval in which they are active is continuously changing in time”. After a day of searching for papers which have investigated the swimming behaviour of Clione limacina (the mollusc in question) I came to the conclusion that the data shown in Fig. 1 is likely the only data set of swimming behaviour that has been published. This small data set suggests random changes in direction, in contrast to the model, but it does not allow any definite conclusions to be drawn about the repetitiveness of direction changes.

The role of sensory network dynamics in generating a motor program.

Levi, R., Varona, P., Arshavsky, Y. I., Rabinovich, M. I., and Selverston, A. I.
J Neurosci, 25:9807–9815, 2005


Sensory input plays a major role in controlling motor responses during most behavioral tasks. The vestibular organs in the marine mollusk Clione, the statocysts, react to the external environment and continuously adjust the tail and wing motor neurons to keep the animal oriented vertically. However, we suggested previously that during hunting behavior, the intrinsic dynamics of the statocyst network produce a spatiotemporal pattern that may control the motor system independently of environmental cues. Once the response is triggered externally, the collective activation of the statocyst neurons produces a complex sequential signal. In the behavioral context of hunting, such network dynamics may be the main determinant of an intricate spatial behavior. Here, we show that (1) during fictive hunting, the population activity of the statocyst receptors is correlated positively with wing and tail motor output suggesting causality, (2) that fictive hunting can be evoked by electrical stimulation of the statocyst network, and (3) that removal of even a few individual statocyst receptors critically changes the fictive hunting motor pattern. These results indicate that the intrinsic dynamics of a sensory network, even without its normal cues, can organize a motor program vital for the survival of the animal.


The authors investigate the neural mechanisms of hunting behaviour in a mollusk. Its simplicity allows the nervous system to be completely separated from the rest of the body and investigated in isolation, but as a whole. In particular, the authors are interested in the causal influence of sensory neurons on motor activity.

The mollusk has two types of behaviour for positioning its body in the water: 1) it uses gravitational sensors (statocysts) to maintain a head-up position in the water under normal circumstances and 2) it swims in apparently chaotic, small loops when it suspects prey in its vicinity (searching). In this paper the authors present evidence that the searching behaviour 2) is still largely dependent on the (internal) dynamics of the statocysts.

The model is as follows (see Varona2002): without prey, inhibitory connections between sensory cells in the statocysts make sure that only a small proportion of cells are firing (those that are activated by mechanoreceptors according to gravitation acting on a stone-like structure in the statocysts), but when prey is in the vicinity of the mollusk (as indicated by e.g. chemoreceptors) cerebral hunting neurons additionally excite the statocyst cells, inducing chaotic dynamics between them. The important thing to note is that then the statocysts still influence motor behaviour as shown in the paper. So the argument is that the same mechanism for producing motor output dependent on statocyst signals can be used to generate searching just through changing the activity of the sensory neurons.
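To get a feeling for winnerless competition, here is a toy simulation of Lotka-Volterra-type dynamics with nonsymmetric inhibition, da_i/dt = a_i (1 - sum_j rho_ij a_j). It is deliberately much simpler than the model in Varona2002: three units instead of six receptor neurons, hand-picked inhibition strengths, no hunting-neuron input, and it produces a regular sequence of winners along a heteroclinic cycle rather than hyperchaos. All names and parameter values are assumptions for illustration.

```python
def simulate_wlc(steps: int = 10000, dt: float = 0.01):
    """Euler-integrate a 3-unit Lotka-Volterra network with asymmetric inhibition.

    Returns the sequence of distinct successive winners (index of the most
    active unit) and the smallest activity seen during the run.
    """
    # rho[i][j]: inhibition of unit i by unit j (illustrative values).
    # Each unit is inhibited strongly by one neighbour and weakly by the other.
    rho = [[1.0, 0.5, 2.0],
           [2.0, 1.0, 0.5],
           [0.5, 2.0, 1.0]]
    a = [0.6, 0.3, 0.1]
    winners = []
    smallest = min(a)
    for _ in range(steps):
        rates = [1.0 - sum(rho[i][j] * a[j] for j in range(3)) for i in range(3)]
        a = [a[i] + dt * a[i] * rates[i] for i in range(3)]
        smallest = min(smallest, min(a))
        winner = max(range(3), key=lambda i: a[i])
        if not winners or winners[-1] != winner:
            winners.append(winner)
    return winners, smallest


winner_sequence, min_activity = simulate_wlc()
```

No unit stays the winner: each one dominates for a while and is then displaced by the unit it inhibits only weakly. The order of activation is fixed while the time spent in each state changes, which is at least reminiscent of the quote from Varona2002 that the neurons keep the sequence of activation while the activation intervals keep changing.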

Overall the evidence presented in the paper is convincing that statocyst activity influences the activity of the motor neurons also in the searching behaviour, but it cannot be said conclusively that the statocysts are necessary for producing the swimming, because the setup allowed only the activity of motor neurons to be observed without actually seeing the behaviour (actually Levi2004 show that the typical searching behaviour cannot be produced when the statocysts are removed). For the same reason, the experiments also neglected possible feedback mechanisms between body/mollusk and environment, e.g. in the statocyst activity due to changing gravitational state, i.e. orientation. The argument there is, though not explicitly stated, that the statocyst stops computing the actual orientation of the body, but is purely driven through its own dynamics. Feedback from the peripheral motor system is not modelled (Varona2002, arguing that for determining the origin of the apparent chaotic behaviour this is not necessary).

For us this is a nice example of how action can be a direct consequence of perception, but even more so of how internal sensory dynamics can produce differentiated motor behaviour. The connection between sensory states and motor activity is relatively fixed, but different motor behaviour may be generated by different processing in the sensory system. The autonomous dynamics of the statocysts in searching behaviour may also be interpreted as being induced by different, high-precision predictions on a higher level. It may be questioned how good a model the mollusk nervous system is for information processing in the human brain, but maybe they share these principles.