Paper Decoder

Temporal sparseness of the premotor drive is important for rapid learning in a neural network model of birdsong.

Fiete, I. R., Hahnloser, R. H. R., Fee, M. S., and Seung, H. S.
J Neurophysiol, 92:2274–2282, 2004


Sparse neural codes have been widely observed in cortical sensory and motor areas. A striking example of sparse temporal coding is in the song-related premotor area high vocal center (HVC) of songbirds: The motor neurons innervating avian vocal muscles are driven by premotor nucleus robustus archistriatalis (RA), which is in turn driven by nucleus HVC. Recent experiments reveal that RA-projecting HVC neurons fire just one burst per song motif. However, the function of this remarkable temporal sparseness has remained unclear. Because birdsong is a clear example of a learned complex motor behavior, we explore in a neural network model with the help of numerical and analytical techniques the possible role of sparse premotor neural codes in song-related motor learning. In numerical simulations with nonlinear neurons, as HVC activity is made progressively less sparse, the minimum learning time increases significantly. Heuristically, this slowdown arises from increasing interference in the weight updates for different synapses. If activity in HVC is sparse, synaptic interference is reduced, and is minimized if each synapse from HVC to RA is used only once in the motif, which is the situation observed experimentally. Our numerical results are corroborated by a theoretical analysis of learning in linear networks, for which we derive a relationship between sparse activity, synaptic interference, and learning time. If songbirds acquire their songs under significant pressure to learn quickly, this study predicts that HVC activity, currently measured only in adults, should also be sparse during the sensorimotor phase in the juvenile bird. We discuss the relevance of these results, linking sparse codes and learning speed, to other multilayered sensory and motor systems.


They model the generation of bird song as a simple feed-forward network and show that a sparse temporal code of HVC neurons (feeding into RA neurons) speeds up learning with backpropagation. They argue that this speed up is the main explanation for why real HVC neurons exhibit a sparse temporal code.

HVC neurons are modelled as either on or off, i.e., bursting or non-bursting, while RA neurons have continuous activities. A linear combination of RA neurons then determines the output of the network. They define a desired, low-pass filtered output that should be learnt, but while their Fig. 2 suggests that they model the sequential aspect of the data, the actual network has no such component and the temporal order of the data points is irrelevant for learning. Maybe fixing, i.e., not learning, the connections from RA to output is biologically well motivated, but other choices for the network seem to be quite arbitrary, e.g., why do RA neurons project from the beginning to only one of two outputs? They varied quite a few parameters and found that their main result (learning is faster with sparse HVC firing) holds for all of them, though. Interesting to note: they had to initialise HVC-RA weights and RA thresholds such that initial RA activity is low and nonuniform in order to get the desired type of RA activity after learning.
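The interference argument can be made concrete with a small sketch. It assumes binary HVC activity stored as a time-by-neuron matrix and counts, for every pair of distinct time steps, how many HVC neurons are active at both. In a linear network trained with a delta-rule style update, a weight change that corrects the output at one time step shifts the output at another time step in proportion to this overlap. The function names, network size and random burst times are my own illustrative assumptions, not the authors' setup.

```python
import numpy as np

def hvc_pattern(n_neurons, n_steps, bursts_per_neuron, rng):
    """Binary HVC activity (time steps x neurons); each neuron bursts at
    `bursts_per_neuron` distinct, randomly chosen time steps."""
    activity = np.zeros((n_steps, n_neurons))
    for neuron in range(n_neurons):
        times = rng.choice(n_steps, size=bursts_per_neuron, replace=False)
        activity[times, neuron] = 1.0
    return activity

def interference(activity):
    """Summed overlap between the HVC patterns of all pairs of distinct time steps."""
    overlap = activity @ activity.T  # entry (t, s): neurons active at both t and s
    return float(overlap.sum() - np.trace(overlap))

rng = np.random.default_rng(0)
sparse = hvc_pattern(n_neurons=50, n_steps=100, bursts_per_neuron=1, rng=rng)
dense = hvc_pattern(n_neurons=50, n_steps=100, bursts_per_neuron=4, rng=rng)
print(interference(sparse), interference(dense))
```

With one burst per neuron no synapse is shared between time steps, so the overlap vanishes; with several bursts per neuron every synapse couples the updates of several time steps.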

I didn’t like the paper that much, because they showed the benefit of sparse coding for the biologically implausible backpropagation learning. Would it also hold up against a Hebbian learning paradigm? On the other hand, the whole idea of being able to learn better when each neuron is only responsible for one restricted part of the stimulus is so outrageously intuitive that you wonder why this needed to be shown in the first place. (Stefan noted, though, that he doesn’t know of work investigating temporal sparseness compared to spatial sparseness.) Finally, you cannot argue that this is the main reason why HVC neurons fire in a temporally sparse manner, because there might be other unknown reasons and this is only a side effect.

Expectation and surprise determine neural population responses in the ventral visual stream.

Egner, T., Monti, J. M., and Summerfield, C.
J Neurosci, 30:16601–16608, 2010


Visual cortex is traditionally viewed as a hierarchy of neural feature detectors, with neural population responses being driven by bottom-up stimulus features. Conversely, “predictive coding” models propose that each stage of the visual hierarchy harbors two computationally distinct classes of processing unit: representational units that encode the conditional probability of a stimulus and provide predictions to the next lower level; and error units that encode the mismatch between predictions and bottom-up evidence, and forward prediction error to the next higher level. Predictive coding therefore suggests that neural population responses in category-selective visual regions, like the fusiform face area (FFA), reflect a summation of activity related to prediction (“face expectation”) and prediction error (“face surprise”), rather than a homogenous feature detection response. We tested the rival hypotheses of the feature detection and predictive coding models by collecting functional magnetic resonance imaging data from the FFA while independently varying both stimulus features (faces vs houses) and subjects’ perceptual expectations regarding those features (low vs medium vs high face expectation). The effects of stimulus and expectation factors interacted, whereby FFA activity elicited by face and house stimuli was indistinguishable under high face expectation and maximally differentiated under low face expectation. Using computational modeling, we show that these data can be explained by predictive coding but not by feature detection models, even when the latter are augmented with attentional mechanisms. Thus, population responses in the ventral visual stream appear to be determined by feature expectation and surprise rather than by stimulus features per se.


In general the design of the study is interesting as it is an fMRI study investigating the effects of a stimulus that is presented immediately before the actually analysed stimulus, i.e., temporal dependencies between sequentially presented stimuli of which predictability is a subset (actually priming studies would also fall into this category; I don’t know how well they are studied with fMRI).

While the original predictive coding and feature detection models are convincing, the feature detection + attention models are confusing. All models seem to lack a baseline. The attention models are somehow defined on the “differential FFA response” and this is not further explained. The f b_1 part of the attention models can actually be reduced to b_1.

Katharina noted that, in contrast to here where they didn’t do it, you should do a small sample correction, if you want to do the ROI analysis properly.

They do not differentiate between prediction error and surprise in the paper. Surprise is the precision-weighted prediction error.

SORN: a self-organizing recurrent neural network.

Lazar, A., Pipa, G., and Triesch, J.
Front Comput Neurosci, 3:23, 2009


Understanding the dynamics of recurrent neural networks is crucial for explaining how the brain processes information. In the neocortex, a range of different plasticity mechanisms are shaping recurrent networks into effective information processing circuits that learn appropriate representations for time-varying sensory stimuli. However, it has been difficult to mimic these abilities in artificial neural network models. Here we introduce SORN, a self-organizing recurrent network. It combines three distinct forms of local plasticity to learn spatio-temporal patterns in its input while maintaining its dynamics in a healthy regime suitable for learning. The SORN learns to encode information in the form of trajectories through its high-dimensional state space reminiscent of recent biological findings on cortical coding. All three forms of plasticity are shown to be essential for the network’s success.


The paper considers the question of whether adapting an RNN used as a reservoir gives better performance in a sequence prediction task than randomly initialised RNNs. The authors demonstrate an adaptation procedure based on spike-timing-dependent plasticity (STDP) controlled with intrinsic plasticity (IP) and synaptic normalisation (SN) as homeostatic mechanisms and show that the performance of the adapted RNNs is indeed superior to the performance of the random RNNs. They further show that IP and SN are necessary for good results, or rather that without either the RNN exhibits disadvantageous firing patterns (bursting, always on, always off).

This is one of the few studies which show successful learning of RNNs. However, they use a rather simple model: a binary network in discrete time. The connectivity of the network is more elaborate: there are excitatory units which are recurrently connected, as well as fewer inhibitory neurons which have no connections between themselves, but are fully and reciprocally connected with all excitatory units. Input to the network is given to excitatory units through input units which are separated into subsets which each give a spike (1) when a specific symbol in the input sequence is currently present (input sequences consist of letters and numbers). The authors show that the RNN develops states (activity of all units in the network as a vector) which are specific to individual input symbols with the addition that also the serial number of the input symbol in the sequence is represented. This simplifies readout of the current symbol in the sequence from RNN activity and hence leads to improved performance of predicting the next symbol in the sequence using a standard reservoir computing readout function. However, the authors note that the RNN keeps on changing its response to input, i.e., their learning rule does not converge, which means that the readout function would have to be updated all the time as well. Consequently, they switch off learning in the test phase.
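To make the interplay of the three plasticity mechanisms more tangible, here is a rough sketch of one time step of such a binary network. The specific update forms (a symmetric STDP window of one time step, row-wise weight normalisation, a threshold nudged towards a target rate) are how such rules are commonly written and should be read as my assumptions; all parameter values, the network size and the random stand-in input are made up for illustration.

```python
import numpy as np

def sorn_step(x_prev, y_prev, u, W_ee, W_ei, W_ie, T_e, T_i,
              eta_stdp=0.001, eta_ip=0.001, target_rate=0.1):
    """One discrete time step of a binary excitatory/inhibitory network."""
    # Binary threshold units: recurrent excitation minus inhibition plus input.
    x = (W_ee @ x_prev - W_ei @ y_prev + u - T_e > 0).astype(float)
    y = (W_ie @ x_prev - T_i > 0).astype(float)

    # STDP on existing excitatory synapses (W_ee[i, j] is the weight from j to i):
    # strengthen if j fired just before i, weaken for the reverse order.
    exists = W_ee > 0
    dW = eta_stdp * (np.outer(x, x_prev) - np.outer(x_prev, x))
    W_ee = np.clip(W_ee + dW * exists, 0.0, None)

    # Synaptic normalisation: incoming excitatory weights of each unit sum to one.
    row_sums = W_ee.sum(axis=1, keepdims=True)
    W_ee = np.divide(W_ee, row_sums, out=np.zeros_like(W_ee), where=row_sums > 0)

    # Intrinsic plasticity: raise the threshold of units firing above the
    # target rate, lower it otherwise.
    T_e = T_e + eta_ip * (x - target_rate)
    return x, y, W_ee, T_e

rng = np.random.default_rng(0)
n_exc, n_inh = 20, 4
mask = (rng.random((n_exc, n_exc)) < 0.2) & ~np.eye(n_exc, dtype=bool)
W_ee = rng.random((n_exc, n_exc)) * mask
W_ee = W_ee / np.maximum(W_ee.sum(axis=1, keepdims=True), 1e-12)
W_ei = rng.random((n_exc, n_inh)) / n_inh
W_ie = rng.random((n_inh, n_exc)) / n_exc
T_e, T_i = rng.random(n_exc) * 0.5, rng.random(n_inh) * 0.5
x, y = np.zeros(n_exc), np.zeros(n_inh)
for _ in range(50):
    u = (rng.random(n_exc) < 0.1).astype(float)  # stand-in for symbol-driven input
    x, y, W_ee, T_e = sorn_step(x, y, u, W_ee, W_ei, W_ie, T_e, T_i)
```

Only STDP changes the relative strength of synapses here; the other two rules merely keep total input and firing rates in check, which matches their description as homeostatic mechanisms.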

The authors show that it is beneficial that recurrent connections between excitatory units are sparse.

An embodied account of serial order: How instabilities drive sequence generation.

Sandamirskaya, Y. and Schöner, G.
Neural Networks, 23:1164–1179, 2010


Learning and generating serially ordered sequences of actions is a core component of cognition both in organisms and in artificial cognitive systems. When these systems are embodied and situated in partially unknown environments, specific constraints arise for any neural mechanism of sequence generation. In particular, sequential action must resist fluctuating sensory information and be capable of generating sequences in which the individual actions may vary unpredictably in duration. We provide a solution to this problem within the framework of Dynamic Field Theory by proposing an architecture in which dynamic neural networks create stable states at each stage of a sequence. These neural attractors are destabilized in a cascade of bifurcations triggered by a neural representation of a condition of satisfaction for each action. We implement the architecture on a robotic vehicle in a color search task, demonstrating both sequence learning and sequence generation on the basis of low-level sensory information.


The paper presents a dynamical model of the execution of sequential actions driven by sensory feedback which allows variable duration of individual actions as signalled by external cues of subtask fulfillment (i.e. end of action). Therefore, it is one of the first functioning models with continuous dynamics which truly integrates action and perception. The core technique used is dynamic field theory (DFT) which implements winner-take-all dynamics in the continuous domain, i.e. the basic dynamics stays at a uniform baseline until a sufficiently large input at a certain position drives activity over a threshold and produces a stable single peak of activity around there. The different components of the model all run with dynamics using the same principle and are suitably connected such that stable peaks in activity can be destabilised to allow moving the peak to a new position (signalling something different).
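The basic field behaviour described above can be sketched in a few lines. This is a minimal one-dimensional field with a hard activation threshold, local excitation and global inhibition, integrated with Euler steps; the kernel shape, all parameter values and the function name are assumptions chosen for illustration rather than the settings used in the paper.

```python
import numpy as np

def simulate_field(stimulus, h=-2.0, tau=10.0, dt=1.0, steps=300,
                   c_exc=1.0, sigma=3.0, c_inh=0.3):
    """Euler integration of tau * du/dt = -u + h + stimulus + kernel @ f(u)."""
    n = stimulus.size
    positions = np.arange(n)
    dist = positions[:, None] - positions[None, :]
    # Local excitation (Gaussian) combined with constant global inhibition.
    kernel = c_exc * np.exp(-dist ** 2 / (2 * sigma ** 2)) - c_inh
    u = np.full(n, h)  # start at the uniform resting level
    for _ in range(steps):
        active = (u > 0).astype(float)  # only supra-threshold sites interact
        du = -u + h + stimulus + kernel @ active
        u = u + dt / tau * du
    return u

positions = np.arange(100)
stimulus = 5.0 * np.exp(-(positions - 50) ** 2 / (2 * 3.0 ** 2))
rest = simulate_field(np.zeros(100))  # no input: field stays at baseline
driven = simulate_field(stimulus)     # localised input: single peak at position 50
print(rest.max(), int(np.argmax(driven)), driven.max())
```

Without input nothing crosses the threshold and the field stays at its resting level; with a sufficiently strong localised input the excited region reinforces itself while suppressing the rest of the field.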

The aim of the exercise is to show that varying length sequential actions can be produced by a model of continuous neuronal population dynamics. Sequential structure is induced in the model by a set of ordinal nodes which are coupled via additional memory nodes such that they are active one after the other. However, the switch to the next ordinal node in the sequence needs to be triggered by sensory input which indicates that the aim of an action has been achieved. Activity of an ordinal node then directly induces a peak in the action field at a location determined by a set of learnt weights. In the robot example the action space is defined over the hue value, i.e. each action selects a certain colour. The actual action of the robot (turning and accelerating) is controlled by an additional color-space field and some motor dynamics not part of the sequence model. Hence, their sequence model as such only prescribes discrete actions. To decide whether an action has been successfully completed the action field increases activity in a particular spot in a condition of satisfaction field which only peaks at that spot, if suitable sensory input drives the activity at the spot over the threshold. Which spot the action field selects is determined by hand here (in the example it’s an identity function). A peak in the condition of satisfaction field then triggers a switch to the next ordinal node in the sequence. We don’t really see an evaluation of system performance (by what criterion?), but their system seems to work ok, at least producing the sequences in the order demonstrated during learning.

The paper is quite close to what we are envisaging. The free energy principle could add a Bayesian perspective (we would have to find a way to implement the conditional progression of a sequence, but I don’t see a reason why this shouldn’t be possible). Apart from that the function implemented by the dynamics is extremely simple. In fact, the whole sequential system could be replaced with simple, discrete if-then logic without having to change the continuous dynamics of the robot implementation layer (color-space field and motor dynamics). I don’t see how continuous dynamics here helps except that it is more biologically plausible. This is also a point on which the authors focus in the introduction and discussion. Something else that I noticed: all dynamic variables are only 1D (except for the colour-space field which is 2D). This is probably because the DFT formalism requires that the activity over the field is integrated for each position in the field every simulation step to compute the changes in activity (cf. computation of expectations in Bayesian inference) which is probably infeasible when the representations contain several variables.
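To illustrate what I mean by discrete if-then logic, here is a toy version of the sequencing layer. It assumes that colours are given as plain labels and that the sensor reports one colour per time step; the function name and this interface are hypothetical and only serve to show how little logic the ordinal nodes and the condition of satisfaction field implement.

```python
def run_sequence(target_colours, sensed_colours):
    """Step through `target_colours` in order, advancing whenever the sensed
    colour matches the current target (the condition of satisfaction)."""
    ordinal = 0  # index of the currently active ordinal position
    log = []     # target pursued at each time step
    for sensed in sensed_colours:
        if ordinal >= len(target_colours):
            break  # sequence finished
        target = target_colours[ordinal]
        log.append(target)
        if sensed == target:  # condition of satisfaction fulfilled
            ordinal += 1      # switch to the next ordinal position
    return log, ordinal

log, reached = run_sequence(
    ["red", "green", "blue"],
    ["blue", "red", "red", "green", "yellow", "blue", "red"],
)
print(log, reached)
```

Each action lasts as long as it takes for the matching colour to be sensed, so variable action durations come for free.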

Cortical Preparatory Activity: Representation of Movement or First Cog in a Dynamical Machine?

Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Ryu, S. I., and Shenoy, K. V.
Neuron, 68:387–400, 2010


The motor cortices are active during both movement and movement preparation. A common assumption is that preparatory activity constitutes a subthreshold form of movement activity: a neuron active during rightward movements becomes modestly active during preparation of a rightward movement. We asked whether this pattern of activity is, in fact, observed. We found that it was not: at the level of a single neuron, preparatory tuning was weakly correlated with movement-period tuning. Yet, somewhat paradoxically, preparatory tuning could be captured by a preferred direction in an abstract “space” that described the population-level pattern of movement activity. In fact, this relationship accounted for preparatory responses better than did traditional tuning models. These results are expected if preparatory activity provides the initial state of a dynamical system whose evolution produces movement activity. Our results thus suggest that preparatory activity may not represent specific factors, and may instead play a more mechanistic role.


What are the variables that best explain the preparatory tuning of neurons in dorsal premotor and primary motor cortex of monkeys doing a reaching task? This is the core question of the paper which is motivated by the observation of the authors that preparatory and perimovement (i.e. within movement) activity of a single neuron may even qualitatively differ considerably (something conflicting with the view that preparatory activity is a subthreshold version of perimovement activity). This observation is experimentally underlined in the paper by showing that average preparatory activity and average perimovement activity of a single neuron are largely uncorrelated for different experimental conditions.

To quantify the suitability of a set of variables to explain preparatory activity of a neuron the authors use a linear regression approach in which the values of these variables for a given experimental condition are used to predict the firing rate of the neuron in that condition. The authors compute the generalisation error of the learnt linear model with crossvalidation and compare the performance of several sets of variables based on this error. The variables performing best are the principal component scores of the perimovement population activity of all recorded neurons. The difference to alternative sets of variables is significant and in particular the wide range of considered variables makes the result convincing (e.g. target position, initial velocity, endpoints and maximum speed, but also principal component scores of EMG activity and kinematic variables, i.e. position, speed and acceleration of the hand). That perimovement activity is the best regressor for preparatory activity is quite odd, or as Burak aptly put it: “They are predicting the past.”

The authors suggest a dynamical systems view as explanation for their results and hypothesise that preparatory activity sets the initial state of the dynamical system constituted by the population of neurons. In this view, the preparatory activity of a single neuron is not sufficient to predict its evolution of activity (note that the correlation between preparatory and perimovement activity assesses only one particular way of predicting perimovement from preparatory activity – scaling), but the evolution of activity of all neurons can be used to determine the preparatory activity of a single neuron under the assumption that the evolution of activity is governed by approximately linear dynamics. If the dynamics is linear, then any state in the future is a linear transformation of the initial state and given enough data points from the future the initial state can be determined by an appropriate linear inversion. The additional PCA, also a linear transformation, doesn’t change that, but makes the regression easier and, important for the noisy data, also regularises.
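The linear-inversion argument is easy to check on synthetic data. The sketch below assumes noise-free linear dynamics given by a scaled random rotation, stacks the population states of a few later time steps per condition, reduces them with PCA and regresses the initial ("preparatory") activity of one neuron on the PC scores with leave-one-out cross-validation. The dynamics matrix, the sizes and the function name are assumptions for illustration; the authors' actual pipeline on recorded data is more involved.

```python
import numpy as np

def loo_error(n_conditions=20, n_neurons=10, n_times=5, n_pcs=10,
              noise=0.0, seed=0):
    """Leave-one-out error when predicting one neuron's initial activity
    from PC scores of the later population activity."""
    rng = np.random.default_rng(seed)
    # Assumed linear dynamics: a scaled random rotation (well conditioned).
    Q, _ = np.linalg.qr(rng.standard_normal((n_neurons, n_neurons)))
    A = 0.95 * Q
    prep = rng.standard_normal((n_conditions, n_neurons))  # initial states
    states, x = [], prep
    for _ in range(n_times):
        x = x @ A.T  # one step of the linear dynamics for all conditions
        states.append(x)
    peri = np.hstack(states)  # conditions x (neurons * time steps)
    peri = peri + noise * rng.standard_normal(peri.shape)
    target = prep[:, 0]       # initial activity of a single neuron

    errors = []
    for test in range(n_conditions):
        train = np.arange(n_conditions) != test
        mean = peri[train].mean(axis=0)
        # PCA fitted on the training conditions only.
        _, _, Vt = np.linalg.svd(peri[train] - mean, full_matrices=False)
        scores = (peri - mean) @ Vt[:n_pcs].T
        design = np.column_stack([scores, np.ones(n_conditions)])
        coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)
        errors.append((design[test] @ coef - target[test]) ** 2)
    return float(np.mean(errors))

print(loo_error(n_pcs=10), loo_error(n_pcs=3))
```

With as many PCs as neurons the initial state is recovered essentially exactly in this idealised setting, whereas too few PCs discard directions of the initial state and the prediction degrades.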

These findings and suggestions are all quite interesting and certainly fit into our preconceptions about neuronal activity, but are the presented results really surprising? Do people still believe that you can make sense of the activity of isolated neurons in cortex, or isn’t it already accepted that population dynamics is necessary to characterise neuronal responses? For example, Pillow et al. (Pillow2008) used coupled spiking models to successfully predict spike trains directly from stimuli in retinal ganglion cells. On the other hand, Churchland et al. indirectly claim in this paper that the population dynamics is (approximately) linear, which is certainly disputable, but what would nonlinear dynamics mean for their analysis?

BM: An iterative algorithm to learn stable non-linear dynamical systems with Gaussian mixture models.

Khansari-Zadeh, S. M. and Billard, A.
in: Proc. IEEE Int Robotics and Automation (ICRA) Conf, pp. 2381–2388, 2010


We model the dynamics of non-linear point-to-point robot motions as a time-independent system described by an autonomous dynamical system (DS). We propose an iterative algorithm to estimate the form of the DS through a mixture of Gaussian distributions. We prove that the resulting model is asymptotically stable at the target. We validate the accuracy of the model on a library of 2D human motions and to learn a control policy through human demonstrations for two multi-degrees of freedom robots. We show the real-time adaptation to perturbations of the learned model when controlling the two kinematically-driven robots.


The authors describe a system for learning nonlinear, multivariate dynamical systems based on Gaussian mixture regression (GMR). The difference to previous approaches using GMR (e.g. Gribovskaya2010) is that the GMR is done by pruning a Gaussian mixture model which has a Gaussian at each time point such that accuracy and stability criteria are adhered to. Pruning here actually means that two neighbouring Gaussians are merged. Consequently, the main contribution from the paper is the derivation and proof of the corresponding stability criteria – something that I haven’t checked properly.
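For intuition about what merging two neighbouring Gaussians involves, here is a sketch of a moment-matched merge, which replaces two weighted Gaussians by a single one with the same overall weight, mean and covariance. Moment matching is a standard choice that I assume here for illustration; I have not verified that the paper constructs the merged Gaussian this way, and the stability and accuracy checks that decide whether a merge is accepted are not shown.

```python
import numpy as np

def merge_gaussians(w1, mu1, cov1, w2, mu2, cov2):
    """Merge two weighted Gaussians into one, preserving the weight, mean and
    covariance of the two-component mixture (moment matching)."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    d1, d2 = mu1 - mu, mu2 - mu
    # Within-component covariance plus the spread of the component means.
    cov = (w1 * (cov1 + np.outer(d1, d1)) + w2 * (cov2 + np.outer(d2, d2))) / w
    return w, mu, cov

w, mu, cov = merge_gaussians(
    0.5, np.array([0.0, 0.0]), np.eye(2),
    0.5, np.array([2.0, 0.0]), np.eye(2),
)
print(w, mu, cov)
```

The merged covariance grows along the direction separating the two means, which is exactly where accuracy is lost when two distant neighbours are merged.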

They make a quantitative comparison between their binary merging approach, original EM learning of GMR, using LWPR to learn the dynamics and using DMPs. However, they do not describe the precise procedures. I am particularly surprised about the very low accuracy of the DMPs compared to the other approaches. Unless they have done something special (such as introduce large temporal deviations as done for Fig. 2) I don’t see why the accuracy for DMPs should be so low.

They argue that the main advantages of their approach are that a minimal number of used Gaussians is automatically determined while the resulting dynamics is stable at all times, that the multivariate Gaussians can capture correlations between dimensions (in contrast to DMPs) and that the computations are less costly than when using Gaussian Process Regression. The disadvantages are that the number of parameters increases quadratically with the dimensionality (curse of dimensionality, not so crucial for their 2, 4 or 6D examples, but then?), but, in particular, that the pruning procedure is highly susceptible to local minima issues and results depend on the order in which Gaussians are merged. In the extreme case, imagine that through the presence of noise none of the initial Gaussians can be merged without violating the accuracy constraint. Again, this might not be a problem for their very smooth data, but it will become problematic for more noisy data. Similar problems lead to the dependency on the order of merges (which are selected randomly). To overcome the order dependency they suggest to restart the algorithm several times and then select the result with the smallest number of Gaussians. Note that this compromises their computational advantages over GPs. While computing a GP mapping is cubic in the number of data points, merging the Gaussians is quadratic in the number of time points, but if you consider that different merge orders need to be checked, you’ll notice that there are 2 to the power of time points possible merge sequences, meaning that your computational costs can increase exponentially in the worst case when really the best solution is supposed to be found (if you optimise the hyperparameters in GPs you’re in a similar situation in a continuous space, though).

Encoding of Motor Skill in the Corticomuscular System of Musicians.

Gentner, R., Gorges, S., Weise, D., aufm Kampe, K., Buttmann, M., and Classen, J.
Current Biology, 20:1869–1874, 2010


How motor skills are stored in the nervous system represents a fundamental question in neuroscience. Although musical motor skills are associated with a variety of adaptations [1, 2, 3], it remains unclear how these changes are linked to the known superior motor performance of expert musicians. Here we establish a direct and specific relationship between the functional organization of the corticomuscular system and skilled musical performance. Principal component analysis was used to identify joint correlation patterns in finger movements evoked by transcranial magnetic stimulation over the primary motor cortex while subjects were at rest. Linear combinations of a selected subset of these patterns were used to reconstruct active instrumental playing or grasping movements. Reconstruction quality of instrumental playing was superior in skilled musicians compared to musically untrained subjects, displayed taxonomic specificity for the trained movement repertoire, and correlated with the cumulated long-term training exposure, but not with the recent past training history. In violinists, the reconstruction quality of grasping movements correlated negatively with the long-term training history of violin playing. Our results indicate that experience-dependent motor skills are specifically encoded in the functional organization of the primary motor cortex and its efferent system and are consistent with a model of skill coding by a modular neuronal architecture [4].


The authors use PCA on TMS induced postures to show that motor cortex represents building blocks of movements which adapt to everyday requirements. To be precise, the authors recorded finger movements which were induced by TMS over primary motor cortex and extracted for each of the different stimulations the posture which had the largest deviation from rest. From the resulting set of postures they computed the first 4 principal components (PCs) and looked how well a linear combination of the PCs could reconstruct postures recorded during normal behaviour of the subjects. This is made more interesting by comparing groups of subjects with different motor experience. They use highly trained violinists and pianists and a group of non-musicians and then compare the different combinations of who is used for determining PCs and what is trying to be reconstructed (violin playing, piano playing, or grasping where grasping can be that of violinists or non-musicians). Basis of comparison is a correlation (R) between the series of joint angle vectors as defined in Shadmehr1994 which can be interpreted as something like the average correlation between data points of the two sequences measured across joint angles (cf. normalised inner product matrix in GPLVM). Don’t ask me why they take exactly this measure, but probably it doesn’t matter. The main finding is that the PCs from violinists are significantly better in reconstructing violin playing than either the piano PCs, or the non-musician PCs. This table is missing in the text (but the data is there, showing mean R and its standard deviation):

R (rows: reconstructed movement, columns: whose PCs were used)

         violinists   pianists    non-musicians
violin   0.69±0.09    0.63±0.11   0.64±0.09
piano    0.70±0.06    0.74±0.06   0.70±0.07
grasp    0.76±0.09    0.76±0.09   0.76±0.10

What is not discussed in the paper is that pianists’ PCs are worse in reconstructing violin playing than PCs of non-musicians. An interesting finding is that the years of intensive training of violinists correlate significantly with the reconstruction quality for violin playing of violinist PCs while they are anticorrelated with the reconstruction quality for grasping, indicating that the postures activated in primary motor cortex become more adapted to frequently executed tasks. However, it has to be noted that this correlation analysis is based on only 9 data points.

In the beginning of the paper they show an analysis of the recorded behaviour which simply is supposed to ensure that violin playing, piano playing and grasping movements are sufficiently different which we may believe, although piano playing and grasping apparently are somewhat similar.
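The reconstruction analysis can be sketched as follows. The code extracts the first 4 PCs from a set of postures (standing in for the TMS-evoked ones), projects recorded movement postures onto them and scores the reconstruction with a plain Pearson correlation over all joint angles. The Pearson correlation is only my stand-in for the R measure of Shadmehr1994, whose exact definition I have not reproduced, and the synthetic data, the 12 joint angles and the function names are assumptions.

```python
import numpy as np

def principal_components(postures, n_pcs=4):
    """Mean posture and first `n_pcs` principal directions (rows) of a
    postures x joint-angles matrix."""
    mean = postures.mean(axis=0)
    _, _, Vt = np.linalg.svd(postures - mean, full_matrices=False)
    return mean, Vt[:n_pcs]

def reconstruction_quality(movement, mean, pcs):
    """Correlation between movement postures and their reconstruction from
    a linear combination of the given PCs."""
    centred = movement - mean
    reconstruction = centred @ pcs.T @ pcs  # projection onto the PC subspace
    return float(np.corrcoef(centred.ravel(), reconstruction.ravel())[0, 1])

rng = np.random.default_rng(0)
basis = rng.standard_normal((4, 12))             # 4 hypothetical building blocks
evoked = rng.standard_normal((40, 4)) @ basis    # stand-in for TMS-evoked postures
trained = rng.standard_normal((30, 4)) @ basis   # movement built from the same blocks
untrained = rng.standard_normal((30, 12))        # movement unrelated to the blocks

mean, pcs = principal_components(evoked)
print(reconstruction_quality(trained, mean, pcs),
      reconstruction_quality(untrained, mean, pcs))
```

Movements composed of the same building blocks as the evoked postures are reconstructed almost perfectly, while unrelated movements are not, which is the logic behind comparing groups with different training histories.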

Efficient Reductions for Imitation Learning.

Ross, S. and Bagnell, D.
in: JMLR W&CP 9: AISTATS 2010, pp. 661–668, 2010


Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d. This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.


The authors note that previous approaches of learning a policy from an example policy are limited in the sense that they only see successful examples generated from the desired policy and, therefore, will exhibit a larger error than expected from supervised learning of independent samples, because an error can propagate through the series of decisions, if the policy hasn’t learnt to recover to the desired policy when an error occurred. They then show that a lower error can be expected when a Forward Algorithm is used for training which learns a non-stationary policy successively for each time step. The idea probably being (I’m not too sure) that the data at the time step that is currently learnt contains the errors (that lead to different states) you would usually expect from the learnt policies, because for every time step new data is sampled based on the already learnt policies. They transfer this idea to learning of a stationary policy and propose SMILe (stochastic mixing iterative learning). In this algorithm the stationary policy is a linear combination of policies learnt in previous iterations where the initial policy is the desired one. The influence of the desired policy decreases exponentially with the number of iterations, but also the weights of policies learnt later decrease exponentially, but stay fixed in subsequent iterations, i.e. the policies learnt first will have the largest weights eventually. This makes sense, because they will most probably be closest to the desired policy (seeing mostly samples produced from the desired policy).
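The weighting scheme just described can be written down directly. The sketch assumes a single mixing parameter alpha: after n iterations the desired policy keeps weight (1 - alpha)^n and the policy learnt in iteration j gets the fixed weight alpha * (1 - alpha)^(j - 1). This is my reading of the scheme; the function name and the example values are made up.

```python
def smile_weights(n_iterations, alpha):
    """Mixture weights after `n_iterations`: weight of the desired (expert)
    policy and weights of the policies learnt in iterations 1..n."""
    expert = (1 - alpha) ** n_iterations
    learned = [alpha * (1 - alpha) ** (j - 1) for j in range(1, n_iterations + 1)]
    return expert, learned

expert, learned = smile_weights(n_iterations=5, alpha=0.1)
print(expert, learned, expert + sum(learned))
```

The weights always sum to one, the expert's share shrinks with every iteration and the policy learnt first keeps the largest share among the learnt policies.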

The aim is to make the learnt policy more robust without using too many samples from the desired policy. I really wonder whether you could achieve exactly the same performance by simply additionally sampling the desired policy from randomly perturbed states and adding these as training points to learning of a single policy. Depending on how expensive your learning algorithm is this may be much faster in total (as you only have to learn once on a larger data set). Of course, you then may not have the theoretical guarantees provided in the paper. Another drawback of the approach presented in the paper is that it needs to be possible to sample from the desired policy interactively during the learning. I can’t imagine a scenario where this is practical (a human in the loop?).

I was interested in this, because in an extended abstract to a workshop (see attached files) the authors referred to this approach and also mentioned Langford2009 as a similar learning approach based on local updates. Also you can see the policy as a differential equation, i.e. the results of the paper may also apply to learning of dynamical systems without control inputs. The problems are certainly very similar.

They use a neural network to learn policies in the particular application they consider.

Generating coherent patterns of activity from chaotic neural networks.

Sussillo, D. and Abbott, L. F.
Neuron, 63:544–557, 2009
DOI, Google Scholar


Neural circuits display complex activity patterns both spontaneously and when responding to a stimulus or generating a motor output. How are these two forms of activity related? We develop a procedure called FORCE learning for modifying synaptic strengths either external to or within a model neural network to change chaotic spontaneous activity into a wide variety of desired activity patterns. FORCE learning works even though the networks we train are spontaneously chaotic and we leave feedback loops intact and unclamped during learning. Using this approach, we construct networks that produce a wide variety of complex output patterns, input-output transformations that require memory, multiple outputs that can be switched by control inputs, and motor patterns matching human motion capture data. Our results reproduce data on premovement activity in motor and premotor cortex, and suggest that synaptic plasticity may be a more rapid and powerful modulator of network activity than generally appreciated.


The authors present a new approach to reservoir computing. The setup apparently (haven’t read the paper) is very similar to the echo state networks of Herbert Jaeger (Jaeger and Haas, Science, 2004); the difference lies in the signal that is fed back to the reservoir from the output. While Jaeger fed back the target value f(t), they feed back the error between f(t) and the prediction given the current weights and reservoir activity. Key to their approach is a weight update rule which almost instantaneously provides weights that minimise the error. While this obviously leads to a very high variability of the weights across time steps at the start of learning, they argue that this variability diminishes during learning and that the weights eventually stabilise such that, when learning is switched off, the target dynamics are reproduced. They present a workaround which may make it possible to also learn non-periodic functions, but the approach is clearly better suited to periodic functions.
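The "almost instantaneous" error reduction is easiest to see in a single update step. The sketch below assumes the update is recursive least squares (RLS) applied to the readout weights, which is how FORCE learning is usually described. It covers only the readout update and leaves out the reservoir dynamics and the feedback loop. The names `rls_step` and `alpha` and the toy activity vector are mine, for illustration only.

```python
import numpy as np


def rls_step(w, P, r, target):
    """One RLS update of the readout weights.

    w: (N,) readout weights; P: (N, N) running estimate of the inverse
    correlation matrix of the activity; r: (N,) reservoir activity;
    target: scalar f(t). Returns the updated (w, P) and the error that
    was measured before the update.
    """
    error_before = w @ r - target
    k = P @ r
    gain = 1.0 / (1.0 + r @ k)
    P_new = P - gain * np.outer(k, k)
    w_new = w - gain * error_before * k
    return w_new, P_new, error_before


rng = np.random.default_rng(0)
n_units = 50
alpha = 1.0                              # assumed regulariser, P(0) = I / alpha
w = np.zeros(n_units)
P = np.eye(n_units) / alpha
r = np.tanh(rng.normal(size=n_units))    # stand-in for reservoir activity
target = 0.7

w, P, e_before = rls_step(w, P, r, target)
e_after = w @ r - target                 # error for the same activity after the update
```

For the first step the error on the current activity shrinks by the factor 1 / (1 + r·r / alpha), so a small `alpha` gives a nearly complete correction within one update. This is also why the weights can jump around a lot early in learning.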

I wonder how the learning is divided between the feedback mechanism and weight adaptation (network model of Fig. 1A). In particular, it could well be that the feedback mechanism is solely responsible for successful learning while the weights just settle to a more or less arbitrary setting once the dynamics are stabilised through the feedback (making the weights uninterpretable). The authors also report how the synapses within the reservoir can be adapted to reproduce the target dynamics when no feedback signal is given from the network output (structure in Fig. 1C). Curiously, the credit assignment problem is solved by ignoring it: for the adaptation of reservoir synapses the same network-level output error is used as for the adaptation of output weights.

It’s interesting that it works, but it would be good to know why and how it works. The authors’ main argument for why their proposal is better than echo state networks is that it is more stable. They present corresponding results in Fig. 4, but they never tell us what they mean by stable. So how stable are the dynamics learnt by FORCE? How much can you perturb the network dynamics before the network stops being able to reproduce the target dynamics? In other words, how far off the desired dynamics can you initialise the network state?

They have an interesting principal components analysis of network activity suggesting that the dynamics converge to the same values for the first principal components for different starting states, but I haven’t understood it well enough during this first read to comment further on that.

Modeling discrete and rhythmic movements through motor primitives: a review.

Degallier, S. and Ijspeert, A.
Biol Cybern, 103:319–338, 2010
DOI, Google Scholar


Rhythmic and discrete movements are frequently considered separately in motor control, probably because different techniques are commonly used to study and model them. Yet the increasing interest in finding a comprehensive model for movement generation requires bridging the different perspectives arising from the study of those two types of movements. In this article, we consider discrete and rhythmic movements within the framework of motor primitives, i.e., of modular generation of movements. In this way we hope to gain an insight into the functional relationships between discrete and rhythmic movements and thus into a suitable representation for both of them. Within this framework we can define four possible categories of modeling for discrete and rhythmic movements depending on the required command signals and on the spinal processes involved in the generation of the movements. These categories are first discussed in terms of biological concepts such as force fields and central pattern generators and then illustrated by several mathematical models based on dynamical system theory. A discussion on the plausibility of theses models concludes the work.


In the first part, the paper reviews experimental evidence for the existence of a motor primitive system located on the level of the spinal cord. In particular, the discussion is centred on the existence of central pattern generators and force fields (also: muscle synergies) defined in the spinal cord. Results showing the independence of these from cortical signals exist for animals up to the cat, or so. “In humans, the activity of the isolated spinal cord is not observable, […]: influences from higher cortical areas and from sensory pathways can hardly be excluded.”

The remainder of the article reviews dynamical systems that have been proposed as models for movement primitives. The models are roughly characterised according to their assumptions about the relationship between discrete and rhythmic movements. The authors define 4 categories: two/two, one/two, one/one and two/one. Here a two means separate systems for discrete and rhythmic movements and a one means a common system; the number before the slash corresponds to the planning process (signals potentially generated as motor commands from cortex) and the number after the slash corresponds to the execution system where the movement primitives are defined.
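To illustrate what "separate systems" at the execution level could look like, here is a small sketch with two generic textbook dynamical systems: a critically damped point attractor for a discrete movement and a Hopf oscillator for a rhythmic one. These are my own choice of examples and not necessarily models covered in the review. All function names, parameter values and the Euler integration are assumptions. In both cases the command is a simple, non-patterned value (a goal, or an amplitude and a frequency).

```python
import math


def simulate_discrete(goal, y0=0.0, alpha=25.0, dt=0.001, duration=2.0):
    """Critically damped point attractor: the state settles at `goal`."""
    beta = alpha / 4.0                      # critical damping for this form
    y, v = y0, 0.0
    for _ in range(int(duration / dt)):
        a = alpha * (beta * (goal - y) - v)
        v += a * dt                         # update velocity first,
        y += v * dt                         # then position with the new velocity
    return y


def simulate_rhythmic(amplitude, frequency_hz, x0=0.1, y0=0.0,
                      dt=0.001, duration=10.0):
    """Hopf oscillator: the state is drawn onto a limit cycle.

    Returns the final radius, which should be close to `amplitude`.
    """
    mu = amplitude ** 2
    omega = 2.0 * math.pi * frequency_hz
    x, y = x0, y0
    for _ in range(int(duration / dt)):
        r2 = x * x + y * y
        dx = (mu - r2) * x - omega * y
        dy = (mu - r2) * y + omega * x
        x, y = x + dx * dt, y + dy * dt     # explicit Euler step
    return math.hypot(x, y)


final_position = simulate_discrete(goal=1.0)
final_radius = simulate_rhythmic(amplitude=1.0, frequency_hz=1.0)
```

In the terminology above this would be a "two" after the slash. A model with a "one" there would have to produce both behaviours from a single system, for example by switching a parameter.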

You would think that the aim of this exercise is to work out the advantages and disadvantages of the models, but the authors mainly restrict themselves to describing the models. The main conclusion then is that discrete and rhythmic movements can be generated from movement primitives in the spinal cord while cortex may only provide simple, non-patterned commands. The proposed categorisation may help to discriminate between models experimentally, but apparently there is currently no conclusive evidence favouring any of the categories (the authors repeatedly cite two conflicting studies).