Flexible vowel recognition by the generation of dynamic coherence in oscillator neural networks: speaker-independent vowel recognition.

Liu, F., Yamaguchi, Y., and Shimizu, H.
Biol Cybern, 71:105–114, 1994
DOI, Google Scholar

Abstract

We propose a new model for speaker-independent vowel recognition which uses the flexibility of the dynamic linking that results from the synchronization of oscillating neural units. The system consists of an input layer and three neural layers, which are referred to as the A-, B- and C-centers. The input signals are a time series of linear prediction (LPC) spectrum envelopes of auditory signals. At each time-window within the series, the A-center receives input signals and extracts local peaks of the spectrum envelope, i.e., formants, and encodes them into local groups of independent oscillations. Speaker-independent vowel characteristics are embedded as a connection matrix in the B-center according to statistical data of Japanese vowels. The associative interaction in the B-center and reciprocal interaction between the A- and B-centers selectively activate a vowel as a global synchronized pattern over two centers. The C-center evaluates the synchronized activities among the three formant regions to give the selective output of the category among the five Japanese vowels. Thus, a flexible ability of dynamical linking among features is achieved over the three centers. The capability in the present system was investigated for speaker-independent recognition of Japanese vowels. The system demonstrated a remarkable ability for the recognition of vowels very similar to that of human listeners, including misleading vowels. In addition, it showed stable recognition for unsteady input signals and robustness against background noise. The optimum condition of the frequency of oscillation is discussed in comparison with stimulus-dependent synchronizations observed in neurophysiological experiments of the cortex.

Review

The authors present an oscillating recurrent neural network model for the recognition of Japanese vowels. The model consists of 4 layers: 1) an input layer which gives pre-processed frequency information, 2) an oscillatory hidden layer with local inhibition, 3) another oscillatory hidden layer with long-range inhibition and 4) a readout layer implementing the classification of vowels using a winner-takes-all mechanism. Layers 1-3 each contain 32 units where each unit is associated to one input frequency. The output layer contains one unit for each of the 5 vowels and the readout mechanism is based on multiplication of weighted sums of layer 3 activities such that the output is also oscillatory. The oscillatory units in layers 2 and 3 consist of an excitatory element coupled with an inhibitory element which oscillate, or become silent, depending on the input. The long-range connections in layer 3 are determined manually based on known correlations between formants (characteristic frequencies) of the individual vowels.

In experiments the authors show that the classification of their network is robust against different speakers (14 men, 5 women, 5 girls, 5 boys): 6 out of 145 trials were correctly classified. However, they do not report what exactly their criterion for classification performance was (remember that the output was oscillatory, also sometimes alternative vowels show bumps in the time course of a vowel in the shown examples). They also report robustness to imperfect stimuli (formants varying within a vowel) and noise (superposition of 12 different conversations), but only single examples are shown.

Without being able to tell what the state of the art in neural networks in 1994 was, I guess the main contribution of the paper is that it shows that vowel recognition may be robustly implemented using oscillatory networks. At least from today’s perspective the suggested network is a bad solution to the technical problem of vowel recogntion, but even alternative algorithms at the time were probably better in that (there’s a hint in one of the paragraphs in the discussion). The paper is a good example for what was wrong with neural network research at the time: the models give the feeling that they are pretty arbitrary. Are the units in the network only defined and connected like they are, because these were the parameters that worked? Most probably. At least here the connectivity is partly determined through some knowledge of how frequencies produced by vowels relate, but many other parameters appear to be chosen arbitrarily. Respect to the person who made it work. However, the results section is rather weak. They only tested one example of a spoken vowel per person and they don’t define classification performance clearly. I guess, you could argue that it is a proof-of-concept of a possible biological implementation, but then again it is still unclear how this can be properly related to real networks in the brain.

Spike-Based Population Coding and Working Memory.

Boerlin, M. and Denève, S.
PLoS Comput Biol, 7:e1001080, 2011
DOI, Google Scholar

Abstract

Abstract

Compelling behavioral evidence suggests that humans can make optimal decisions despite the uncertainty inherent in perceptual or motor tasks. A key question in neuroscience is how populations of spiking neurons can implement such probabilistic computations. In this article, we develop a comprehensive framework for optimal, spike-based sensory integration and working memory in a dynamic environment. We propose that probability distributions are inferred spike-per-spike in recurrently connected networks of integrate-and-fire neurons. As a result, these networks can combine sensory cues optimally, track the state of a time-varying stimulus and memorize accumulated evidence over periods much longer than the time constant of single neurons. Importantly, we propose that population responses and persistent working memory states represent entire probability distributions and not only single stimulus values. These memories are reflected by sustained, asynchronous patterns of activity which make relevant information available to downstream neurons within their short time window of integration. Model neurons act as predictive encoders, only firing spikes which account for new information that has not yet been signaled. Thus, spike times signal deterministically a prediction error, contrary to rate codes in which spike times are considered to be random samples of an underlying firing rate. As a consequence of this coding scheme, a multitude of spike patterns can reliably encode the same information. This results in weakly correlated, Poisson-like spike trains that are sensitive to initial conditions but robust to even high levels of external neural noise. This spike train variability reproduces the one observed in cortical sensory spike trains, but cannot be equated to noise. On the contrary, it is a consequence of optimal spike-based inference. In contrast, we show that rate-based models perform poorly when implemented with stochastically spiking neurons.

Author Summary

Most of our daily actions are subject to uncertainty. Behavioral studies have confirmed that humans handle this uncertainty in a statistically optimal manner. A key question then is what neural mechanisms underlie this optimality, i.e. how can neurons represent and compute with probability distributions. Previous approaches have proposed that probabilities are encoded in the firing rates of neural populations. However, such rate codes appear poorly suited to understand perception in a constantly changing environment. In particular, it is unclear how probabilistic computations could be implemented by biologically plausible spiking neurons. Here, we propose a network of spiking neurons that can optimally combine uncertain information from different sensory modalities and keep this information available for a long time. This implies that neural memories not only represent the most likely value of a stimulus but rather a whole probability distribution over it. Furthermore, our model suggests that each spike conveys new, essential information. Consequently, the observed variability of neural responses cannot simply be understood as noise but rather as a necessary consequence of optimal sensory integration. Our results therefore question strongly held beliefs about the nature of neural “signal” and “noise”.

Review

[note: I here often write posterior, but mean log-posterior as this is what the authors mostly compute with]

Boerlin and Deneve present a recurrent spiking neural network which integrates dynamically changing stimuli from different modalities, allows for simple readout of the complete posterior distribution, predicts state dynamics and, therefore, may act as a working memory when a stimulus is absent. Interestingly, spikes in the recurrent neural network (RNN) are generated deterministically, but from an outside perspective interspike intervals of individual neurons appear to follow a Poisson distribution as measured experimentally. How is all this achieved and what are the limitations?

The experimental setup is as follows: There is a ONE-dimensional, noisy, dynamic variable in the world (state from here on) which we want to track through time. However, observations are only made through noisy spike trains from different sensory modalities where the conditional probability of a spike given a particular state is modelled as a Poisson distribution (actually exponential family distr. but in the experiments they use a Poisson). The RNN receives these spikes as input and the question then is how we have to setup the dynamics of each neuron in the RNN such that a simple integrator can readout the posterior distribution of the state from RNN activities.

The main trick of the paper is to find an approximation of the true (log-)posterior L which in turn may be approximated using the readout posterior G under the assumption that the two are good approximations of each other. You recognise the circularity in this statement. This is resolved by using a spiking mechanism which ensures that the two are indeed close to each other which in turn ensures that the true posterior L is approximated. The rest is deriving formulae and substituting them in each other until you get a formula describing the (dynamics of the) membrane potential of a single neuron in the RNN which only depends on sensory and RNN spikes, the tuning curves or gains of the associated neurons, rate constants of the network (called leaks here) and (true) parameters of the state dynamics.

The approximations used for the (log-)posterior are a Taylor expansion of 2nd order, a subsequent Taylor expansion of 1st order and a discretisation of the posterior according to the preferred state of each RNN neuron. However, the most critical assumption for the derivation of the results is that the dynamics is 1st order Markovian and linear. In particular, they assume a state dynamics which has a constant drift and a Wiener process diffusion. In the last paragraph of the discussion they mention that it is straightforward to extend the model to state dependent drift, but I don’t follow how this could be done, because their derivation of L crucially depends on the observation that p(x_t|x_{t-dt}) = p(x_t – x_{t-dt}) which is only true for state-independent drift.

The resulting membrane potential has a form corresponding to a leaky integrate and fire neuron. The authors differentiate between 4 parts: a leakage current, feed-forward input from sensory neurons (containing a bias term which, I think, is wrong in Materials and Methods but which is also not used in the experiments), instantaneous recurrent input from the RNN and slow recurrent currents from the RNN which are responsible for keeping up a memory of the approximated posterior past the time constant of the neuron. The slow currents are defined by two separate differential equations and I wonder where these are implemented in the neuron, if it already has a membrane potential associated with it to which the slow currents contribute. Also interesting to note is that all terms except for the leakage current are modulated by the RNN spike gains (Gamma) defining which effect a spike of neuron i has on the readout of the approximate posterior at the preferred state of neuron j. This includes the feed-forward input and means that feed-forward connection weights are determined by a linear combination of posterior gains (Gamma) and gains defined by the conditional probability of sensory spikes given the state (H). This means that the feed-forward weights are tuned to also take the effect of an input spike on the readout into account?

Anyway, the resulting spiking mechanism makes neurons spike whenever they improve the readout of the posterior from the RNN. The authors interpret this as a prediction error signal: a spike indicates that the posterior represented by the RNN deviated from the true (approximated) posterior. I guess we can call this prediction, because the readout/posterior has dynamics. But note that it is hard to interpret individual spikes with respect to prediction errors of the input spike train (something not desired anyway?). Also, the authors show that this representation is highly redundant. There always exist alternative spike trains of the RNN which represent the same posterior. This results in the demonstrated robustness and apparent randomness of the coding scheme. However, it also makes it impossible to interpret what it means when a neuron is silent. Nevertheless, neurons still exhibit characteristic tuning curves on average.

Notice that they do not assume a distributional form of the posterior and indeed they show that the network can represent a bimodal posterior, too.

In summary, the work at hand impressively combines many important aspects of recognising dynamic stimuli in a spike-based framework. Probably the most surprising property of the suggested neural network is that it produces spikes deterministically in order to optimise a global criterion although with a local spiking rule. However, the authors have to make important assumptions to arrive at these results. In particular, they need constant drift dynamics for their derivations, but also the “local” spiking rule turns out to use some global information: the weights of input and recurrently connected neurons in the membrane potential dynamics of an RNN neuron are determined from the gains for the readout of every neuron in the network, i.e., each neuron needs to know what a spike of each other neuron contributes to the posterior. I wonder what a corresponding learning rule would look like. Additionally, they need to assume that the RNN is fully connected, i.e., that every neuron, which contributes to the posterior, sends messages (spikes) to all other neurons contributing to the posterior. The authors also do not explain how the suggested slow, recurrent currents are represented in a spiking neuron. After all, these currents seem to have dynamics independent from the membrane potential of the neuron, yet they implement the dynamics of the posterior and are, therefore, absolutely central for predicting the development of the posterior over time. Finally, we have to keep in mind that the population of neurons coded for a discretisation of the posterior of a one-dimensional variable. With increasing dimensionality you’ll therefore have to spend an exponentially increasing number of neurons to represent the posterior and all of them will have to be connected.

Recurrent excitation in neocortical circuits.

Douglas, R. J., Koch, C., Mahowald, M., Martin, K. A., and Suarez, H. H.
Science, 269:981–985, 1995
DOI, Google Scholar

Abstract

The majority of synapses in the mammalian cortex originate from cortical neurons. Indeed, the largest input to cortical cells comes from neighboring excitatory cells. However, most models of cortical development and processing do not reflect the anatomy and physiology of feedback excitation and are restricted to serial feedforward excitation. This report describes how populations of neurons in cat visual cortex can use excitatory feedback, characterized as an effective “network conductance”, to amplify their feedforward input signals and demonstrates how neuronal discharge can be kept proportional to stimulus strength despite strong, recurrent connections that threaten to cause runaway excitation. These principles are incorporated into models of cortical direction and orientation selectivity that emphasize the basic design principles of cortical architectures.

Review

The paper suggests that the functional role of recurrent excitatory connections is to amplify (increase gain between inputs and outputs) and denoise inputs to a (sensory) cortical area. This would allow these input signals to be relatively small and would, therefore, help to save energy (they don’t make this argument explicitly).

The work is motivated by an estimate of the number of recurrent connections directly made between spiny stellate cells of layer IV in the cat visual cortex. The authors conclude that these connections alone can already “provide a significant source of recurrent excitation”.

First, they consider an electronic circuit analogy describing the feed-forward input and recurrent currents acting on a neuron in the network. They look at the influence of the recurrent conductance (can be seen as the connectivity strength between all recurrently connected neurons) on the stability of the network and suggest that inhibitory neurons keep the network stable when the recurrent conductance is too high and would alone lead to divergence of network activities. However, they also implemented a model recurrent network consisting of excitatory and inhibitory spiking neurons and showed that it can implement direction selectivity of V1 simple cells. Interestingly, direction selectivity is based on asymmetric firing of excitatory and inhibitory connections from LGN (“in the preferred direction excitation precedes inhibition”) which they support with two references.

I find it hard to believe that cortical recurrent networks apparently don’t do any computations on their own except for improving the incoming signal. It means that all computations are actually done in the feed forward connections between areas. The excitation-inhibition asynchrony being an example here. But then, if you assume a hierarchy of similar processing units, where does, e.g., the necessary excitation-inhibition asynchrony come from? Well, potentially there are readout-neurons outside of the recurrently connected network which do exactly that. Then again, the whole processing in the brain would be feed-forward where the only intrinsically dynamic units would just amplify the feed-forward signals. Reservoir computing could be seen as an extension to this where the dynamics of the recurrent neurons is allowed to be more sophisticated, but becomes uninterpretable in turn. Still, the presented model is consistent, as far as I can tell, with the idea that the activity in response to a stimulus represents the posterior while activity at rest represents the prior over the variables represented by the network under consideration.

Note that the authors do not have any direct experimental evidence for their model in terms of simultaneous recordings of neurons in the same network. They only compare two summary statistics based on individual cells, for the second of which I don’t understand the experiment.

Recurrent neuronal circuits in the neocortex.

Douglas, R. J. and Martin, K. A. C.
Curr Biol, 17:R496–R500, 2007
DOI, Google Scholar

Abstract

In this Primer, we shall describe one interesting property of neocortical circuits – recurrent connectivity – and suggest what its computational significance might be.

Review

First, they use data of the distribution of synapses in cat visual cortex to argue that the predominant drive of activity in a cortical area is from recurrent connections within this area. They then suggest that the reason for this is the ability to enhance and denoise incoming signals through suitable recurrent connections. They show corresponding functional behaviour in a model based on linear threshold neurons (LTNs). They do not use sigmoid activation functions, because neurons apparently only rarely operate on their maximum firing rate such that sigmoid activation functions are not necessary. To maintain stability they instead use a global inhibitory unit. I guess you could equivalently use a suitable sigmoid function. Finally they suggest that top-down connections may bias the activity in the recurrent network such that one of a few alternative inputs may be selected based on, e.g., attention.

So here the functional role of the recurrent neural network is merely to increase the signal to noise ratio. It’s a bit strange to me that actually no computation is done. Does that mean that all the computation from sensory signals to hidden states are done by the projections from lower level area to higher level area? This seems to be consistent with the reservoir computing idea where the reservoir can also be seen as enhancing the representation of the input (by stretching its effects in time). The difference just being that the dynamics and function in reservoirs is more involved.

The ideas presented here are almost the same as already proposed by the first author in 1995 (see Douglas1995).

Spatiotemporal representations in the olfactory system.

Schaefer, A. T. and Margrie, T. W.
Trends Neurosci, 30:92–100, 2007
DOI, Google Scholar

Abstract

A complete understanding of the mechanisms underlying any kind of sensory, motor or cognitive task requires analysis from the systems to the cellular level. In olfaction, new behavioural evidence in rodents has provided temporal limits on neural processing times that correspond to less than 150ms–the timescale of a single sniff. Recent in vivo data from the olfactory bulb indicate that, within each sniff, odour representation is not only spatially organized, but also temporally structured by odour-specific patterns of onset latencies. Thus, we propose that the spatial representation of odour is not a static one, but rather evolves across a sniff, whereby for difficult discriminations of similar odours, it is necessary for the olfactory system to “wait” for later-activated components. Based on such evidence, we have devised a working model to assess further the relevance of such spatiotemporal processes in odour representation.

Review

They review evidence for temporal coding of odours in the olfactory bulb (and olfactory receptor neurons). Main finding is that with increasing intensity of an odour corresponding neurons fire more action potentials in a given time. However, this is achieved by an earlier onset of firing while inter-spike intervals stay roughly equal. The authors argue that this is a fast temporal code that can be used to discriminate odours. Especially, they suggest that this can explain why very different odours can be discriminated faster. The assumption there is that these differ mainly in high-intensity, i.e., fast subodours while similar odours differ mainly in low-intensity, i.e., slow subodours. But can it not be that similar odours differ only slightly in high-intensity subodours? My intuition says that the decision boundary is more determined by considerations of uncertainty rather than a temporal code of high- and low-intensity.

The authors ignore that there is an increased amount of action potentials for high-intensity odours and rely in their arguments entirely on the temporal aspect of earlier firing. If only the temporal code was important, this would be a huge energy waste by the brain. Stefan suggested that it might be related to subsequent checks and to cumulating evidence.

Temporal sparseness of the premotor drive is important for rapid learning in a neural network model of birdsong.

Fiete, I. R., Hahnloser, R. H. R., Fee, M. S., and Seung, H. S.
J Neurophysiol, 92:2274–2282, 2004
DOI, Google Scholar

Abstract

Sparse neural codes have been widely observed in cortical sensory and motor areas. A striking example of sparse temporal coding is in the song-related premotor area high vocal center (HVC) of songbirds: The motor neurons innervating avian vocal muscles are driven by premotor nucleus robustus archistriatalis (RA), which is in turn driven by nucleus HVC. Recent experiments reveal that RA-projecting HVC neurons fire just one burst per song motif. However, the function of this remarkable temporal sparseness has remained unclear. Because birdsong is a clear example of a learned complex motor behavior, we explore in a neural network model with the help of numerical and analytical techniques the possible role of sparse premotor neural codes in song-related motor learning. In numerical simulations with nonlinear neurons, as HVC activity is made progressively less sparse, the minimum learning time increases significantly. Heuristically, this slowdown arises from increasing interference in the weight updates for different synapses. If activity in HVC is sparse, synaptic interference is reduced, and is minimized if each synapse from HVC to RA is used only once in the motif, which is the situation observed experimentally. Our numerical results are corroborated by a theoretical analysis of learning in linear networks, for which we derive a relationship between sparse activity, synaptic interference, and learning time. If songbirds acquire their songs under significant pressure to learn quickly, this study predicts that HVC activity, currently measured only in adults, should also be sparse during the sensorimotor phase in the juvenile bird. We discuss the relevance of these results, linking sparse codes and learning speed, to other multilayered sensory and motor systems.

Review

They model the generation of bird song as a simple feed-forward network and show that a sparse temporal code of HVC neurons (feeding into RA neurons) speeds up learning with backpropagation. They argue that this speed up is the main explanation for why real HVC neurons exhibit a sparse temporal code.

HVC neurons are modelled as either on or off, i.e., bursting or non-bursting, while RA neurons have continuous activities. A linear combination of RA neurons then determines the output of the network. They define a desired, low-pass filtered output that should be learnt, but while their Fig. 2 suggests that they model the sequential aspect of the data, the actual network has no such component and the temporal order of the data points is irrelevant for learning. Maybe fixing, i.e., not learning, the connections from RA to output is biologically well motivated, but other choices for the network seem to be quite arbitrary, e.g., why do RA neurons project from the beginning to only one of two outputs? They varied quite a few parameters and found that their main result (learning is faster with sparse HVC firing) holds for all of them, though. Interesting to note: they had to initialise HVC-RA and RA thresholds such that initial RA activity is low and nonuniform in order to get desired type of RA activity after learning.

I didn’t like the paper that much, because they showed the benefit of sparse coding for the biologically implausible backpropagation learning. Would it also hold up against a Hebbian learning paradigm? On the other hand, the whole idea of being able to learn better when each neuron is only responsible for one restricted part of the stimulus is so outrageously intuitive that you wonder why this needed to be shown in the first place (Stefan noted, though, that he doesn’t know of work investigating temporal sparseness compared to spatial sparseness)? Finally, you cannot argue that this is the main reason why HVC neurons fire in a temporally sparse manner, because there might be other unknown reasons and this is only a side effect.

SORN: a self-organizing recurrent neural network.

Lazar, A., Pipa, G., and Triesch, J.
Front Comput Neurosci, 3:23, 2009
DOI, Google Scholar

Abstract

Understanding the dynamics of recurrent neural networks is crucial for explaining how the brain processes information. In the neocortex, a range of different plasticity mechanisms are shaping recurrent networks into effective information processing circuits that learn appropriate representations for time-varying sensory stimuli. However, it has been difficult to mimic these abilities in artificial neural network models. Here we introduce SORN, a self-organizing recurrent network. It combines three distinct forms of local plasticity to learn spatio-temporal patterns in its input while maintaining its dynamics in a healthy regime suitable for learning. The SORN learns to encode information in the form of trajectories through its high-dimensional state space reminiscent of recent biological findings on cortical coding. All three forms of plasticity are shown to be essential for the network’s success.

Review

The paper considers the question of whether adapting an RNN used as a reservoir gives better performance in a sequence prediction task than randomly initialised RNNs. The authors demonstrate an adaptation procedure based on spike-timing-dependent plasticity (STDP) controlled with intrinsic plasticity (IP) and synaptic normalisation (SN) as homeostatic mechanisms and show that the performance of the adapted RNNs is indeed superior to the performance of the random RNNs. They further show that IP and SN are necessary for good results, or rather that without either the RNN exhibits disadvantageous firing patterns (bursting, always on, always off).

This is one of the few studies which shows successfull learning of RNNs. However, they use a rather simple model: a binary network in discrete time. The connectivity of the network is more elaborate: there are excitatory units which are recurrently connected, as well as fewer inhibitory neurons which have no connections between themselves, but are fully and reciprocally connected with all excitatory units. Input to the network is given to excitatory units through input units which are separated into subsets which each give a spike (1) when a specific symbol in the input sequence is currently present (input sequences consist of letters and numbers). The authors show that the RNN develops states (activity of all units in the network as a vector) which are specific to individual input symbols with the addition that also the serial number of the input symbol in the sequence is represented. This simplifies readout of the current symbol in the sequence from RNN activity and hence leads to improved performance of predicting the next symbol in the sequence using a standard reservoir computing readout function. However, the authors note that the RNN keeps on changing its response to input, i.e., their learning rule does not converge which means that the readout function would have to be updated all the time as well. Consequently, they switch off learning in the test phase.

The authors show that it is beneficial that recurrent connections between excitatory units are sparse.

An embodied account of serial order: How instabilities drive sequence generation.

Sandamirskaya, Y. and Schöner, G.
Neural Networks, 23:1164–1179, 2010
DOI, Google Scholar

Abstract

Learning and generating serially ordered sequences of actions is a core component of cognition both in organisms and in artificial cognitive systems. When these systems are embodied and situated in partially unknown environments, specific constraints arise for any neural mechanism of sequence generation. In particular, sequential action must resist fluctuating sensory information and be capable of generating sequences in which the individual actions may vary unpredictably in duration. We provide a solution to this problem within the framework of Dynamic Field Theory by proposing an architecture in which dynamic neural networks create stable states at each stage of a sequence. These neural attractors are destabilized in a cascade of bifurcations triggered by a neural representation of a condition of satisfaction for each action. We implement the architecture on a robotic vehicle in a color search task, demonstrating both sequence learning and sequence generation on the basis of low-level sensory information.

Review

The paper presents a dynamical model of the execution of sequential actions driven by sensory feedback which allows variable duration of individual actions as signalled by external cues of subtask fulfillment (i.e. end of action). Therefore, it is one of the first functioning models with continuous dynamics which truly integrates action and perception. The core technique used is dynamic field theory (DFT) which implements winner-take-all dynamics in the continuous domain, i.e. the basic dynamics stays at a uniform baseline until a sufficiently large input at a certain position drives activity over a threshold and produces a stable single peak of activity around there. The different components of the model all run with dynamics using the same principle and are suitably connected such that stable peaks in activity can be destabilised to allow moving the peak to a new position (signalling something different).

The aim of the excercise is to show that varying length sequential actions can be produced by a model of continuous neuronal population dynamics. Sequential structure is induced in the model by a set of ordinal nodes which are coupled via additional memory nodes such that they are active one after the after. However, the switch to the next ordinal node in the sequence needs to be triggered by sensory input which indicates that the aim of an action has been achieved. Activity of an ordinal node then directly induces a peak in the action field at a location determined by a set of learnt weights. In the robot example the action space is defined over the hue value, i.e. each action selects a certain colour. The actual action of the robot (turning and accelerating) is controlled by an additional color-space field and some motor dynamics not part of the sequence model. Hence, their sequence model as such only prescribes discrete actions. To decide whether an action has been successfully completed the action field increases activity in a particular spot in a condition of satisfaction field which only peaks at that spot, if suitable sensory input drives the activity at the spot over the threshold. Which spot the action field selects is determined by hand here (in the example it’s an identity function). A peak in the condition of satisfaction field then triggers a switch to the next ordinal node in the sequence. We don’t really see an evaluation of system performance (by what criterion?), but their system seems to work ok, at least producing the sequences in the order demonstrated during learning.

The paper is quite close to what we are envisaging. The free energy principle could add a Bayesian perspective (we would have to find a way to implement the conditional progression of a sequence, but I don’t see a reason why this shouldn’t be possible). Apart from that the function implemented by the dynamics is extremely simple. In fact, the whole sequential system could be replaced with simple, discrete if-then logic without having to change the continuous dynamics of the robot implementation layer (color-space field and motor dynamics). I don’t see how continuous dynamics here helps except that it is more biologically plausible. This is also a point on which the authors focus in the introduction and discussion. Something else that I noticed: all dynamic variables are only 1D (except for the colour-space field which is 2D). This is probably because the DFT formalism requires that the activity over the field is integrated for each position in the field every simulation step to compute the changes in activity (cf. computation of expectations in Bayesian inference) which is probably infeasible when the representations contain several variables.

Cortical Preparatory Activity: Representation of Movement or First Cog in a Dynamical Machine?

Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Ryu, S. I., and Shenoy, K. V.
Neuron, 68:387 – 400, 2010
DOI, Google Scholar

Abstract

Summary The motor cortices are active during both movement and movement preparation. A common assumption is that preparatory activity constitutes a subthreshold form of movement activity: a neuron active during rightward movements becomes modestly active during preparation of a rightward movement. We asked whether this pattern of activity is, in fact, observed. We found that it was not: at the level of a single neuron, preparatory tuning was weakly correlated with movement-period tuning. Yet, somewhat paradoxically, preparatory tuning could be captured by a preferred direction in an abstract #space# that described the population-level pattern of movement activity. In fact, this relationship accounted for preparatory responses better than did traditional tuning models. These results are expected if preparatory activity provides the initial state of a dynamical system whose evolution produces movement activity. Our results thus suggest that preparatory activity may not represent specific factors, and may instead play a more mechanistic role.

Review

What are the variables that best explain the preparatory tuning of neurons in dorsal premotor and primary motor cortex of monkeys doing a reaching task? This is the core question of the paper which is motivated by the observation of the authors that preparatory and perimovement (ie. within movement) activity of a single neuron may even qualitatively differ considerably (something conflicting with the view that preparatory activity is a subthreshold version of perimovment activity). This observation is experimentally underlined in the paper by showing that average preparatory activity and average perimovement activity of a single neuron are largely uncorrelated for different experimental conditions.

To quantify the suitability of a set of variables to explain perparatory activity of a neuron the authors use a linear regression approach in which the values of these variables for a given experimental condition are used to predict the firing rate of the neuron in that condition. The authors compute the generalisation error of the learnt linear model with crossvalidation and compare the performance of several sets of variables based on this error. The variables performing best are the principal component scores of the perimovement population activity of all recorded neurons. The difference to alternative sets of variables is significant and in particular the wide range of considered variables makes the result convincing (e.g. target position, initial velocity, endpoints and maximum speed, but also principal component scores of EMG activity and kinematic variables, i.e. position, speed and acceleration of the hand). That perimovement activity is the best regressor for preparatory activity is quite odd, or as Burak aptly put it: “They are predicting the past.”

The authors suggest a dynamical systems view as explanation for their results and hypthesise that preparatory activity sets the initial state of the dynamical system constituted by the population of neurons. In this view, the preparatory activity of a single neuron is not sufficient to predict its evolution of activity (note that the correlation between perparatory and perimovement activity assesses only one particular way of predicting perimovement from preparatory activity – scaling), but the evolution of activity of all neurons can be used to determine the preparatory activity of a single neuron under the assumption that the evolution of activity is governed by approximately linear dynamics. If the dynamics is linear, then any state in the future is a linear transformation of the initial state and given enough data points from the future the initial state can be determined by an appropriate linear inversion. The additional PCA, also a linear transformation, doesn’t change that, but makes the regression easier and, important for the noisy data, also regularises.

These findings and suggestions are all quite interesting and certainly fit into our preconceptions about neuronal activity, but are the presented results really surprising? Do people still believe that you can make sense of the activity of isolated neurons in cortex, or isn’t it already accepted that population dynamics is necessary to characterise neuronal responses? For example, Pillow et al. (Pillow2008) used coupled spiking models to successfully predict spike trains directly from stimuli in retinal ganglion cells. On the other hand, Churchland et al. indirectly claim in this paper that the population dynamics is (approximately) linear, which is certainly disputable, but what would nonlinear dynamics mean for their analysis?

Efficient Reductions for Imitation Learning.

Ross, S. and Bagnell, D.
in: JMLR W&CP 9: AISTATS 2010, pp. 661–668, 2010
Google Scholar

Abstract

Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d.. This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.

Review

The authors note that previous approaches of learning a policy from an example policy are limited in the sense that they only see successful examples generated from the desired policy and, therefore, will exhibit a larger error than expected from supervised learning of independent samples, because an error can propagate through the series of decisions, if the policy hasn’t learnt to recover to the desired policy when an error occurred. They then show that a lower error can be expected when a Forward Algorithm is used for training which learns a non-stationary policy successively for each time step. The idea probably being (I’m not too sure) that the data at the time step that is currently learnt contains the errors (that lead to different states) you would usually expect from the learnt policies, because for every time step new data is sampled based on the already learnt policies. They transfer this idea to learning of a stationary policy and propose SMILe (stochastic mixing iterative learning). In this algorithm the stationary policy is a linear combination of policies learnt in previous iterations where the initial policy is the desired one. The influence of the desired policy decreases exponentially with the number of iterations, but also the weights of policies learnt later decrease exponentially, but stay fixed in subsequent iterations, i.e. the policies learnt first will have the largest weights eventually. This makes sense, because they will most probably be closest to the desired policy (seeing mostly samples produced from the desired policy).

The aim is to make the learnt policy more robust without using too many samples from the desired policy. I really wonder whether you could achieve exactly the same performance by simply additionally sampling the desired policy from randomly perturbed states and adding these as training points to learning of a single policy. Depending on how expensive your learning algorithm is this may be much faster in total (as you only have to learn once on a larger data set). Of course, you then may not have the theoretical guarantees provided in the paper. Another drawback of the approach presented in the paper is that it needs to be possible to sample from the desired policy interactively during the learning. I can’t imagine a scenario where this is practical (a human in the loop?).

I was interested in this, because in an extended abstract to a workshop (see attached files) the authors referred to this approach and also mentioned Langford2009 as a similar learning approach based on local updates. Also you can see the policy as a differential equation, i.e. the results of the paper may also apply to learning of dynamical systems without control inputs. The problems are certainly very similar.

They use a neural network to learn policies in the particular application they consider.