Spike-Based Population Coding and Working Memory.

Boerlin, M. and Denève, S.
PLoS Comput Biol, 7:e1001080, 2011
DOI, Google Scholar

Abstract

Compelling behavioral evidence suggests that humans can make optimal decisions despite the uncertainty inherent in perceptual or motor tasks. A key question in neuroscience is how populations of spiking neurons can implement such probabilistic computations. In this article, we develop a comprehensive framework for optimal, spike-based sensory integration and working memory in a dynamic environment. We propose that probability distributions are inferred spike-per-spike in recurrently connected networks of integrate-and-fire neurons. As a result, these networks can combine sensory cues optimally, track the state of a time-varying stimulus and memorize accumulated evidence over periods much longer than the time constant of single neurons. Importantly, we propose that population responses and persistent working memory states represent entire probability distributions and not only single stimulus values. These memories are reflected by sustained, asynchronous patterns of activity which make relevant information available to downstream neurons within their short time window of integration. Model neurons act as predictive encoders, only firing spikes which account for new information that has not yet been signaled. Thus, spike times signal deterministically a prediction error, contrary to rate codes in which spike times are considered to be random samples of an underlying firing rate. As a consequence of this coding scheme, a multitude of spike patterns can reliably encode the same information. This results in weakly correlated, Poisson-like spike trains that are sensitive to initial conditions but robust to even high levels of external neural noise. This spike train variability reproduces the one observed in cortical sensory spike trains, but cannot be equated to noise. On the contrary, it is a consequence of optimal spike-based inference. In contrast, we show that rate-based models perform poorly when implemented with stochastically spiking neurons.

Author Summary

Most of our daily actions are subject to uncertainty. Behavioral studies have confirmed that humans handle this uncertainty in a statistically optimal manner. A key question then is what neural mechanisms underlie this optimality, i.e. how can neurons represent and compute with probability distributions. Previous approaches have proposed that probabilities are encoded in the firing rates of neural populations. However, such rate codes appear poorly suited to understand perception in a constantly changing environment. In particular, it is unclear how probabilistic computations could be implemented by biologically plausible spiking neurons. Here, we propose a network of spiking neurons that can optimally combine uncertain information from different sensory modalities and keep this information available for a long time. This implies that neural memories not only represent the most likely value of a stimulus but rather a whole probability distribution over it. Furthermore, our model suggests that each spike conveys new, essential information. Consequently, the observed variability of neural responses cannot simply be understood as noise but rather as a necessary consequence of optimal sensory integration. Our results therefore question strongly held beliefs about the nature of neural “signal” and “noise”.

Review

[note: I here often write posterior, but mean log-posterior as this is what the authors mostly compute with]

Boerlin and Denève present a recurrent spiking neural network which integrates dynamically changing stimuli from different modalities, allows simple readout of the complete posterior distribution, predicts state dynamics and may therefore act as a working memory when a stimulus is absent. Interestingly, spikes in the recurrent neural network (RNN) are generated deterministically, yet from the outside the spike trains of individual neurons look Poisson-like, as measured experimentally. How is all this achieved, and what are the limitations?

The experimental setup is as follows. There is a one-dimensional, noisy, dynamic variable in the world (the state from here on) which we want to track through time. However, observations are only made through noisy spike trains from different sensory modalities, where the conditional probability of a spike given a particular state is modelled as Poisson (the derivation allows a general exponential-family distribution, but the experiments use a Poisson). The RNN receives these spikes as input, and the question then is how to set up the dynamics of each neuron in the RNN such that a simple integrator can read out the posterior distribution of the state from the RNN activities.

The main trick of the paper is to approximate the true (log-)posterior L by the readout posterior G, under the assumption that the two stay close to each other. You recognise the circularity in this statement. It is resolved by a spiking mechanism which actively keeps the two close, which in turn ensures that the true posterior L is well approximated. The rest is deriving formulae and substituting them into one another until you get a formula describing the (dynamics of the) membrane potential of a single neuron in the RNN which only depends on sensory and RNN spikes, the tuning curves or gains of the associated neurons, rate constants of the network (called leaks here) and the (true) parameters of the state dynamics.

The approximations used for the (log-)posterior are a 2nd-order Taylor expansion, a subsequent 1st-order Taylor expansion and a discretisation of the posterior according to the preferred state of each RNN neuron. However, the most critical assumption for the derivation of the results is that the dynamics is 1st-order Markovian and linear. In particular, they assume a state dynamics with constant drift and Wiener-process diffusion. In the last paragraph of the discussion they mention that it is straightforward to extend the model to state-dependent drift, but I don't follow how this could be done, because their derivation of L crucially depends on the observation that p(x_t|x_{t-dt}) = p(x_t - x_{t-dt}), which only holds for state-independent drift.
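
To spell the constraint out (in my notation, which may differ from the paper's): for a constant-drift diffusion the transition density is a function of the increment alone,

\[
dx = a\,dt + \sigma\,dW
\quad\Rightarrow\quad
p(x_t \mid x_{t-dt}) = \mathcal{N}\!\left(x_t - x_{t-dt};\; a\,dt,\; \sigma^2\,dt\right),
\]

which is exactly the property p(x_t|x_{t-dt}) = p(x_t - x_{t-dt}) used above. With a state-dependent drift a(x), the mean of the transition density becomes a(x_{t-dt}) dt, so the density depends on x_{t-dt} itself and not just on the increment, and the simplification fails.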

The resulting membrane potential has the form of a leaky integrate-and-fire neuron. The authors distinguish four parts: a leakage current, feed-forward input from sensory neurons (including a bias term which, I think, is stated incorrectly in Materials and Methods, but which is not used in the experiments anyway), instantaneous recurrent input from the RNN, and slow recurrent currents from the RNN, the latter being responsible for maintaining a memory of the approximated posterior beyond the time constant of the neuron. The slow currents are defined by two separate differential equations, and I wonder where these would be implemented in a neuron which already has a membrane potential to which the slow currents contribute. Also interesting to note is that all terms except for the leakage current are modulated by the RNN spike gains (Gamma), which define the effect a spike of neuron i has on the readout of the approximate posterior at the preferred state of neuron j. This includes the feed-forward input, which means that the feed-forward connection weights are determined by a linear combination of posterior gains (Gamma) and of the gains defined by the conditional probability of sensory spikes given the state (H). Does this mean that the feed-forward weights are tuned to take the effect of an input spike on the readout into account?

Anyway, the resulting spiking mechanism makes a neuron fire whenever its spike improves the readout of the posterior from the RNN. The authors interpret this as a prediction error signal: a spike indicates that the posterior represented by the RNN has deviated from the true (approximated) posterior. I guess we can call this a prediction, because the readout/posterior has dynamics. But note that it is hard to interpret individual spikes with respect to prediction errors of the input spike train (something not desired anyway?). The authors also show that this representation is highly redundant: there always exist alternative RNN spike trains which represent the same posterior. This underlies the demonstrated robustness and the apparent randomness of the coding scheme, but it also makes it impossible to interpret what it means when a neuron is silent. Nevertheless, neurons still exhibit characteristic tuning curves on average.
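
To make the spiking rule concrete, here is a minimal sketch of the "fire only if it improves the readout" idea. This is not the authors' network: I use a plain squared-error criterion on a single leaky readout, and the readout kernels Gamma, the leak and the target trace are all invented. It only illustrates the greedy, deterministic character of the rule.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, dt = 50, 2000, 1e-3
lam = 10.0                                      # readout leak (1/s), invented
Gamma = rng.normal(0.0, 0.1, N)                 # effect of a spike of neuron i on the readout
target = np.sin(2 * np.pi * np.arange(T) * dt)  # stand-in for the quantity to be tracked

readout = 0.0
spikes = np.zeros((T, N), dtype=bool)
for t in range(T):
    readout += dt * (-lam * readout)            # readout decays between spikes
    err = target[t] - readout
    # a spike of neuron i changes the squared error by -(2*err*Gamma_i - Gamma_i**2),
    # so neuron i should fire only if that change is negative (readout improves)
    gain = 2 * err * Gamma - Gamma**2
    i = np.argmax(gain)
    if gain[i] > 0:                             # deterministic, greedy spiking rule
        spikes[t, i] = True
        readout += Gamma[i]
```

Note how neurons with similar Gamma are interchangeable here: which of them fires depends on tiny differences in the momentary error, which already hints at why many different spike patterns can encode the same readout and why the spike trains look variable despite being deterministic.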

Notice that they do not assume a distributional form of the posterior and indeed they show that the network can represent a bimodal posterior, too.

In summary, the work at hand impressively combines many important aspects of recognising dynamic stimuli in a spike-based framework. Probably the most surprising property of the suggested neural network is that it produces spikes deterministically in order to optimise a global criterion, albeit with a local spiking rule. However, the authors have to make important assumptions to arrive at these results. In particular, they need constant-drift dynamics for their derivations, and the "local" spiking rule turns out to use some global information: the weights of input and recurrently connected neurons in the membrane potential dynamics of an RNN neuron are determined from the readout gains of every neuron in the network, i.e., each neuron needs to know what a spike of every other neuron contributes to the posterior. I wonder what a corresponding learning rule would look like. Additionally, they need to assume that the RNN is fully connected, i.e., that every neuron which contributes to the posterior sends messages (spikes) to all other neurons contributing to the posterior. The authors also do not explain how the suggested slow, recurrent currents are represented in a spiking neuron. After all, these currents seem to have dynamics independent of the membrane potential of the neuron, yet they implement the dynamics of the posterior and are therefore absolutely central for predicting the development of the posterior over time. Finally, we have to keep in mind that the population of neurons here codes for a discretisation of the posterior of a one-dimensional variable; the number of neurons needed to represent the posterior therefore grows exponentially with the dimensionality of the state, and all of them have to be connected.

Temporal sparseness of the premotor drive is important for rapid learning in a neural network model of birdsong.

Fiete, I. R., Hahnloser, R. H. R., Fee, M. S., and Seung, H. S.
J Neurophysiol, 92:2274–2282, 2004
DOI, Google Scholar

Abstract

Sparse neural codes have been widely observed in cortical sensory and motor areas. A striking example of sparse temporal coding is in the song-related premotor area high vocal center (HVC) of songbirds: The motor neurons innervating avian vocal muscles are driven by premotor nucleus robustus archistriatalis (RA), which is in turn driven by nucleus HVC. Recent experiments reveal that RA-projecting HVC neurons fire just one burst per song motif. However, the function of this remarkable temporal sparseness has remained unclear. Because birdsong is a clear example of a learned complex motor behavior, we explore in a neural network model with the help of numerical and analytical techniques the possible role of sparse premotor neural codes in song-related motor learning. In numerical simulations with nonlinear neurons, as HVC activity is made progressively less sparse, the minimum learning time increases significantly. Heuristically, this slowdown arises from increasing interference in the weight updates for different synapses. If activity in HVC is sparse, synaptic interference is reduced, and is minimized if each synapse from HVC to RA is used only once in the motif, which is the situation observed experimentally. Our numerical results are corroborated by a theoretical analysis of learning in linear networks, for which we derive a relationship between sparse activity, synaptic interference, and learning time. If songbirds acquire their songs under significant pressure to learn quickly, this study predicts that HVC activity, currently measured only in adults, should also be sparse during the sensorimotor phase in the juvenile bird. We discuss the relevance of these results, linking sparse codes and learning speed, to other multilayered sensory and motor systems.

Review

They model the generation of birdsong as a simple feed-forward network and show that a sparse temporal code of HVC neurons (feeding into RA neurons) speeds up learning with backpropagation. They argue that this speed-up is the main explanation for why real HVC neurons exhibit a sparse temporal code.

HVC neurons are modelled as either on or off, i.e., bursting or non-bursting, while RA neurons have continuous activities. A linear combination of RA activities then determines the output of the network. They define a desired, low-pass filtered output that should be learnt, but while their Fig. 2 suggests that they model the sequential aspect of the data, the actual network has no such component and the temporal order of the data points is irrelevant for learning. Maybe fixing, i.e., not learning, the connections from RA to output is biologically well motivated, but other choices for the network seem quite arbitrary, e.g., why do RA neurons project to only one of the two outputs from the beginning? They varied quite a few parameters, though, and found that their main result (learning is faster with sparse HVC firing) holds for all of them. Interesting to note: they had to initialise the HVC-RA connections and RA thresholds such that initial RA activity is low and nonuniform in order to get the desired type of RA activity after learning.
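
The heuristic behind the speed-up, as I understand it, is that overlapping bursts couple the weight updates of different synapses. A toy calculation of this interference term (not the paper's actual model; all sizes are invented) could look like this:

```python
import numpy as np

def mean_interference(bursts_per_neuron, n_hvc=200, n_steps=100, seed=0):
    """Average number of shared burst times between pairs of HVC neurons.
    In a delta-rule update, the error at time t moves every synapse whose
    HVC neuron bursts at t, so shared burst times couple ("interfere with")
    the updates of different synapses."""
    rng = np.random.default_rng(seed)
    H = np.zeros((n_steps, n_hvc))
    for j in range(n_hvc):
        H[rng.choice(n_steps, bursts_per_neuron, replace=False), j] = 1.0
    C = H.T @ H                             # C[i, j] = number of shared burst times
    off_diagonal = C[~np.eye(n_hvc, dtype=bool)]
    return off_diagonal.mean()              # grows roughly as bursts**2 / n_steps

for k in (1, 2, 5, 10):
    print(k, mean_interference(k))
```

With one burst per neuron per motif (the experimentally observed case) the expected interference is minimal, which is the situation the paper identifies as optimal for learning speed.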

I didn't like the paper that much, because they showed the benefit of sparse coding only for the biologically implausible backpropagation learning. Would it also hold up against a Hebbian learning paradigm? On the other hand, the whole idea of being able to learn better when each neuron is only responsible for one restricted part of the stimulus is so outrageously intuitive that you wonder why this needed to be shown in the first place (Stefan noted, though, that he doesn't know of work investigating temporal as opposed to spatial sparseness). Finally, you cannot argue that this is the main reason why HVC neurons fire in a temporally sparse manner; there might be other, unknown reasons for which the learning speed-up is only a side effect.

An embodied account of serial order: How instabilities drive sequence generation.

Sandamirskaya, Y. and Schöner, G.
Neural Networks, 23:1164–1179, 2010
DOI, Google Scholar

Abstract

Learning and generating serially ordered sequences of actions is a core component of cognition both in organisms and in artificial cognitive systems. When these systems are embodied and situated in partially unknown environments, specific constraints arise for any neural mechanism of sequence generation. In particular, sequential action must resist fluctuating sensory information and be capable of generating sequences in which the individual actions may vary unpredictably in duration. We provide a solution to this problem within the framework of Dynamic Field Theory by proposing an architecture in which dynamic neural networks create stable states at each stage of a sequence. These neural attractors are destabilized in a cascade of bifurcations triggered by a neural representation of a condition of satisfaction for each action. We implement the architecture on a robotic vehicle in a color search task, demonstrating both sequence learning and sequence generation on the basis of low-level sensory information.

Review

The paper presents a dynamical model of the execution of sequential actions driven by sensory feedback which allows variable durations of individual actions, as signalled by external cues of subtask fulfilment (i.e. end of action). It is therefore one of the first functioning models with continuous dynamics which truly integrates action and perception. The core technique used is dynamic field theory (DFT), which implements winner-take-all dynamics in a continuous domain, i.e. the dynamics stays at a uniform baseline until a sufficiently large input at a certain position drives the activity over a threshold and produces a single, stable peak of activity around that position. The different components of the model all run with dynamics following the same principle and are connected such that stable peaks of activity can be destabilised, allowing the peak to move to a new position (and thereby signal something different).
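
For concreteness, here is a minimal Amari-type neural field of the kind DFT builds on, showing the baseline-to-peak behaviour described above. The kernel and all constants are invented for illustration and are certainly not the paper's values:

```python
import numpy as np

# Minimal Amari-type neural field: uniform baseline -> single self-stabilized peak.
n, dt, tau, h = 101, 0.05, 1.0, -1.0       # grid size, time step, time constant, resting level
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

# interaction kernel: local excitation, broad (here: constant) inhibition
d = x[:, None] - x[None, :]
W = 25.0 * np.exp(-d**2 / (2 * 0.05**2)) - 1.5

u = np.full(n, h)                           # field rests at baseline h < 0
inp = 3.0 * np.exp(-(x - 0.3)**2 / (2 * 0.03**2))   # localized input at x = 0.3

for step in range(600):
    f = (u > 0).astype(float)               # hard threshold (Amari's original choice)
    drive = inp if step < 300 else 0.0      # input is removed halfway through
    u += (dt / tau) * (-u + h + drive + W @ f * dx)

print(x[np.argmax(u)])                      # the peak remains near 0.3 after input removal
```

With local excitation and broad inhibition the peak sustains itself after the input is removed, which is the property the architecture exploits to hold the current sequence step stable.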

The aim of the exercise is to show that sequential actions of varying length can be produced by a model of continuous neuronal population dynamics. Sequential structure is induced in the model by a set of ordinal nodes which are coupled via additional memory nodes such that they are active one after the other. However, the switch to the next ordinal node in the sequence needs to be triggered by sensory input indicating that the aim of an action has been achieved. Activity of an ordinal node then directly induces a peak in the action field at a location determined by a set of learnt weights. In the robot example the action space is defined over the hue value, i.e. each action selects a certain colour. The actual action of the robot (turning and accelerating) is controlled by an additional colour-space field and some motor dynamics which are not part of the sequence model. Hence, the sequence model as such only prescribes discrete actions. To decide whether an action has been successfully completed, the action field increases activity at a particular spot of a condition-of-satisfaction field, which only peaks at that spot if suitable sensory input drives the activity there over the threshold. Which spot the action field selects is determined by hand here (in the example it is an identity function). A peak in the condition-of-satisfaction field then triggers the switch to the next ordinal node in the sequence. We don't really see an evaluation of system performance (by what criterion?), but their system seems to work OK, at least producing the sequences in the order demonstrated during learning.

The paper is quite close to what we are envisaging. The free energy principle could add a Bayesian perspective (we would have to find a way to implement the conditional progression of a sequence, but I don't see a reason why this shouldn't be possible). Apart from that, the function implemented by the dynamics is extremely simple. In fact, the whole sequential system could be replaced with simple, discrete if-then logic (see the sketch below) without having to change the continuous dynamics of the robot implementation layer (colour-space field and motor dynamics). I don't see how the continuous dynamics helps here, except that it is more biologically plausible; this is also the point on which the authors focus in the introduction and discussion. Something else that I noticed: all dynamic variables are only 1D (except for the colour-space field, which is 2D). This is probably because the DFT formalism requires the interaction integral over the field to be evaluated at every position of the field in every simulation step (cf. the computation of expectations in Bayesian inference), which quickly becomes infeasible when the representations contain several variables.
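
To make that claim concrete, the discrete core of the sequence machinery could be written as something like the following (hypothetical names: hue_targets stands for the learnt ordinal-node weights, goal_reached for the condition-of-satisfaction check):

```python
def run_sequence(hue_targets, set_action, goal_reached):
    """If-then version of the ordinal / memory / condition-of-satisfaction
    machinery: advance to the next action only when the current one is done."""
    for hue in hue_targets:            # ordinal nodes: one per sequence step
        set_action(hue)                # peak in the action field at the learnt hue
        while not goal_reached(hue):   # wait for matching sensory input
            pass                       # (the condition-of-satisfaction event)
```

Everything else in the architecture serves to implement this control flow with continuous attractor dynamics.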

BM: An iterative algorithm to learn stable non-linear dynamical systems with Gaussian mixture models.

Khansari-Zadeh, S. M. and Billard, A.
in: Proc. IEEE Int Robotics and Automation (ICRA) Conf, pp. 2381–2388, 2010
DOI, Google Scholar

Abstract

We model the dynamics of non-linear point-to-point robot motions as a time-independent system described by an autonomous dynamical system (DS). We propose an iterative algorithm to estimate the form of the DS through a mixture of Gaussian distributions. We prove that the resulting model is asymptotically stable at the target. We validate the accuracy of the model on a library of 2D human motions and to learn a control policy through human demonstrations for two multidegrees of freedom robots. We show the real-time adaptation to perturbations of the learned model when controlling the two kinematically-driven robots.

Review

The authors describe a system for learning nonlinear, multivariate dynamical systems based on Gaussian mixture regression (GMR). The difference from previous approaches using GMR (e.g. Gribovskaya2010) is that the GMR is obtained by pruning a Gaussian mixture model which initially has a Gaussian at each time point, in a way that adheres to accuracy and stability criteria. Pruning here actually means that two neighbouring Gaussians are merged. Consequently, the main contribution of the paper is the derivation and proof of the corresponding stability criteria, something that I haven't checked properly.
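
For reference, a moment-preserving merge of two weighted Gaussians, the standard mixture-reduction step, looks like this (whether it is exactly the update used in the paper I haven't verified):

```python
import numpy as np

def merge_gaussians(w1, mu1, S1, w2, mu2, S2):
    """Merge two weighted Gaussian components (weight, mean, covariance) into a
    single component preserving the total weight, mean and covariance."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    S = (w1 * (S1 + np.outer(mu1 - mu, mu1 - mu)) +
         w2 * (S2 + np.outer(mu2 - mu, mu2 - mu))) / w
    return w, mu, S
```

In the binary-merging algorithm such a merge would only be accepted if the reduced mixture still satisfies the accuracy and stability criteria.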

They make a quantitative comparison between their binary merging approach, original EM learning of the GMR, learning the dynamics with LWPR, and DMPs. However, they do not describe the precise procedures. I am particularly surprised by the very low accuracy of the DMPs compared to the other approaches. Unless they have done something special (such as introducing large temporal deviations, as done for Fig. 2), I don't see why the accuracy of DMPs should be so low.

They argue that the main advantages of their approach are that a minimal number of Gaussians is determined automatically while the resulting dynamics is stable at all times, that the multivariate Gaussians can capture correlations between dimensions (in contrast to DMPs), and that the computations are less costly than with Gaussian process regression. The disadvantages are that the number of parameters increases quadratically with the dimensionality (the curse of dimensionality; not so crucial for their 2, 4 or 6D examples, but beyond that?) and, in particular, that the pruning procedure is highly susceptible to local minima, with results depending on the order in which Gaussians are merged. In the extreme case, imagine that through the presence of noise none of the initial Gaussians can be merged without violating the accuracy constraint. Again, this might not be a problem for their very smooth data, but it will become problematic for noisier data. Similar problems lead to the dependency on the order of merges (which are selected randomly). To overcome the order dependency they suggest restarting the algorithm several times and selecting the result with the smallest number of Gaussians. Note that this compromises their computational advantage over GPs: while computing a GP mapping is cubic in the number of data points, merging the Gaussians is quadratic in the number of time points, but there are on the order of 2^T possible merge sequences for T time points, so the computational cost can increase exponentially in the worst case if the best solution really is to be found (though if you optimise the hyperparameters of a GP you are in a similar situation in a continuous space).

Generating coherent patterns of activity from chaotic neural networks.

Sussillo, D. and Abbott, L. F.
Neuron, 63:544–557, 2009
DOI, Google Scholar

Abstract

Neural circuits display complex activity patterns both spontaneously and when responding to a stimulus or generating a motor output. How are these two forms of activity related? We develop a procedure called FORCE learning for modifying synaptic strengths either external to or within a model neural network to change chaotic spontaneous activity into a wide variety of desired activity patterns. FORCE learning works even though the networks we train are spontaneously chaotic and we leave feedback loops intact and unclamped during learning. Using this approach, we construct networks that produce a wide variety of complex output patterns, input-output transformations that require memory, multiple outputs that can be switched by control inputs, and motor patterns matching human motion capture data. Our results reproduce data on premovement activity in motor and premotor cortex, and suggest that synaptic plasticity may be a more rapid and powerful modulator of network activity than generally appreciated.

Review

The authors present a new way of doing reservoir computing. The setup apparently (I haven't read that paper) is very similar to the echo state networks of Herbert Jaeger (Jaeger and Haas, Science, 2004); the difference lies in the signal that is fed back to the reservoir from the output. While Jaeger clamped the feedback to the target value f(t) during learning, here the network's own prediction is fed back, and its deviation from f(t) is kept small by the weight updates. Key to their approach is a weight update rule which almost instantaneously provides weights that minimise the error. While this obviously leads to very high variability of the weights across time steps at the start of learning, they argue that this variability diminishes during learning and the weights eventually stabilise such that, when learning is switched off, the target dynamics is reproduced. They present a workaround which may make it possible to also learn non-periodic functions, but the approach is clearly better suited to periodic functions.
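
My reading of the scheme, as a sketch: a chaotic rate network whose output is fed back unclamped, with the readout weights trained by recursive least squares, which to my understanding is what makes the error reduction near-instantaneous. All sizes and constants below are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dt, g = 300, 1e-3, 1.5                          # g > 1: chaotic spontaneous activity
J = g * rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))  # fixed random recurrent weights
w_fb = rng.uniform(-1.0, 1.0, N)                   # fixed feedback weights
w = np.zeros(N)                                    # readout weights, learnt online
P = np.eye(N)                                      # running inverse-correlation estimate

x = rng.normal(0.0, 0.5, N)
t = np.arange(0.0, 10.0, dt)
f = np.sin(2 * np.pi * t) + 0.5 * np.sin(4 * np.pi * t)  # periodic target

for i in range(len(t)):
    r = np.tanh(x)
    z = w @ r                                      # network output
    x += dt * (-x + J @ r + w_fb * z)              # output is fed back, unclamped
    if i % 2 == 0:                                 # recursive-least-squares update
        Pr = P @ r
        P -= np.outer(Pr, Pr) / (1.0 + r @ Pr)
        w -= (z - f[i]) * (P @ r)                  # near-instant error suppression
# freezing w afterwards, the loop x -> r -> z -> x should reproduce f on its own
```

Seen this way, the feedback carries the output z(t) itself, and it is the size of the update (a full least-squares step rather than a small gradient step) that keeps z(t) glued to f(t) during learning.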

I wonder how the learning is divided between the feedback mechanism and the weight adaptation (network model of Fig. 1A). In particular, it could well be that the feedback mechanism is solely responsible for successful learning while the weights just settle to a more or less arbitrary setting once the dynamics is stabilised through the feedback (making the weights uninterpretable). The authors also report how the synapses within the reservoir can be adapted to reproduce the target dynamics when no feedback signal is given from the network output (structure in Fig. 1C). Curiously, the credit assignment problem is solved by ignoring it: for the adaptation of the reservoir synapses the same network-level output error is used as for the adaptation of the output weights.

It's interesting that it works, but it would be good to know why and how it works. The authors' main argument for why their proposal is better than echo state networks is that it is more stable. They present corresponding results in Fig. 4, but they never tell us what they mean by stable. So how stable are the dynamics learnt by FORCE? How much can you perturb the network dynamics before it stops being able to reproduce the target dynamics? In other words, how far off the desired dynamics can you initialise the network state?

They also present an interesting principal component analysis of the network activity, suggesting that the dynamics converges to the same values along the first principal components for different starting states, but I haven't understood it well enough on this first read to comment further.

Modeling discrete and rhythmic movements through motor primitives: a review.

Degallier, S. and Ijspeert, A.
Biol Cybern, 103:319–338, 2010
DOI, Google Scholar

Abstract

Rhythmic and discrete movements are frequently considered separately in motor control, probably because different techniques are commonly used to study and model them. Yet the increasing interest in finding a comprehensive model for movement generation requires bridging the different perspectives arising from the study of those two types of movements. In this article, we consider discrete and rhythmic movements within the framework of motor primitives, i.e., of modular generation of movements. In this way we hope to gain an insight into the functional relationships between discrete and rhythmic movements and thus into a suitable representation for both of them. Within this framework we can define four possible categories of modeling for discrete and rhythmic movements depending on the required command signals and on the spinal processes involved in the generation of the movements. These categories are first discussed in terms of biological concepts such as force fields and central pattern generators and then illustrated by several mathematical models based on dynamical system theory. A discussion on the plausibility of these models concludes the work.

Review

In the first part, the paper reviews experimental evidence for the existence of a motor primitive system located at the level of the spinal cord. In particular, the discussion centres on the existence of central pattern generators and of force fields (also: muscle synergies) defined in the spinal cord. Results showing the independence of these from cortical signals exist for animals up to the cat, or so. "In humans, the activity of the isolated spinal cord is not observable, [...]: influences from higher cortical areas and from sensory pathways can hardly be excluded."

The remainder of the article reviews dynamical systems that have been proposed as models of movement primitives. The models are roughly characterised according to their assumptions about the relationship between discrete and rhythmic movements. The authors define four categories: two/two, one/two, one/one and two/one, where a "two" means separate systems for discrete and rhythmic movements and a "one" a common system; the number before the slash refers to the planning process (the command signals potentially generated in cortex) and the number after the slash to the execution system in which the movement primitives are defined.

You would think that the aim of this exercise is to work out the advantages and disadvantages of the models, but the authors mostly restrict themselves to describing them. The main conclusion then is that discrete and rhythmic movements can be generated from movement primitives in the spinal cord while cortex may only provide simple, non-patterned commands. The proposed categorisation may help to discern the models experimentally, but apparently there is currently no conclusive evidence favouring any of the categories (the authors repeatedly cite two conflicting studies).

Winnerless competition between sensory neurons generates chaos: A possible mechanism for molluscan hunting behavior.

Varona, P., Rabinovich, M. I., Selverston, A. I., and Arshavsky, Y. I.
Chaos: An Interdisciplinary Journal of Nonlinear Science, 12:672–677, 2002
DOI, Google Scholar

Abstract

In the presence of prey, the marine mollusk Clione limacina exhibits search behavior, i.e., circular motions whose plane and radius change in a chaotic-like manner. We have formulated a dynamical model of the chaotic hunting behavior of Clione based on physiological in vivo and in vitro experiments. The model includes a description of the action of the cerebral hunting interneuron on the receptor neurons of the gravity sensory organ, the statocyst. A network of six receptor model neurons with Lotka-Volterra-type dynamics and nonsymmetric inhibitory interactions has no simple static attractors that correspond to winner take all phenomena. Instead, the winnerless competition induced by the hunting neuron displays hyperchaos with two positive Lyapunov exponents. The origin of the chaos is related to the interaction of two clusters of receptor neurons that are described with two heteroclinic loops in phase space. We hypothesize that the chaotic activity of the receptor neurons can drive the complex behavior of Clione observed during hunting.

Review

See Levi2005 for a short summary in context.

My biggest concern with this paper is that the changes in direction of the mollusc may also result from feedback from the body, and especially from the statocysts, during its accelerated swimming. The question is: are these direction changes a result of chaotic but deterministic dynamics in the sensory network, as suggested by the model, or are they a result of essentially random processes which may be influenced by feedback from other networks? The authors note that in their model "The neurons keep the sequence of activation but the interval in which they are active is continuously changing in time". After a day of searching for papers investigating the swimming behaviour of Clione limacina (the mollusc in question) I came to the conclusion that the data shown in Fig. 1 is likely the only published data set of this swimming behaviour. This small data set suggests random changes of direction, in contrast to the model, but it does not allow one to draw any definite conclusions about the repetitiveness of the direction changes.
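
For readers wanting to play with the dynamics: the model class is a generalized Lotka-Volterra network of the six statocyst receptor neurons. A minimal sketch follows (the inhibition matrix below is an invented nonsymmetric example, not the authors' fitted connectivity):

```python
import numpy as np

# Generalized Lotka-Volterra rates: da_i/dt = a_i * (sigma_i - sum_j rho[i, j] * a_j)
rng = np.random.default_rng(2)
N = 6
rho = np.ones((N, N)) + 0.5 * rng.uniform(-1.0, 1.0, (N, N))  # nonsymmetric inhibition
np.fill_diagonal(rho, 1.0)                                    # self-limitation
sigma = np.ones(N)       # stimulation, nonzero while the hunting neuron is active

a = rng.uniform(0.01, 0.1, N)
dt, steps = 1e-3, 100_000
trace = np.empty((steps, N))
for step in range(steps):
    a += dt * a * (sigma - rho @ a)   # simple Euler integration
    trace[step] = a
# for suitable nonsymmetric rho, the neurons become active in a fixed sequence
# with irregular switching times (winnerless competition); whether a given
# random rho produces the heteroclinic/chaotic regime has to be checked, e.g.
# via Lyapunov exponents as the authors do
```

This makes the review's point tangible: the sequence of activations is rigid, while the dwell times fluctuate, which is precisely what should be compared against the behavioural data.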