## Normative evidence accumulation in unpredictable environments.

Glaze, C. M., Kable, J. W., and Gold, J. I.
Elife, 4, 2015

### Abstract

In our dynamic world, decisions about noisy stimuli can require temporal accumulation of evidence to identify steady signals; differentiation to detect unpredictable changes in those signals; or both. Normative models can account for learning in these environments but have not yet been applied to faster decision processes. We present a novel, normative formulation of adaptive learning models that forms decisions by acting as a leaky accumulator with non-absorbing bounds. These dynamics, derived for both discrete and continuous cases, depend on the expected rate of change of the statistics of the evidence and balance signal identification and change detection. We found that, for two different tasks, human subjects learned these expectations, albeit imperfectly, then used them to make decisions in accordance with the normative model. The results represent a unified, empirically supported account of decision-making in unpredictable environments that provides new insights into the expectation-driven dynamics of the underlying neural signals.

### Review

The authors suggest a model of sequential information processing that is aware of possible switches in the underlying source of information. They further show that the model fits responses of people in two perceptual decision making tasks and consequently argue that behaviour, which was previously considered to be suboptimal, may follow the normative, i.e., optimal, mechanism of the model. This mechanism postulates that typical evidence accumulation mechanisms in perceptual decision making are altered by the expected switch rate of the stimulus. Specifically, evidence accumulation becomes more leaky and a non-absorbing bound becomes lower when the expected switch rate increases. The paper is generally well-written (although there are some convoluted bits in the results section) and convincing. I was a bit surprised, though, that only choices, but not their timing is considered in the analysis with the model. In the following I’ll go through some more details of the model and discuss limitations of the presented models and their relation to other models in the field, but first I describe the experiments reported in the paper.

The paper reports two experiments. In the first (triangles task) people saw two triangles on the screen and had to judge whether a single dot was more likely to originate from the one triangle or the other. There was one dot and corresponding response per trial. In each trial the position of the dot was redrawn from a Gaussian distribution centred around one of the two triangles. There were also change point trials in which the triangle from which the dot was drawn switched (and then remained the same until the next change point). The authors analysed the proportion correct in relation to whether a trial was a change point. Trials were grouped into blocks which were defined by constant rate of switches (hazard rate) in the true originating triangle. In the second experiment (dots-reversal task), a random dot stimulus repeatedly switched (reversed) direction within a trial. In each trial people had to tell in which direction the dots moved before they vanished. The authors analysed the proportion correct in relation to the time between the last switch and the end of stimulus presentation. There were no blocks. Each trial had one of two hazard rates and one of two difficulty levels. The two difficulty levels were determined for each subject individually such that the more difficult one lead to correct identification of motion direction of a 500ms long stimulus in 65% of cases.

The authors present two normative models, one discrete and one continuous, which they apply across and within trial in the triangles and dots-reversal tasks, respectively. The discrete model is a simple hidden Markov model in which the hidden state can take one of two values and there is a common transition probability between these two values which they call hazard ‘rate’ (H). Observations were implicitly assumed Gaussian. They only enter during fitting as log-likelihood ratios in the form $$\beta*x_n$$ where beta is a scaling relating to the internal / sensory uncertainty associated with the generative model of observations and $$x_n$$ is the observed dot position (x-coordinate) in the triangles task. In methods, the authors derive the update equation for the log posterior odds ($$L_n$$) of the hidden state values given in Eqs. (1) and (2).

The continuous model is based on a Markov jump process with two states which is the continuous equivalent of the hidden Markov model above. Using Ito-calculus the authors again derive an update equation for the log posterior odds of the two states (Eq. 4), but during fitting they actually approximate Eq. (4) with the discrete Eq. (1), because it is supposedly the most efficient discrete-time approximation of Eq. (4) (no explanation for why this is the case was given). They just replace the log-likelihood ratio placeholder (LLR) with a coherence-dependent term applicable to the random dot motion stimulus. Notably, in contrast to standard drift-diffusion modelling of random dot motion tasks, the authors used coherence-dependent noise. I’d be interested in the reason for this choice.

There is an apparent fundamental difference between the discrete and continuous models which can be seen in Fig. 1 B vs C. In the discrete model, for H>0.5, the log posterior odds may actually switch sign from one observation to the next whereas this cannot happen in the continuous model. Conceptually, this means that the log posterior odds in the discrete model, when the LLR is 0, i.e., when there is no evidence in either direction, would oscillate between decreasing positive and increasing negative values until converging to 0. This oscillation can be seen in Fig. 2G, red line for |LLR|>0. In the continuous model such an oscillation cannot happen, because the infinitely many, tiny time steps allow the model to converge to 0 before switching the sign. Another way to see this is through the discrete hazard ‘rate’ H which is the probability of a sign reversal within one time step of size dt. When you want to decrease dt in the model, but want to maintain a given rate of sign reversals in, e.g., 1 second, H would also have to decrease. Consequently, when dt approaches 0, the probability of a sign reversal approaches 0, too, which means that H is a useless parameter in continuous time which, in turn, is the reason why it is replaced by a real rate parameter ($$\lambda$$) representing the expected number of reversals per second. In conclusion, the fundamental difference between discrete and continuous models is only an apparent one. They are very similar models, just expressed in different resolutions of time. In that sense it would have perhaps been better to present results in the paper consistently in terms of a real hazard rate ($$\lambda$$) which could be obtained in the triangles task by dividing H by the average duration of a trial in seconds. Notice that the discrete model represents all hazard rates $$\lambda>1/dt$$ as H=1, i.e., it cannot represent hazard rates which would lead to more than 1 expected sign reversal per $$dt$$. There may be more subtle differences between the models when the exact distributions of sign reversals are considered instead of only the expected rates.

Using first order approximations of the two models the authors identify two components in the dynamics of the log posterior odds L: a leak and a bias. [Side remark: there is a small sign mistake in the definition of leak k of the continuous model in the Methods section.] Both depend on hazard rate and the authors show that the leak dominates the dynamics for small L whereas the bias dominates for large L. I find this denomination a bit misleading, because both, leak and bias, effectively result in a leak of log-posterior odds L by reducing L in every time step (cf. Fig. 1B,C). The change from a multiplicative leak to one based on a bias just means that the effective amount of leak in L increases nonlinearly with L as the bias takes over.

To test whether this special form of leak underlies decision making the authors compared the full model to two versions which only had a multiplicative leak, or one based on bias. In the former the leak stayed constant for increasing L, i.e., $$L’ = \gamma*L$$. In the latter there was perfect accumulation without leak up to the bias and then a bias-based leak which corresponds to a multiplicative leak where the leak rate increased with L such that $$L’ = \gamma(L)*L$$ with $$\gamma(L) = bias / L$$. The authors report evidence that in both tasks both alternative models do not describe choice behaviour as well as the full, normative model. In Fig. 9 they provide a reason by estimating the effective leak rate in the data and the models in dependence on the strength of sensory evidence (coherence in the dots reversal task). They do this by fitting the model with multiplicative leak separately to trials with low and high coherence (fitting to choices in the data or predicted by the different fitted models). In both data and normative model the effective leak rates depended on coherence. This dependence arises, because high sensory evidence leads to large values of L and I have argued above that larger L has larger effective leak rate due to the bias. It is, therefore, not surprising that the alternative model with multiplicative leak shows no dependence of effective leak on coherence. But it is also not surprising that the alternative model with bias-based leak has a larger dependence of effective leak on coherence than the data, because this model jumps from no leak to very large leak when coherence jumps from low to high. The full, normative model lies in between, because it smoothly transitions between the two alternative models.

Why is there a leak in the first place? Other people have found no evidence for a leak in evidence accumulation (eg. Brunton et al., 2013). The leak results from the possibility of a switch of the source of the observations, i.e., a switch of the underlying true stimulus. Without any information, i.e., without observations the possibility of a switch means that you should become more uncertain about the stimulus as time passes. The larger the hazard rate, i.e., the larger the probability of a switch within some time window, the faster you should become uncertain about the current stimulus. For a log posterior odds of L=0 uncertainty is at its maximum (both stimuli have equal posterior probability). This is another reason why discrete hazard ‘rates’ H>0.5 which lead to sign reversals in L do not make much sense. The absence of evidence for one stimulus should not lead to evidence for the other stimulus. Anyway, as the hazard rate goes to 0 the leak will go to 0 such that in experiments where usually no switches in stimulus occur subjects should not exhibit a leak which explains why we often find no evidence for leaks in typical perceptual decision making experiments. This does not mean that there is no leak, though. Especially, the authors report here that hazard rates estimated from behaviour of subjects (subjective) tended to be a bit higher than the ones used to generate the stimuli (objective), when the objective hazard rates were very low and the other way around for high objective hazard rates. This indicates that people have some prior expectations towards intermediate hazard rates that biased their estimates of hazard rates in the experiment.

The discussed forms of leak implement a property of the model that the authors called a ‘non-absorbing bound’. I find this wording also a bit misleading, because ‘bound’ was usually used to indicate a threshold in drift diffusion models which, when reached, would trigger a response. The bound here triggers nothing. Rather, it represents an asymptote of the average log posterior odds. Thus, it’s not an absolute bound, but it’s often passed due to variance in the momentary sensory evidence (LLR). I can also not follow the authors when they write: “The stabilizing boundary is also in contrast to the asymptote in leaky accumulation, which increases linearly with the strength of evidence”. Based on the dynamics of L discussed above the ‘bound’ here should exhibit exactly the described behaviour of an asymptote in leaky accumulation. The strength of evidence is reflected in the magnitude of LLR which is added to the intrinsic dynamics of the log posterior odds L. The non-absorbing bound, therefore, should be given by bias + average of LLR for the current stimulus. The bound, thus, should rise linearly with the strength of evidence (LLR).

Fitting of the discrete and continuous models was done by maximising the likelihood of the models (in some fits with many parameters, priors over parameters were used to regularise the optimisation). The likelihood in the discrete models was Gaussian with mean equal to the log posterior odds ($$L_n$$) computed from the actual dot positions $$x_n$$. The variance of the Gaussian likelihood was fitted to the data as a free parameter. In the continuous model the likelihood was numerically approximated by simulating the discretised evolution of the probabilities that the log posterior odds take on particular values. This is very similar to the approach used by Brunton2013. The distribution of the log posterior odds $$L_n$$ was considered here, because the stream of sensory observations $$x(t)$$ was unknown and therefore had to enter as a random variable while in the triangles task $$x(t)=x_n$$ was set to the known x-coordinates of the presented dots.

The authors argued that the fits of behaviour were good, but at least for the dots reversal task Fig. 8 suggests otherwise. For example, Fig. 8G shows that 6 out of 12 subjects (there were supposed to be 13, but I can only see 12 in the plots) made 100% errors in trials with the low hazard rate of 0.1Hz and low coherence where the last switch in stimulus was very recent (maximally 300ms before the end of stimulus presentation). The best fitting model, however, predicted error rates of at most 90% in these conditions. Furthermore, there is a significant difference in choice errors between the low and high hazard rate for large times after the last switch in stimulus (Fig. 8A, more errors for high hazard rate) which was not predicted by the fitted normative model. Despite these differences the fitted normative model seems to capture the overall patterns in the data.

#### Conclusion

The authors present an interesting normative model in discrete and continuous time that extends previous models of evidence accumulation to situations in which switches in the presented stimulus can be expected. In light of this model, a leak in evidence accumulation reflects a tendency to increase uncertainty about the stimulus due to a potentially upcoming switch in the stimulus. The model provides a mathematical relation between the precise type of leak and the expected switch (hazard) rate of the stimulus. In particular, and in contrast to previous models, the leak in the present model depends nonlinearly on the accumulated evidence. As the authors discuss, the presented normative model potentially unifies decision making processes observed in different situations characterised by different stabilities of the underlying stimuli. I had the impression that the authors were very thorough in their analysis. However, some deviations of model and data apparent in Fig. 8 suggest that either the model itself, or the fitting procedure may be improved such that the model better fits people’s behaviour in the dots-reversal task. It was anyway surprising to me that subjects only had to make a single response per trial in that task. This feels like a big waste of potential choice data when I consider that each trial was 5-10s long and contained several stimulus switches (reversals).

## The Influence of Spatiotemporal Structure of Noisy Stimuli in Decision Making.

Insabato, A., Dempere-Marco, L., Pannunzi, M., Deco, G., and Romo, R.
PLoS Comput Biol, 10:e1003492, 2014

### Abstract

Decision making is a process of utmost importance in our daily lives, the study of which has been receiving notable attention for decades. Nevertheless, the neural mechanisms underlying decision making are still not fully understood. Computational modeling has revealed itself as a valuable asset to address some of the fundamental questions. Biophysically plausible models, in particular, are useful in bridging the different levels of description that experimental studies provide, from the neural spiking activity recorded at the cellular level to the performance reported at the behavioral level. In this article, we have reviewed some of the recent progress made in the understanding of the neural mechanisms that underlie decision making. We have performed a critical evaluation of the available results and address, from a computational perspective, aspects of both experimentation and modeling that so far have eluded comprehension. To guide the discussion, we have selected a central theme which revolves around the following question: how does the spatiotemporal structure of sensory stimuli affect the perceptual decision-making process? This question is a timely one as several issues that still remain unresolved stem from this central theme. These include: (i) the role of spatiotemporal input fluctuations in perceptual decision making, (ii) how to extend the current results and models derived from two-alternative choice studies to scenarios with multiple competing evidences, and (iii) to establish whether different types of spatiotemporal input fluctuations affect decision-making outcomes in distinctive ways. And although we have restricted our discussion mostly to visual decisions, our main conclusions are arguably generalizable; hence, their possible extension to other sensory modalities is one of the points in our discussion.

### Review

They review previous findings about perceptual decision making from a computational perspective, mostly related to attractor models of decision making. The focus here, however, is how the noisy stimulus influences the decision. They mostly restrict themselves to experiments with random dot motion, because these provided most relevant results for their discussion which mainly included three points: 1) specifics of decision input in decisions with multiple alternatives, 2) the relation of the activity of sensory neurons to decisions (cf. CP – choice probability) and 3) in what way sensory neurons reflect fluctuations of the particular stimulus. See also first paragraph of Final Remarks for summary, but note that I have made slightly different points. Their 3rd point derives from mine by applying mine to the specifics of the random dot motion stimuli. In particular, they suggest to investigate in how far different definitions of spatial noise in the random dot stimulus affect decisions differently.

With 2) they discuss the interesting finding that already the activity of sensory neurons can, to some extent, predict final decisions even when the evidence in the stimulus does not favour any decision alternative. So where does the variance in sensory neurons come from which eventually leads to a decision? Obviously, it could come from the stimulus itself. It has been found, however, that the ratio of variance to mean activity is the same when computed over trials with different stimuli compared to when computed over trials in which exactly the same stimulus with a particular realisation of noise was repeated. You would like to see a reduction of variance when the same stimulus is repeated, but it’s not there. I’m unsure, though, whether this is the correct interpretation of the variance-mean-ratio. I would have to check the original papers by Britten (Britten, 1993 and Britten, 1996). The seemingly constant variance of sensory neuron activity suggests that the particular noise realisation of a random dot stimulus does not affect decisions. Rather, the intrinsic activity of sensory neurons drives decisions in the case of no clear evidence. The authors argue that this is not a complete description of the situation, because it has also been found that you can see an effect of the particular stimulus on the variance of sensory neuron activity when considering small time windows instead of the whole trial. Unfortunately, the argument is mostly based on results presented in a SfN meeting abstracts in 2012. I wonder why there is no corresponding paper.

## The Cost of Accumulating Evidence in Perceptual Decision Making.

Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., and Pouget, A.
The Journal of Neuroscience, 32:3612–3628, 2012

### Abstract

Decision making often involves the accumulation of information over time, but acquiring information typically comes at a cost. Little is known about the cost incurred by animals and humans for acquiring additional information from sensory variables due, for instance, to attentional efforts. Through a novel integration of diffusion models and dynamic programming, we were able to estimate the cost of making additional observations per unit of time from two monkeys and six humans in a reaction time (RT) random-dot motion discrimination task. Surprisingly, we find that the cost is neither zero nor constant over time, but for the animals and humans features a brief period in which it is constant but increases thereafter. In addition, we show that our theory accurately matches the observed reaction time distributions for each stimulus condition, the time-dependent choice accuracy both conditional on stimulus strength and independent of it, and choice accuracy and mean reaction times as a function of stimulus strength. The theory also correctly predicts that urgency signals in the brain should be independent of the difficulty, or stimulus strength, at each trial.

### Review

The authors show equivalence between a probabilistic and a diffusion model of perceptual decision making and consequently explain experimentally observed behaviour in the random dot motion task in terms of varying bounds in the diffusion model which correspond to varying costs in the probabilistic model. Here, I discuss their model in detail and outline its limits. My main worry with the presented model is that it may be too powerful to have real explanatory power. Impatient readers may want to skip to the conclusion below.

Perceptual model

The presented model is tailored to the two-alternative, forced choice random dot motion task. The fundamental assumption for the model is that at each point in discrete time, or equivalently, for each successive time period in continuous time the perceptual process of the decision maker produces an independent sample of evidence whose mean, mu*dt, reflects the strength (coherence) and direction (only through sign of evidence) of random dot motion while its variance, sigma2, reflects the passage of time (sigma2 = dt, the time period between observations). This definition of input to the decision model as independent samples of motion strength in either one of two (unspecified) directions restricts the model to two decision alternatives. Consequently, the presented model does not apply to more alternatives, or dependent samples.

The model of noisy, momentary evidence corresponds to a Wiener process with drift which is exactly what standard (drift) diffusion models of perceptual decision making are where drift is equal to mu and diffusion is equal to sigma2. You could wonder why sigma2 is exactly equal to dt and not larger, or smaller, but this is controlled by setting the mean evidence mu to an appropriate level by allowing it to scale: mu = k*c, where k is an arbitrary scaling constant which is fit to data and c is the random dot coherence in the current trial. Therefore, by controlling k you essentially control the signal to noise ratio in the model of the experiment and you would get equivalent results, if you changed sigma2 while fixing mu = c. The difference between the two cases is purely conceptual: In the former case you assume that the neuronal population in MT signals, on average, a scaled motion strength where the scaling may be different for different subjects, but signal variance is the same over subjects while in the latter case you assume that the MT signal, on average, corresponds to motion strength directly, but MT signal variance varies across subjects. Personally, I prefer the latter.

The decision circuit in the author’s model takes the samples of momentary evidence as described above and computes a posterior belief over the two considered alternatives (motion directions). This posterior belief depends on the posterior probability distribution over mean motion strengths mu which is computed from the samples of momentary evidence taking a prior distribution over motion strengths into account. An important assumption in the computation of the posterior is that the decision maker (or decision circuit) has a perfect model of how the samples of momentary evidence are generated (a Gaussian with mean mu*dt and variance dt). If, for example, the decision maker would assume a slightly different variance, that would also explain differences in mean accuracy and decision times. The assumption of the perfect model, however, allows the authors to assert that the experimentally observed fraction of correct choices at a time t is equal to the internal belief of the decision maker (subject) that the chosen alternative is the correct one. This is important, because only with an estimate of this internal belief the authors can later infer the time-varying waiting costs for the subject (see below).

Anyway, under the given model the authors show that for a Gaussian prior you obtain a Gaussian posterior over motion strength mu (Eq. 4) and for a discrete prior you obtain a corresponding discrete posterior (Eq. 7). Importantly, the parameters of the posteriors can be formulated as functions of the current state x(t) of the sample-generating diffusion process and elapsed time t. Consequently, also the posterior belief over decision alternatives can be formulated as a one-to-one, i.e., invertible function of the diffusion state (and time t). By this connection, the authors have shown that, under an appropriate transformation, decisions based on the posterior belief are equivalent to decisions based on the (accumulated) diffusion state x(t) set in relation to elapsed time t.

In summary, the probabilistic perceptual decision model of the authors simply estimates the motion strength from the samples and then decides whether the estimate is positive or negative. Furthermore, this procedure is equivalent to accumulating the samples and deciding whether the accumulated state is very positive or very negative (as determined by hitting a bound). The described diffusion model has been used before to fit accuracies and mean reaction times of subjects, but apparently it was never quite good in fitting the full reaction time distribution (note that it lacks the extensions of the drift diffusion models suggested by Ratcliff, see, e.g., [1]). So here the authors extend the diffusion model by adding time-varying bounds which can be interpreted in the probabilistic model as a time-varying cost of waiting for more samples.

Time-varying bounds and costs

Intuitively, introducing a time-varying bound in a diffusion model introduces great flexibility in shaping the response accuracy and timing at any given time point. However, I currently do not have a good idea of just how flexible the model becomes. For example, if in discrete time changing the bound at each time step could independently modify the accuracy and reaction time distribution at this time step, the bound alone could explain the data. I don’t believe that this extreme case is true, but I would like to know how close you would come. In any case, it appears to be sensible to restrict how much the bound can vary to prevent overfitting of the data, or indeed to prevent making the other model parameters obsolete. In the present paper, the authors control the shape of the bound by using a function made of cosine basis functions. Although this restricts the bound to be a smooth function of time, it still allows considerable flexibility. The authors use two more approaches to control the flexibility of the bound. One is to constrain the bound to be the same for all coherences, meaning that it cannot be used to explain differences between coherences (experimental conditions). The other is to use Bayesian methods for fitting the data. On the one hand, this controls the bound by choosing particular priors. They do this by only considering parameter values in a restricted range, but I do not know how wide or narrow this range is in practice. On the other hand, the Bayesian approach leads to posterior distributions over parameters which means that subsequent analyses can take the uncertainty over parameters into account (see, e.g., the indicated uncertainty over the inferred bound in Fig. 5A). Although I remain with some last doubts about whether the bound was too flexible, I believe that this is not a big issue here.

It is, however, a different question whether the time-varying bound is a good explanation for the observed behaviour in contrast, e.g., to the extensions of the diffusion model introduced by Ratcliff (mostly trial-by-trial parameter variability). There, one might refer to the second, decision-related part of the presented model which considers the rewards and costs associated with decisions. In the Bayesian decision model presented in the paper the subject decides at each time step whether to select alternative 1, or alternative 2, or wait for more evidence in the next time step. This mechanism was already mentioned in [2]. Choosing an alternative will either lead to a reward (correct answer) or punishment (error), but waiting is also associated with a cost which may change throughout the trial. Deciding for the optimal course of action which maximises reward per unit time then is an average-reward reinforcement learning problem which the authors solve using dynamic programming. For a particular setting of reward, punishment and waiting costs this can be translated into an equivalent time-varying bound. More importantly, the procedure can be reversed such that the time-varying cost can be inferred from a bound that had been fitted to data. Apart from the bound, however, the estimate of the cost also depends on the reward/punishment setting and on an estimate of choice accuracy at each time step. Note that the latter differs considerably from the overall accuracy which is usually used to fit diffusion models and requires more data, especially when the error rate is low.

The Bayesian decision model, therefore, allows to translate the time-varying bound to a time-varying cost which then provides an explanation of the particular shape of the reaction time distribution (and accuracy) in terms of the intrinsic motivation (negative cost) of the subject to wait for more evidence. Notice that this intrinsic motivation is really just a value describing how much somebody (dis-)likes to wait and it cannot be interpreted in terms of trying to be better in the task anymore, because all these components have been taken care of by other parts of the decision model. So what does it mean when a subject likes to wait for new evidence just for the sake of it (cf. dip in cost at beginning of trial in human data in Fig. 8)? I don’t know.

Collapsing bounds as found from behavioural data in this paper have been associated with an urgency signal in neural data which drives firing rates of all decision neurons towards a bound at the end of a trial irrespective of the input / evidence. This has been interpreted as a response of the subjects to the approaching deadline (end of trial) that they do not want to miss. The explanation in terms of a waiting cost which rises towards the end of a trial suggests that subjects just have a built-in desire to make (potentially arbitrary) choices before a deadline. To me, this is rather unintuitive. If you’re not punished for making a wrong choice (blue lines in Figs. 7 and 8, but note that there was a small time-punishment in the human experiment) shouldn’t it be always beneficial to make a choice before the deadline, because you trade uncertain reward against certain no reward? This would already be able to explain the urgency signal without consideration of a waiting cost. So why do we see one anyway? It may just all depend on the particular setting of reward and punishment for correct choices and errors, respectively. The authors present different inferred waiting costs with varying amounts of punishment and argue that the results are qualitatively equal, but the three different values of punishment they present hardly exhaust the range of values that could be assumed. Also, they did not vary the amount of reward given for correct choices, but it is likely that only the difference between reward and punishment determines the behaviour of the model such that it doesn’t matter whether you change reward or punishment to explore model predictions.

Conclusion

The main contribution of the paper is to show that accuracy and reaction time distribution can be explained by a time-varying bound in a simple diffusion model in which the drift scales linearly with stimulus intensity (coherence in random dot motion). I tried to point out that this result may not be surprising depending on how much flexibility a time-varying bound adds to the model. Additionally, the authors present a connection between diffusion and Bayesian models of perceptual decision making which allows them to reinterpret the time-varying bounds in terms of the subjective cost of waiting for more evidence to arrive. The authors argue that this cost increases towards the end of a trial, but for two reasons I’m not entirely convinced: 1) Conceptually, it is worth considering the origin of a possible waiting cost. It could correspond to the energetic cost of keeping the inference machinery running and the attention on the task, but there is no reason why this should increase towards a deadline. 2) I’m not convinced by the presented results that the inferred increase of cost towards a deadline is qualitatively independent of the reward/punishment setting. A greater range of punishments should have been tested. Note that you cannot infer the rewards for decisions and the time-varying waiting cost at the same time from the behavioural data. So this issue cannot be settled without some new experiments which measure rewards or costs more directly. Finally, I miss an overview of fitted parameter values in the paper. For example, I would be interested in the inferred lapse trial probabilities p1. The authors go through great lengths to estimate the posterior distributions over diffusion model parameters and I wonder why they don’t share the results with us (at least mean and variance for a start).

In conclusion, the authors follow a trend to explain behaviour in terms of Bayesian ideal observer models extended by flexible cost functions and apply this idea to perceptual decision making via a detour through a diffusion model. Although I appreciate the sound work presented in the paper, I’m worried that the time-varying bound/cost is too flexible and acts as a kind of ‘get out of jail free’ card which blocks the view to other, potentially additional mechanisms underlying the observed behaviour.

References

[1] Bogacz, R.; Brown, E.; Moehlis, J.; Holmes, P. & Cohen, J. D. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev, 2006, 113, 700-765

[2] Dayan, P. & Daw, N. D. Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci, 2008, 8, 429-453

## Representation of confidence associated with a decision by neurons in the parietal cortex.

Kiani, R. and Shadlen, M. N.
Science, 324:759–764, 2009

### Abstract

The degree of confidence in a decision provides a graded and probabilistic assessment of expected outcome. Although neural mechanisms of perceptual decisions have been studied extensively in primates, little is known about the mechanisms underlying choice certainty. We have shown that the same neurons that represent formation of a decision encode certainty about the decision. Rhesus monkeys made decisions about the direction of moving random dots, spanning a range of difficulties. They were rewarded for correct decisions. On some trials, after viewing the stimulus, the monkeys could opt out of the direction decision for a small but certain reward. Monkeys exercised this option in a manner that revealed their degree of certainty. Neurons in parietal cortex represented formation of the direction decision and the degree of certainty underlying the decision to opt out.

### Review

The authors used a 2AFC-task with an option to waive the decision in favour of a choice which provides low, but certain reward (the sure option) to investigate the representation of confidence in LIP neurons. Behaviourally the sure option had the expected effect: it was increasingly chosen the harder the decisions were, i.e., the more likely a false response was. Trials in which the sure option was chosen, thus, may be interpreted as those in which the subject was little confident in the upcoming decision. It is important to note that task difficulty here was manipulated by providing limited amounts of information for a limited amount of time, i.e., this was not a reaction time task.

The firing rates of the recorded LIP neurons indicate that selection of the sure option is associated with an intermediate level of activity compared to that of subsequent choices of the actual decision options. For individual trials the authors found that firing rates closer to the mean firing rate (in a short time period before the sure option became available) more frequently lead to selection of the sure option than firing rates further away from the mean, but in absolute terms the activity in this time window could predict choice of the sure option only weakly (probability of 0.4). From these results the authors conclude that the LIP neurons which have previously been found to represent evidence accumulation also encode confidence in a decision. They suggest a simple drift-diffusion model with fixed diffusion parameter to explain the results. Additional to standard diffusion models they define confidence in terms of the log-posterior odds which they compute from the state of the drift-diffusion model. They define posterior as p(S_i|v), the probability that decision option i is correct given that the drift-diffusion state (the decision variable) is v. They compute it from the corresponding likelihood p(v|S_i), but don’t state how they obtained that likelihood. Anyway, the sure option is chosen in the model, when the log-posterior odds is below a certain level. I don’t see why the detour via the log-posterior odds is necessary. You could directly define v as the posterior for decision option i and still be consistent with all the findings in the paper. Of course, then v could not be governed by a linear drift anymore, but why should it in the first place? The authors keenly promote the Bayesian brain, but stop just before the finishing line. Why?

## Perceptions as hypotheses: saccades as experiments.

Friston, K., Adams, R. A., Perrinet, L., and Breakspear, M.
Front Psychol, 3:151, 2012

### Abstract

If perception corresponds to hypothesis testing (Gregory, 1980); then visual searches might be construed as experiments that generate sensory data. In this work, we explore the idea that saccadic eye movements are optimal experiments, in which data are gathered to test hypotheses or beliefs about how those data are caused. This provides a plausible model of visual search that can be motivated from the basic principles of self-organized behavior: namely, the imperative to minimize the entropy of hidden states of the world and their sensory consequences. This imperative is met if agents sample hidden states of the world efficiently. This efficient sampling of salient information can be derived in a fairly straightforward way, using approximate Bayesian inference and variational free-energy minimization. Simulations of the resulting active inference scheme reproduce sequential eye movements that are reminiscent of empirically observed saccades and provide some counterintuitive insights into the way that sensory evidence is accumulated or assimilated into beliefs about the world.

### Review

In this paper Friston et al. introduce the notion that an agent (such as the brain) minimizes uncertainty about its state in the world by actively sampling those states which minimise the uncertainty of the agent’s posterior beliefs, when visited some time in the future. The presented ideas can also be seen as reply to the commonly formulated dark-room-critique of Friston’s free energy principle which states that under the free energy principle an agent would try to find a dark, stimulus-free room in which sensory input can be perfectly predicted. Here, I review these ideas together with the technical background (see also a related post about Friston et al., 2011). Although I find the presented theoretical argument very interesting and sound (and compatible with other proposals for the origin of autonomous behaviour), I do not think that the presented simulations conclusively show that the extended free energy principle as instantiated by the particular model chosen in the paper leads to the desired exploratory behaviour.

Introduction: free energy principle and the dark room

Friston’s free energy principle has gained considerable momentum in the field of cognitive neuroscience as a unifying framework under which many cognitive phenomena may be understood. Its main axiom is that an agent tries to minimise the long-term uncertainty about its state in the world by executing actions which make prediction of changes in the agent’s world more precise, i.e., which minimise surprises. In other words, the agent tries to maintain a sort of homeostasis with its environment.

While homeostasis is a concept which most people happily associate with bodily functions, it is harder to reconcile with cognitive functions which produce behaviour. Typically, the counter-argument for the free energy principle is the dark-room-problem: changes in a dark room can be perfectly predicted (= no changes), so shouldn’t we all just try to lock ourselves into dark rooms instead of frequently exploring our environment for new things?

The shortcoming of the dark-room-problem is that an agent cannot maintain homeostasis in a dark room, because, for example, its bodily functions will stop working properly after some time without water. There may be many more environmental factors which may disturb the agent’s dark room pleasure. An experienced agent knows this and has developed a corresponding model about its world which tells it that the state of its world becomes increasingly uncertain as long as the agent only samples a small fraction of the state space of the world, as it is the case when you are in a dark room and don’t notice what happens outside of the room.

The present paper formalises this idea. It assumes that an agent only observes a small part of the world in its local surroundings, but also maintains a more comprehensive model of its world. To decrease uncertainty about the global state of the world, the agent then explores other parts of the state space which it beliefs to be informative according to its current estimate of the global world state. In the remainder I will present the technical argument in more detail, discuss the supporting experiments and conclude with my opinion about the presented approach.

Review of theoretical argument

In previous publications Friston postulated that agents try to minimise the entropy of the world states which they encounter in their life and that this minimisation is equivalent to minimising the entropy of their sensory observations (by essentially assuming that the state-observation mapping is linear). The sensory entropy can be estimated by the average of sensory surprise (negative model evidence) across (a very long) time. So the argument goes that an agent should minimise sensory surprise at all times. Because sensory surprise cannot usually be computed directly, Friston suggests a variational approximation in which the posterior distribution over world states (posterior beliefs) and model parameters is separated. Further, the posterior distributions are approximated with Gaussian distributions (Laplace approximation). Then, minimisation of surprise is approximated by minimisation of Friston’s free energy. This minimisation is done with respect to the posterior over world states and with respect to action. The former corresponds to perception and ensures that the agent maintains a good estimate of the state of the world and the latter implements how the agent manipulates its environment, i.e., produces behaviour. While the former is a particular instantiation of the Bayesian brain hypothesis, and hence not necessarily a new idea, the latter had not previously been proposed and subsequently spurred some controversy (cf. above).

At this point it is important to note that the action variables are defined on the level of primitive reflex arcs, i.e., they directly control muscles in response to unexpected basic sensations. Yet, the agent can produce arbitrary complex actions by suitably setting sensory expectations which can be done via priors in the model of the agent. In comparison with reinforcement learning, the priors of the agent about states of the world (the probability mass attributed by the prior to the states), therefore, replace values or costs. But how does the agent choose its priors? This is the main question addressed by the present paper, however, only in the context of a freely exploring (i.e., task-free) agent.

In this paper, Friston et al. postulate that an agent minimises the joint entropy of world states and sensory observations instead of only the entropy of world states. Because the joint entropy is the sum of sensory entropy and conditional entropy (world states conditioned on sensory observations), the agent needs to implement two minimisations. The minimisation of sensory entropy is exactly the same as before implementing perception and action. However, conditional entropy is minimised with respect to the priors of the agent’s model, implementing higher-level action selection.

In Friston’s dynamic free energy framework (and other filters) priors correspond to predictive distributions, i.e., distributions over the world states some time in the future given their current estimate. Friston also assumes that the prior densities are Gaussian. Hence, priors are parameterised by their mean and covariance. To manipulate the probability mass attributed by the prior to the states he, thus, has to change prior mean or covariance of the world states. In the present paper the authors use a fixed covariance (as far as I can tell) and implement changes in the prior by manipulating its mean. They do this indicrectly by introducing new, independent control variables (“controls” from here on) which parameterise the dynamics of the world states without having a dynamics associated with themselves. The controls are treated like the other hidden variables in the agent model and their values are inferred from the sensory observations via free energy minimisation. However, I guess, that the idea is to more or less fix the controls to their prior means, because the second entropy minimisation, i.e., minimisation of the conditional entropy, is with respect to these prior means. Note that the controls are pretty arbitrary and can only be interpreted once a particular model is considered (as is the case for the remaining variables mentioned so far).

As with the sensory entropy, the agent has no direct access to the conditional entropy. However, it can use the posterior over world states given by the variational approximation to approximate the conditional entropy, too. In particular, Friston et al. suggest to approximate the conditional entropy using a predictive density which looks ahead in time from the current posterior and which they call counterfactual density. The entropy of this counterfactual density tells the agent how much uncertainty about the global state of the world it can expect in the future based on its current estimate of the world state. The authors do not specify how far in the future the counterfactual density looks. They here use the denotational trick to call negative conditional entropy ‘saliency’ to make the correspondence between the suggested framework and experimental variables in their example more intuitive, i.e., minimisation of conditional entropy becomes maximisation of saliency. The actual implementation of this nonlinear optimisation is computationally demanding. In particular, it will be very hard to find global optima using gradient-based approaches. In this paper Friston et al. bypass this problem by discretising the space spanned by the controls (which are the variables with respect to which they optimise), computing conditional entropy at each discrete location and simply selecting the location with minimal entropy, i.e., they do grid search.

Review of presented experiments (saccade model)

To illustrate the theoretical points, Friston et al. present a model for saccadic eye movements. This model is very basic and is only supposed to show in principle that the new minimisation of conditional entropy can provide sensible high-level action. The model consists of two main parts: 1) the world, which defines how sensory input changes based on the true underlying state of the world and 2) the agent, which defines how the agent believes the world behaves. In this case, the state of the world is the position in a viewed image which is currently fixated by the eye of the agent. This position, hence, determines what input the visual sensors of the agent currently get (the field of view around the fixation position is restricted), but additionally there are proprioceptive sensors which give direct feedback about the position. Action changes the fixation position. The agent has a similar, but extended model of the world. In it, the fixation position depends on the hidden controls. Additionally, the model of the agent contains several images such that the agent has to infer what image it sees based on its sensory input.

In Friston’s framework, inference results heavily depend on the setting of prior uncertainties of the agent. Here, the agent is assumed to have certain proprioception, but uncertain vision such that it tends to update its beliefs of what it sees (which image) rather than trying to update its beliefs of where it looks. [I guess, this refers to the uncertainties of the hidden states and not the uncertainties of the actual sensory input which was probably chosen to be quite certain. The text does not differentiate between these and, unfortunately, the code was not yet available within the SPM toolbox at the time of writing (08.09.2012).]

For the presented experiments the authors use an agent model which contains three images as hypotheses of what the agent observes: a face and its 90° and 180° rotated versions. The first experiment is supposed to show that the agent can correctly infer which image it observes by making saccades to low conditional entropy (‘salient’) positions. The second experiment is supposed to show that, when an image is observed which is unknown to the agent, the agent cannot be certain of which of the three images it observes. The third experiment is supposed to show that the uncertainty of the agent increases when high entropy high-level actions are chosen instead of low entropy ones (when the agent chooses positions which contain very little information). I’ll discuss them in turn.

The second experiment acts like a sanity check: the agent shouldn’t be able to identify one of its three images, when it observes a fourth one. Whether the experiment shows that, depends on the interpretation of the inferred hidden states. The way these states were defined their values can be directly interpreted as the probability of observing one of the three images. If only these are considered the agent appears to be very certain at times (it doesn’t help that the scale of the posterior belief plot in Figure 6 is 4 times larger than that of the same plot in Figure 5). However, the posterior uncertainty directly associated with the hidden states appears to be indeed considerably larger than in experiment 1, but, again, this is only a single example. Something that is rather strange: the sequence of fixation positions is almost exactly the same as in experiment 1 even though the observed image and the resulting posterior beliefs were completely different. Why?

Finally, experiment three is more like a thought experiment: what would happen, if an agent chooses high-level actions which maximise future uncertainty instead of minimising it. Well, the uncertainty of the agent’s posterior beliefs increases as shown in Figure 7, which is the expected behaviour. One thing that I wonder, though, and it applies to the presented results of all experiments: In Friston’s Bayesian filtering framework the uncertainty of the posterior hidden states is a direct function of their mean values. Hence, as long as the mean values do not change, the posterior uncertainty should stay constant, too. However, we see in Figure 7 that the posterior uncertainty increases even though the posterior means stay more or less constant. So there must be an additional (unexplained) mechanism at work, or we are not shown the distribution of posterior hidden states, but something slightly different. In both cases, it would be important to know what exactly resulted in the presented plots to be able to interpret the experiments in the correct way.

Conclusion

The paper presents an important theoretical extension to Friston’s free energy framework. This extension consists of adding a new layer of computations which can be interpreted as a mechanism for how an agent (autonomously) chooses its high-level actions. These high-level actions are defined in terms of desired future states encoded by the probability mass which is assigned to these states by the prior state distribution. Conceptually, these ideas translate into choosing maximally informative actions given the agent’s model of the world and its current state estimate. As discussed by Friston et al. such approaches to action selection are not new (see also Tishby and Polani, 2011). So, the author’s contribution is to show that these ideas are compatible with Friston’s free energy framework. Hence, on the abstract, theoretical level this paper makes sense. It also provides a sound theoretical argument for why an agent would not seek sensory deprivation in a dark room, as feared by critics of the free energy principle. However, the presented framework heavily relies on the agent’s model of the world and it leaves open how the agent has attained this model. Although the free energy principle also provides a way for the agent to learn parameters of its model, I still, for example, haven’t seen a convincing application in which the agent actually learnt the dynamics of an unknown process in the world. Probably Friston would here also refer to evolution as providing a good initialisation for process dynamics, but I find that a too cheap way out.

From a technical point of view the paper leaves a few questions open, for example: How far does the counterfactual distribution look into the future? What does it mean for high-level actions to change how far the agent looks into his subjective future? How well does the presented approach scale? Is it important to choose the global minimum of the conditional entropy (this would be bad, as it’s probably extremely hard to find in a general setting)? When, or how often, does the agent minimise conditional entropy to set high-level actions? What happens with more than one control variables (several possible high-level actions)? How can you model discrete high-level actions in Friston’s continuous Gaussian framework? How do results depend on the setting of prior covariances / uncertainties. And many more.

Finally, I have to say that I find the presented experiments quite poor. Although providing the agent with a limited field of view such that it has to explore different regions of a presented image is a suitable setting to test the proposed ideas, the trivial example and introduction of ad-hoc inhibition of return make it impossible to judge whether the underlying principle is successfully at work, or the simulations have been engineered to work in this particular case.

## Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks.

Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C., and Gerstner, W.
Science, 334:1569–1573, 2011

### Abstract

Cortical neurons receive balanced excitatory and inhibitory synaptic currents. Such a balance could be established and maintained in an experience-dependent manner by synaptic plasticity at inhibitory synapses. We show that this mechanism provides an explanation for the sparse firing patterns observed in response to natural stimuli and fits well with a recently observed interaction of excitatory and inhibitory receptive field plasticity. The introduction of inhibitory plasticity in suitable recurrent networks provides a homeostatic mechanism that leads to asynchronous irregular network states. Further, it can accommodate synaptic memories with activity patterns that become indiscernible from the background state but can be reactivated by external stimuli. Our results suggest an essential role of inhibitory plasticity in the formation and maintenance of functional cortical circuitry.

### Review

The authors show that, if the same input to an output neuron arrives through an excitatory and a delayed inhibitory channel, synaptic plasticity (a symmetric STDP rule) at the inhibitory synapses leads to “detailed balance”, i.e., to cancellation of excitatory and inhibitory input currents. Then, the output neuron fires sparsely and irregularly (as observed for real neurons) only when an excitatory input was not predicted by the implicit model encoded by the synaptic weights of the inhibitory inputs. The adaptation of the inhibitory synapses also matches potential changes in the excitatory synapses, although here they only present simulations in which excitatory synapses changed abruptly and stayed constant afterwards. (What happens when excitatory and inhibitory synapses change concurrently?) Finally, the authors show that similar results apply to recurrently connected networks of neurons with dedicated inhibitory neurons (balanced networks). Arbitrary activity patterns can be encoded by the excitatory connections, activity in these patterns is then suppressed by the inhibitory neurons, while partial activation of the patterns through external input reactivates the whole patterns (cf. recall of memory) without suppressing potential reactivation of other patterns in the network.

These are interesting ideas, clearly presented and with very detailed supplementary information. The large number of inhibitory neurons in cortex makes the assumed pairing of excitatory and inhibitory input at least possible, but I don’t know how prevalent this really is. Another important assumption here is that the inhibitory input is a bit slower than the excitatory input. This makes intuitive sense, if you assume that the inhibitory input needs to be relayed through an additional inhibitory neuron, but I’ve seen the opposite assumption before, too.

## Representational switching by dynamical reorganization of attractor structure in a network model of the prefrontal cortex.

Katori, Y., Sakamoto, K., Saito, N., Tanji, J., Mushiake, H., and Aihara, K.
PLoS Comput Biol, 7:e1002266, 2011

### Abstract

The prefrontal cortex (PFC) plays a crucial role in flexible cognitive behavior by representing task relevant information with its working memory. The working memory with sustained neural activity is described as a neural dynamical system composed of multiple attractors, each attractor of which corresponds to an active state of a cell assembly, representing a fragment of information. Recent studies have revealed that the PFC not only represents multiple sets of information but also switches multiple representations and transforms a set of information to another set depending on a given task context. This representational switching between different sets of information is possibly generated endogenously by flexible network dynamics but details of underlying mechanisms are unclear. Here we propose a dynamically reorganizable attractor network model based on certain internal changes in synaptic connectivity, or short-term plasticity. We construct a network model based on a spiking neuron model with dynamical synapses, which can qualitatively reproduce experimentally demonstrated representational switching in the PFC when a monkey was performing a goal-oriented action-planning task. The model holds multiple sets of information that are required for action planning before and after representational switching by reconfiguration of functional cell assemblies. Furthermore, we analyzed population dynamics of this model with a mean field model and show that the changes in cell assemblies’ configuration correspond to those in attractor structure that can be viewed as a bifurcation process of the dynamical system. This dynamical reorganization of a neural network could be a key to uncovering the mechanism of flexible information processing in the PFC.

### Review

Based on firing properties of certain prefrontal cortex neurons the authors suggest a network model in which short-term plasticity implements switches of what the neurons in the network represent. In particular, neurons in prefrontal cortex have been found which switch from representing goals to representing actions (first, their firing varies depending on which goal is shown, then it varies depending on which action is executed afterwards while firing equally for all goals). The authors call this representational switches and they assume that these are implemented via changes in the connection strengths of neurons in a recurrently connected neural network. The network is setup such that network activity always converges to one of several fixed point attractors. A suitable change in connection strengths then leads to a change in the attractor landscape which may be interpreted as a change in what the network represents. The main contribution of the authors is to suggest a particular pattern of short-term plasticity at synapses in the network such that the network exhibits the desired representational switching. Another important aspect of this model is its structure: the network consists of separate cell assemblies, different subsets of which are assumed to be active when either goals or actions are represented and the goal and action subsets are partially overlapping. For example, in their model they have four cell assemblies (A,B,C,D) and the subsets (A,B) and (C,D) are associated with goals while subsets (A,D) and (B,C) are associated with actions. Initially the network is assumed to be in the goal state in which the connection strenghts A-B and C-D are large. The presentation of one of two goals then makes the network activity converge to strong activation of (A,B) or (C,D). Synaptic depression of connections A-B (assuming that this is the active subset) with simultaneous facilitation of connections A-D and B-C then leads to the desired change of connection strengths which implements the representational switch and then makes either subset (A-D), or subset (B-C) the active subset. It is not entirely clear to me why only one action subset becomes active. Maybe this is what the inhibitory units in the model are for (their function is not explained by the authors). In further analysis and experiments the authors confirm the attractor landscape of the model (and how it changes), show that the timing of the representational switch can be influenced by input to the network and show that the probability of changing from a particular goal to a particular action can be manipulated by changing the number of prior connections between the corresponding cell assemblies.

The authors show a nice qualitative correspondence between experimental findings and simulated network behaviour (although some qualitative differences are left, too, e.g., a general increase of firing also for the non-preferred goal and action in the experimental findings). In essence, the authors present a mechanism which could implement the (seemingly) autonomous switching of representations in prefrontal cortex neurons. Whether this mechanism is used by the brain is an entirely different question. I don’t know of evidence backing the chosen special wiring of neurons and distribution of short-term placticity, but this might just reflect my lack of knowledge of the field. Additionally, I wouldn’t exclude the possibility of a hierarchical model. The authors argue against this by presuming that prefrontal cortex already should be the top of the hierarchy, but nothing prevents us to make hierarchical models of prefrontal cortex itself. This points to the mixing of levels of description in the paper: On the one hand, the main contributions of the paper are on the algorithmic level describing the necessary wiring in a network of a few units and how it needs to change to reproduce the behaviour observed in experiments. On the other hand, the main model is on an implementational level showing how these ideas could be implemented in a network of leaky integrate and fire (LIF) neurons. In my opinion, the LIF neuron network doesn’t add anything interesting to the paper apart from the proof that the algorithmic ideas can be implemented by such a network. On the contrary, it masks a bit the main points of the paper by introducing an abundance of additional parameters which needed to be chosen by the authors, but for which we don’t know which of these settings are important. Finally, I wonder how the described network is reset in order to be ready for the next trial. The problem is the following: the authors initialise the network such that the goal subsets have a high synaptic efficacy at the start of the trial. The short-term plasticity then reduces these synaptic efficacies while simultaneously increasing those of the action subsets. At the end of a trial they all end up in a similar range (see Fig. 3A bottom). In order for the network to work as expected in the next trial, it somehow needs to reset to the initial synaptic efficacies.

## Action understanding and active inference.

Friston, K., Mattout, J., and Kilner, J.
Biol Cybern, 104:137–160, 2011

### Abstract

We have suggested that the mirror-neuron system might be usefully understood as implementing Bayes-optimal perception of actions emitted by oneself or others. To substantiate this claim, we present neuronal simulations that show the same representations can prescribe motor behavior and encode motor intentions during action-observation. These simulations are based on the free-energy formulation of active inference, which is formally related to predictive coding. In this scheme, (generalised) states of the world are represented as trajectories. When these states include motor trajectories they implicitly entail intentions (future motor states). Optimizing the representation of these intentions enables predictive coding in a prospective sense. Crucially, the same generative models used to make predictions can be deployed to predict the actions of self or others by simply changing the bias or precision (i.e. attention) afforded to proprioceptive signals. We illustrate these points using simulations of handwriting to illustrate neuronally plausible generation and recognition of itinerant (wandering) motor trajectories. We then use the same simulations to produce synthetic electrophysiological responses to violations of intentional expectations. Our results affirm that a Bayes-optimal approach provides a principled framework, which accommodates current thinking about the mirror-neuron system. Furthermore, it endorses the general formulation of action as active inference.

### Review

In this paper the authors try to convince the reader that the function of the mirror neuron system may be to provide amodal expectations for how an agent’s body will change, or interact with the world. In other words, they propose that the mirror neuron system represents, more or less abstract, intentions of an agent. This interpretation results from identifying the mirror neuron system with hidden states in a dynamic model within Friston’s active inference framework. I will first comment on the active inference framework and the particular model used and will then discuss the biological interpretation.

Active inference framework:

Active inference has been described by Friston elsewhere (Friston et al. PLoS One, 2009; Friston et al. Biol Cyb, 2010). Note that all variables are continuous. The main idea is that an agent maximises the likelihood of its internal model of the world as experienced by its sensors by (1) updating the hidden states of this model and (2) producing actions on the world. Under the Gaussian assumptions made by Friston both ways to maximise the likelihood of the model are equivalent to minimising the precision-weighted prediction errors defined in the model. Potentially the models are hierarchical, but here only a single layer is used which consists of sensory states and hidden states. The prediction errors on sensory states are simply defined as the difference between sensory observations and sensory predictions from the model as you would intuitively do. The model also defines prediction errors on hidden states (*). Both types of prediction errors are used to infer hidden states (1) which explain sensory observations, but action is only produced (2) from sensory state prediction errors, because action is not part of the agent’s model and only affects sensory observations produced by the world.

Well, actually the agent needs a whole other model for action which implements the gradient of sensory observations with respect to action, i.e., which tells the agent how sensory observations change when it exerts action. However, Friston restricts sensory obervations in this context to proprioceptive observations, i.e., muscle feedback, and argues that the corresponding gradient may be sufficiently simple to learn and represent so that we don’t have to worry about it (in the simulation he just provides the gradient to the agent). Therefore, action solely tries to implement proprioceptive predictions. On the other hand, proprioceptive predictions may be coupled to predictions in other modalities (e.g. vision) through the agent’s model which allows the agent to execute (seemingly) higher-level actions. For example, if an agent sees its hand move from a cup to a glass on a table in front of it, its generative model must also represent the corresponding proprioceptive signals. If then the agent predicts this movement of its hand in visual space, the generative model must automatically predict the corresponding proprioceptive signals, because they always accompanied the seen movement. Action then minimises the resulting precision-weighted proprioceptive prediction error and so implements the hand movement from cup to glass.

Notice that the agent minimises the *precision-weighted* prediction errors. Precision here means the inverse *prior* covariance, i.e., it is a measure for how certain the agent *expects* to be about its observations. By changing the precisions, qualitatively very different results can be obtained within the active inference framework. Indeed, here they implement the switch from action generation to action observation by heavily reducing the precision of the proprioceptive observations. This makes the agent ignore any proprioceptive prediction errors when both updating hidden states (1) and generating action (2). This leads to an interesting prediction: when you observe an action by somebody else, you shouldn’t notice when the corresponding body part is moved externally, or alternatively, when you observe somebody elses movement, you shouldn’t be able to move the corresponding body part yourself (in a different way than the observed). In this strict formulation this prediction appears to be very unlikely, but formulating it more softly, that you should see interference effects in these situations, you may be able to find evidence for it.

This thought also points to the general problem of finding suitable precisions: How do you strike a balance between action (2) and perception (1)? Because they are both trying to reduce the same prediction errors, the agent has to tradeoff recognising the world as it is (1) and changing it so that it corresponds to his expectations (2). This dichotomy is not easily resolved. When asked about it, Friston usually points to empirical priors, i.e., that the agent has learnt to choose suitable precisions based on his past experience (not very helpful, if you want to know how they are chosen). I guess, it’s really a question about how strongly the agent expects (wants) a certain outcome. A useful practical consideration also is that action is constrained, e.g., an agent can’t move infinitely fast, which means that enough prediction error should be left over for perceiving changes in the world (1), in particular those that are not within reach of the agent’s actions on the expected time scale.

I do not discuss the most common reservation against Friston’s free-energy principle / active inference framework (that people seem to have an intrinsic curiosity towards new things as well), because it has been covered elsewhere (John Langford’s blogNature Neuroscience).

Handwriting model:

In this paper the particular model used is interpreted as a model for handwriting although neither a hand is modeled, nor actual writing. Rather, a two-joint system (arm) is used where the movement of the end-effector position (tip) is designed such that it is qualitatively similar to hand-writing without actually producing common letters. The dynamic model of the agent consists of two parts: (a) a stable heteroclinic channel (SHC) which produces a periodic sequence of 6 continuously changing states and (b) a linear attractor dynamics in joint angle space of the arm which is attracted to a rest position, but modulated by the distance of the tip to a desired point in Cartesian space which is determined by the SHC state. Thus, the agent expects that the tip of its arm moves along a sequence of 6 desired points where the dynamics of the arm movement is determined by the linear attractor. The agent observes the joint angle positions and velocities (proprioceptive) and the Cartesian positions of the elbow joint and tip (visual). The dynamic model of the world (so to say implementing the underlying physics) lacks the SHC dynamics and only defines the linear attractor in joint space which is modulated by action and some (unspecified) external variables which can be used to perturb the system. Interestingly, the arm is stronger attracted to its rest position in the world model than in the agent model. The reason for this is not clear to me, but it might not be important, because action could correct for this.

Biological interpretation:

The system is setup such that the agent model contains additional hidden states compared to the world which may be interpreted as intentions of the agent, because they determine the order of the points that the tip moves to. In simulations the authors show that the described models within the active inference framework indeed lead to actions of the agent which implement a “writing” movement even though the world model did not know anything about “writing” at all. This effect has already been shown in the previously mentioned publications.

Here is new that they show that the same model can be used to observe an action without generating action at the same time. As mentioned before, they simply reduce the precision of the proprioceptive observations to achieve this. They then replay the previously recorded actions of the agent in the world by providing them via the external variables. This produces an equivalent movement of the arm in the world without any action being exerted by the agent. Instead of generating its own movement the agent then has the task to recognise a movement executed by somebody/something else. This works, because the precision of the visual obserations was kept high such that the hidden SHC states can be inferred correctly (1). The authors mention a delay before the SHC states catch up with the equivalent trajectory under action. This should not be over-interpreted, because other than mentioned in the text the initial conditions for the two simulations were not the same (see figures and code). The important argument the authors try to make here is that the same set of variables (SHC states) are equally active during action as well as action observation and, therefore, provide a potential functional explanation for activity in the mirror neuron system.

Furthermore, the authors argue that SHC states represent the intentions of the agent, or, equivalently, the intentions of the agent which is observed, by noting that the desired tip positions as specified by the SHC states are only (approximately) reached at a later point in time in the world. This probably results from the inertia built into the joint angle dynamics. Probably there are dynamic models for which this effect disappears, but it sounds plausible to me to assume that when one dynamic system d1 influences the parameters of another dynamic system d2 (as here), that d2 first needs to catch up with its state to the new parameter setting. So these delays would be expected for most hierarchical dynamic systems.

Another line of argument of the authors is to relate prediction errors in the model with electrophysiological (EEG) findings. This is based on Friston’s previous suggestion that superficial pyramidal cells are likely candidates for implementing prediction error units. At the same time, activity of these cells is thought to dominate EEG signals. I cannot judge the validity of both hypothesis, although the former seems to have less experimental support than the latter. In any case, I find the corresponding arguments in this paper quite weak. The problem is that results from exactly one run with one particular setting of parameters of one particular model is used to make very general statements based on a mere qualitative fit of parts of the data to general experimental findings. In other words, I’m not confident that similar (desired) patterns would be seen in the prediction errors, if other settings of precisions, or parameters of the dynamical systems would be chosen.

Conclusion:

The authors suggest how the mirror neuron system can be understood within Friston’s active inference framework. These conceptual considerations make sense. In general, the active inference framework provides large explanatory power and many phenomena may be understood in its context. However, in my point of view, it is an entirely open question how the functional considerations of the active inference framework may be implemented in neurobiological substrate. The superficial arguments based on prediction errors generated by the model, which are presented in the paper, are not convincing. More evidence needs to be found which robustly links variables in an active inference model with neuroscientific measurements.

But also conceptually it is not clear whether the active inference solution correctly describes the computations of the brain. On the one hand, it potentially explains many important and otherwise disparate phenomena under a common principle (e.g. perception, action, learning, computing with noise, dynamics, internal models, prediction; this paper adds action understanding). On the other hand, we don’t know whether all brain functions actually follow a common principle and whether functionally equivalent solutions for subsets of phenomena may be better descriptions of the underlying computations.

An important issue for future studies which aim to discern these possibilities is that active inference is a general framework which needs to be instantiated with a particular model before its properties can be compared to experimental data. However, little is known about the kind of hierarchical, dynamic, functional models itself, which must serve as generative models for active inference. As in this paper, it then is hard to discern the properties of the chosen model from the properties imposed by the active inference framework. Therefore, great care has to be taken in the interpretation of corresponding results, but it would be exciting to learn about which properties of the active inference framework are crucial in brain function and which would need to be added, adapted, or dropped in a faithful description of (subsets of) brain function.

(*) Hidden state prediction errors result from Friston’s special treatment of dynamical systems by extending states by their temporal derivatives to obtain generalised states which represent a local trajectory of the states through time. The hidden state prediction errors, thus, can be seen, intuitively, as the difference between the velocity of the (previously inferred) hidden states as represented by the trajectory in generalised coordinates and the velocity predicted by the dynamic model.

## Flexible vowel recognition by the generation of dynamic coherence in oscillator neural networks: speaker-independent vowel recognition.

Liu, F., Yamaguchi, Y., and Shimizu, H.
Biol Cybern, 71:105–114, 1994

### Abstract

We propose a new model for speaker-independent vowel recognition which uses the flexibility of the dynamic linking that results from the synchronization of oscillating neural units. The system consists of an input layer and three neural layers, which are referred to as the A-, B- and C-centers. The input signals are a time series of linear prediction (LPC) spectrum envelopes of auditory signals. At each time-window within the series, the A-center receives input signals and extracts local peaks of the spectrum envelope, i.e., formants, and encodes them into local groups of independent oscillations. Speaker-independent vowel characteristics are embedded as a connection matrix in the B-center according to statistical data of Japanese vowels. The associative interaction in the B-center and reciprocal interaction between the A- and B-centers selectively activate a vowel as a global synchronized pattern over two centers. The C-center evaluates the synchronized activities among the three formant regions to give the selective output of the category among the five Japanese vowels. Thus, a flexible ability of dynamical linking among features is achieved over the three centers. The capability in the present system was investigated for speaker-independent recognition of Japanese vowels. The system demonstrated a remarkable ability for the recognition of vowels very similar to that of human listeners, including misleading vowels. In addition, it showed stable recognition for unsteady input signals and robustness against background noise. The optimum condition of the frequency of oscillation is discussed in comparison with stimulus-dependent synchronizations observed in neurophysiological experiments of the cortex.

### Review

The authors present an oscillating recurrent neural network model for the recognition of Japanese vowels. The model consists of 4 layers: 1) an input layer which gives pre-processed frequency information, 2) an oscillatory hidden layer with local inhibition, 3) another oscillatory hidden layer with long-range inhibition and 4) a readout layer implementing the classification of vowels using a winner-takes-all mechanism. Layers 1-3 each contain 32 units where each unit is associated to one input frequency. The output layer contains one unit for each of the 5 vowels and the readout mechanism is based on multiplication of weighted sums of layer 3 activities such that the output is also oscillatory. The oscillatory units in layers 2 and 3 consist of an excitatory element coupled with an inhibitory element which oscillate, or become silent, depending on the input. The long-range connections in layer 3 are determined manually based on known correlations between formants (characteristic frequencies) of the individual vowels.

In experiments the authors show that the classification of their network is robust against different speakers (14 men, 5 women, 5 girls, 5 boys): 6 out of 145 trials were correctly classified. However, they do not report what exactly their criterion for classification performance was (remember that the output was oscillatory, also sometimes alternative vowels show bumps in the time course of a vowel in the shown examples). They also report robustness to imperfect stimuli (formants varying within a vowel) and noise (superposition of 12 different conversations), but only single examples are shown.

Without being able to tell what the state of the art in neural networks in 1994 was, I guess the main contribution of the paper is that it shows that vowel recognition may be robustly implemented using oscillatory networks. At least from today’s perspective the suggested network is a bad solution to the technical problem of vowel recogntion, but even alternative algorithms at the time were probably better in that (there’s a hint in one of the paragraphs in the discussion). The paper is a good example for what was wrong with neural network research at the time: the models give the feeling that they are pretty arbitrary. Are the units in the network only defined and connected like they are, because these were the parameters that worked? Most probably. At least here the connectivity is partly determined through some knowledge of how frequencies produced by vowels relate, but many other parameters appear to be chosen arbitrarily. Respect to the person who made it work. However, the results section is rather weak. They only tested one example of a spoken vowel per person and they don’t define classification performance clearly. I guess, you could argue that it is a proof-of-concept of a possible biological implementation, but then again it is still unclear how this can be properly related to real networks in the brain.

## Bayesian estimation of dynamical systems: an application to fMRI.

Friston, K. J.
Neuroimage, 16:513–530, 2002