Representation of confidence associated with a decision by neurons in the parietal cortex.

Kiani, R. and Shadlen, M. N.
Science, 324:759–764, 2009
DOI, Google Scholar

Abstract

The degree of confidence in a decision provides a graded and probabilistic assessment of expected outcome. Although neural mechanisms of perceptual decisions have been studied extensively in primates, little is known about the mechanisms underlying choice certainty. We have shown that the same neurons that represent formation of a decision encode certainty about the decision. Rhesus monkeys made decisions about the direction of moving random dots, spanning a range of difficulties. They were rewarded for correct decisions. On some trials, after viewing the stimulus, the monkeys could opt out of the direction decision for a small but certain reward. Monkeys exercised this option in a manner that revealed their degree of certainty. Neurons in parietal cortex represented formation of the direction decision and the degree of certainty underlying the decision to opt out.

Review

The authors used a 2AFC task with an option to waive the decision in favour of a choice which provides a low but certain reward (the sure option) to investigate the representation of confidence in LIP neurons. Behaviourally, the sure option had the expected effect: it was chosen more often the harder the decisions were, i.e., the more likely an incorrect response was. Trials in which the sure option was chosen may thus be interpreted as those in which the subject had little confidence in the upcoming decision. It is important to note that task difficulty here was manipulated by providing limited amounts of information for a limited amount of time, i.e., this was not a reaction time task.

The firing rates of the recorded LIP neurons indicate that selection of the sure option is associated with an intermediate level of activity compared to that of subsequent choices of the actual decision options. For individual trials the authors found that firing rates closer to the mean firing rate (in a short time period before the sure option became available) more frequently led to selection of the sure option than firing rates further away from the mean, but in absolute terms the activity in this time window could predict choice of the sure option only weakly (probability of 0.4). From these results the authors conclude that the LIP neurons which have previously been found to represent evidence accumulation also encode confidence in a decision. They suggest a simple drift-diffusion model with fixed diffusion parameter to explain the results. In addition to standard diffusion models they define confidence in terms of the log-posterior odds which they compute from the state of the drift-diffusion model. They define the posterior as p(S_i|v), the probability that decision option i is correct given that the drift-diffusion state (the decision variable) is v. They compute it from the corresponding likelihood p(v|S_i), but don’t state how they obtained that likelihood. Anyway, the sure option is chosen in the model when the log-posterior odds are below a certain level. I don’t see why the detour via the log-posterior odds is necessary. You could directly define v as the posterior for decision option i and still be consistent with all the findings in the paper. Of course, then v could not be governed by a linear drift anymore, but why should it be in the first place? The authors keenly promote the Bayesian brain, but stop just before the finishing line. Why?
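
To make the relation between v and the log-posterior odds concrete, here is a minimal sketch for the simplest case I can think of. It is not the authors' computation (they don't state how they obtained the likelihood). I assume a single motion strength with drift +mu for option 1 and -mu for option 2, equal priors, no absorbing bounds and a Gaussian likelihood p(v|S_i) with mean ±mu*t and variance sigma^2*t. The names mu, sigma and theta are mine. Under these assumptions the log odds reduce to 2*mu*v/sigma^2, i.e., they are just a rescaled v, and I read "below a certain level" as referring to their magnitude.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_posterior_odds(v, t, mu, sigma=1.0):
    """log[p(S_1|v) / p(S_2|v)] under equal priors.

    Assumption: p(v|S_1) = N(+mu*t, sigma^2*t) and p(v|S_2) = N(-mu*t, sigma^2*t).
    """
    var = sigma ** 2 * t
    return gaussian_logpdf(v, mu * t, var) - gaussian_logpdf(v, -mu * t, var)


def chooses_sure_option(v, t, mu, theta, sigma=1.0):
    """Opt out when the magnitude of the log odds is below theta (hypothetical criterion)."""
    return abs(log_posterior_odds(v, t, mu, sigma)) < theta


# The log odds equal 2*mu*v/sigma^2, independent of t in this special case.
print(log_posterior_odds(v=0.3, t=0.5, mu=2.0))
print(chooses_sure_option(v=0.05, t=0.5, mu=2.0, theta=0.5))
```

With a mixture of motion strengths, as in the actual experiment, the mapping from v to log odds would additionally depend on elapsed time, so this sketch only covers the single-difficulty case.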

A supramodal accumulation-to-bound signal that determines perceptual decisions in humans.

O’Connell, R. G., Dockree, P. M., and Kelly, S. P.
Nat Neurosci, 15:1729–1735, 2012
DOI, Google Scholar

Abstract

In theoretical accounts of perceptual decision-making, a decision variable integrates noisy sensory evidence and determines action through a boundary-crossing criterion. Signals bearing these very properties have been characterized in single neurons in monkeys, but have yet to be directly identified in humans. Using a gradual target detection task, we isolated a freely evolving decision variable signal in human subjects that exhibited every aspect of the dynamics observed in its single-neuron counterparts. This signal could be continuously tracked in parallel with fully dissociable sensory encoding and motor preparation signals, and could be systematically perturbed mid-flight during decision formation. Furthermore, we found that the signal was completely domain general: it exhibited the same decision-predictive dynamics regardless of sensory modality and stimulus features and tracked cumulative evidence even in the absence of overt action. These findings provide a uniquely clear view on the neural determinants of simple perceptual decisions in humans.

Review

The authors report EEG signals which may represent 1) instantaneous evidence and 2) accumulated evidence (decision variable) during perceptual decision making. The result promises a big leap for experiments in perceptual decision making with humans, because it is the first time that we can directly observe the decision process as it accumulates evidence with reasonable temporal resolution without sticking needles in participants’ brains. Furthermore, one of the found signals appears to be sensory and response modality independent, i.e., it appears to reflect the decision process alone – something that has not been clearly found in species other than humans. But let’s discuss the study in more detail.

The current belief about the perceptual decision making process is formalised in accumulation to bound models: When presented with a stimulus, the decision maker determines at each time point of the presentation the current amount of evidence for all possible alternatives. This estimate of “instantaneous evidence” is noisy, because of either the noise within the stimulus itself, or because of internal processing noise. Therefore, the decision maker does not immediately make a decision between alternatives, but accumulates evidence over time until the accumulated evidence for one of the alternatives reaches a threshold which is internally set by the decision maker itself and indicates a certain level of certainty, or response urgency. The alternative, for which the threshold was crossed, is the decision outcome and the time the threshold was crossed is the decision time (potentially including an additional delay). The authors argue that they have found signals in the EEG of humans which can be associated with the instantaneous and accumulated evidence variables of these kinds of models.
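
The accumulation-to-bound idea described above can be sketched in a few lines. This is a generic two-alternative toy with symmetric bounds, not the model or the parameters of the paper; all names and default values are my own assumptions.

```python
import random


def accumulate_to_bound(drift, bound, noise_sd=1.0, dt=0.001, max_time=5.0, rng=None):
    """Accumulate noisy instantaneous evidence until a bound at +bound or -bound is crossed.

    Returns (choice, decision_time) with choice +1 or -1, or 0 if no bound
    was reached within max_time.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while t < max_time:
        # instantaneous evidence: mean drift plus Gaussian noise (Euler-Maruyama step)
        x += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= bound:
            return +1, t
        if x <= -bound:
            return -1, t
    return 0, t


choice, decision_time = accumulate_to_bound(drift=1.0, bound=0.5, rng=random.Random(1))
print(choice, decision_time)
```

The returned time is the decision time only. An observed reaction time would add the delay mentioned above.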

The paradigm used in this study was different from the perceptual decision making paradigm popular in monkeys (random dot stimuli). Here the authors used stimuli which did not move, but rather gradually changed their intensity or contrast: In the experiments with visual stimuli, participants were continuously viewing a flickering disk which from time to time gradually changed its contrast with the background (the contrast gradually went back to base level after 1.6s). So the participants had to decide whether they observe a contrast different from baseline at the current time. Note that this setup is slightly different from usual trial-based perceptual decision making experiments where a formally new trial begins after a participant’s response. The disk also had a pattern, but it’s unclear why the pattern was necessary. On the other hand, using the other stimulus properties seems reasonable: The flickering induced something like continuous evoked potentials in the EEG ensuring that something stimulus-related could be measured at all times, but the gradual change of contrast “successfully eliminated sensory-evoked deflections from the ERP trace” such that the more subtle accumulated evidence signals were not masked by large deflections solely due to stimulus onsets. In the experiments with sounds, equivalent stimulus changes were implemented by either gradually changing the volume of a presented, envelope-modulated tone or its frequency.

The authors report 4 EEG signals related to perceptual decision making. They argue that the occipital steady-state visual-evoked potential (SSVEP) indicated the estimated instantaneous evidence when visual stimuli were used, because its trajectories directly reflected the changes in contrast. For auditory stimuli, the authors found a corresponding steady-state auditory-evoked potential (SSAEP) which was located at more central EEG electrodes and at 40Hz instead of 20Hz (SSVEP). Further, the authors argue that a left-hemisphere beta (LHB, 22-30Hz) and a centro-parietal potential (CPP, direct electrode measurements) could be interpreted as evidence accumulation signals, because the time of their peaks tightly predicted reaction times and their time courses were better predicted by the cumulative SSVEP than by the original SSVEP. LHB and CPP also (roughly) showed the expected dependency on whether the participant correctly identified the target, or missed it (lower signals for misses). Furthermore, they reacted as expected when contrast varied in more complex ways than just a linear decrease (decrease followed by short increase followed by decrease). CPP was different from LHB by also showing the expected changes when the task did not require an overt response at target detection time, whereas LHB showed no relation to the present evidence in this task, indicating that it may have something to do with motor preparation of the response while CPP is a more abstract decision signal. Additionally, the CPP showed the characteristic changes measured with visual stimuli also with auditory stimuli and it depended on attentional focus: In one experimental condition the task of the participants was altered (‘detect a transient size change of a central fixation square’), but the original disk stimulus was still presented including the gradual contrast changes. In this ‘non-attend’ condition the SSVEP decreased with contrast as before, but the CPP showed no response, reinforcing the idea that the CPP is an abstract decision signal. On a final note, the authors speculate that the CPP could be equal to the standard P300 signal when transient stimuli need to be detected instead of gradual stimulus changes. This connection, if true, would be a nice functional explanation of the P300.

Open Questions

Despite the generally intriguing results presented in the paper a few questions remain. These predominantly regard details.

1) omission of data

In Figs. 2 and 3 the SSVEP is not shown anymore, presumably because of space restrictions. Similarly, the LHB is not presented in Fig. 4. I can believe that the SSVEP behaved as expected in the different conditions of Figs. 2 and 3 such that not much information would have been added by providing the plots, but it would at least be interesting to know whether the accumulated SSVEP still predicted the LHB and CPP better than the original SSVEP in these conditions. Likewise, the authors do not report the equivalent analysis for the SSAEP in the auditory conditions. Regarding the omission of the LHB in Fig. 4, I’m not so certain about the behaviour of the LHB in the auditory conditions. It seems possible that the LHB shows different behaviour with different modalities. There is no mention of this in the text, though.

2) Is there a common threshold level?

The authors argue that the LHB and CPP reached a common threshold level just before response initiation (a prediction of accumulation to bound models, Fig. 1c), but the test they used does not entirely convince me: They compared the variance just before response initiation with the variance of measurements across different time points (they randomly assigned the RT of one trial to another trial and computed variance of measurements at the shuffled time points). For a strongly varying function of time, it is no surprise that the measurements at a consistent time point vary less than the measurements made across many different time points as long as the measurement noise is small enough. Based on this argument, it is strange that they did not find a significant difference for the SSVEP which also varies strongly across time (this fits into their interpretation, though), but this lack of difference could be explained by larger measurement noise associated with the SSVEP.
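
A toy simulation illustrates one reading of this argument. It is my own construction and not the authors' analysis: I assume a hypothetical signal that is merely locked to the response (an oscillation around the response time plus small measurement noise) and that involves neither accumulation nor a bound. The shuffle test still reports much lower variance at the true RTs than at shuffled RTs.

```python
import math
import random
import statistics


def response_locked_signal(t, rt, rng, noise_sd=0.05):
    """Hypothetical signal: depends only on time relative to the response, no bound involved."""
    return math.sin(2 * math.pi * (t - rt) / 0.8) + rng.gauss(0.0, noise_sd)


def variance_at_rt(rts, signal, shuffle, rng):
    """Variance across trials of the signal sampled at each trial's own RT or at a shuffled RT."""
    sample_times = list(rts)
    if shuffle:
        rng.shuffle(sample_times)  # assign the RT of one trial to another trial
    return statistics.pvariance(
        [signal(t, rt, rng) for t, rt in zip(sample_times, rts)]
    )


rng = random.Random(0)
rts = [rng.uniform(0.4, 1.2) for _ in range(500)]  # made-up reaction times in seconds
aligned = variance_at_rt(rts, response_locked_signal, shuffle=False, rng=rng)
shuffled = variance_at_rt(rts, response_locked_signal, shuffle=True, rng=rng)
print(aligned, shuffled)
```

With larger noise_sd the two variances move closer together, which is the sense in which measurement noise could hide the effect for the SSVEP.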

Furthermore, the authors report themselves that they found a significant difference between the size of CPP peaks around decision time for varying contrast levels (Fig. 2c). Especially, the CPP peak for false alarms (no contrast change, but participant response) was lower than the other peaks. If the CPP really is the decision variable predicted by the models, then these differences should not have occurred. So where do they come from? The authors provide arguments that I cannot follow without further explanations.

3) timing of peaks

It appears that the mean reaction time precedes the peaks of the mean signals slightly. The effect is particularly clear in Fig. 3b (CPP), Fig. 4d (CPP) and Fig. 5a, but is also slightly visible in the averages centred at the time of response in Figs. 1c and 2c. Presuming a delay from internal decision time to actual response, the time of the peak of the decision variable should precede the reaction time, especially when reaction time is measured from button presses (here) compared to saccade initiation (typical monkey experiments). So why does it here appear to be the other way round?

4) variance of SSVEP baseline

The SSVEP in Fig. 4a is in a different range (1.0-1.3) than the SSVEP in Fig. 4d (1.7-2.5) even though the two plots should each contain a time course for the same experimental condition. Where does the difference come from?

5) multiple alternatives

The CPP, as described by the authors, is a single, global signal of a decision variable. If the decision problem is composed of only two decision alternatives, a single decision variable is indeed sufficient for decision making, but if more alternatives are considered, several evidence accumulating variables are needed. What would the CPP then signal? One of the decision variables? The total amount of certainty of the upcoming decision?

Conclusion

I do like the results in the paper. If they hold up, the CPP may provide a high temporal resolution window into the decision processes of humans. As a result, it may allow us to investigate decision processes for more complex situations than those which animals can master, but maybe it’s only a signal for the simple, perceptual decisions investigated here. Based on the above open questions I also guess that the reported signals were noisier than the plots make us believe and the correspondence of the CPP with theoretical decision variables should be further examined.

Causal role of dorsolateral prefrontal cortex in human perceptual decision making.

Philiastides, M. G., Auksztulewicz, R., Heekeren, H. R., and Blankenburg, F.
Curr Biol, 21:980–983, 2011
DOI, Google Scholar

Abstract

The way that we interpret and interact with the world entails making decisions on the basis of available sensory evidence. Recent primate neurophysiology [1-6], human neuroimaging [7-13], and modeling experiments [14-19] have demonstrated that perceptual decisions are based on an integrative process in which sensory evidence accumulates over time until an internal decision bound is reached. Here we used repetitive transcranial magnetic stimulation (rTMS) to provide causal support for the role of the dorsolateral prefrontal cortex (DLPFC) in this integrative process. Specifically, we used a speeded perceptual categorization task designed to induce a time-dependent accumulation of sensory evidence through rapidly updating dynamic stimuli and found that disruption of the left DLPFC with low-frequency rTMS reduced accuracy and increased response times relative to a sham condition. Importantly, using the drift-diffusion model, we show that these behavioral effects correspond to a decrease in drift rate, a parameter describing the rate and thereby the efficiency of the sensory evidence integration in the decision process. These results provide causal evidence linking the DLPFC to the mechanism of evidence accumulation during perceptual decision making.

Review

The authors apply repetitive TMS to the dorsolateral prefrontal cortex (DLPFC) assuming that this inhibits the decision making ability of subjects, because DLPFC has been shown to be involved in perceptual decision making. Indeed, they find a significant effect of TMS vs. SHAM on the responses of subjects (after TMS responses of subjects are less accurate and take longer). They also argue that the effect is particular to TMS, because it reduces over time, but I wonder why they did not compute the corresponding interaction (they just report that the effect of TMS vs. SHAM is significant earlier, but not significant later).

Furthermore, they hypothesised that TMS disrupted the accumulation process of noisy evidence over time by decreasing the rate of evidence increase. This is based on the previous finding that the DLPFC has higher BOLD activation for less noisy stimuli which suggests that, when DLPFC is disrupted, the evidence coming from less noisy stimuli cannot be optimally processed anymore.

They investigated the evidence accumulation hypothesis by fitting a drift-diffusion model (DDM) to response data. The DDM has more parameters than are necessary to explain the variations of response data for the different experimental conditions. Hence, they use the Bayesian information criterion (BIC) to select parameters which should be fitted for each experimental condition separately, i.e., to be able to say which parameters are affected by the experimental manipulations. The other parameters are still fitted but to all data across experimental conditions. The problem is that the BIC is a very crude approximation just taking the number of freely varying parameters into account. For example, an assumption underlying the BIC is that the Hessian of the likelihood evaluated at the fitted parameter values has full rank (Bishop, 2006, p. 217), but for highly correlated parameters this may not be the case. The used DMAT fitting toolbox actually approximates the Hessian matrix, checks whether a local minimum has been found (instead of a valley) and computes confidence intervals from the approximated Hessian, but the authors report no results for this apart from error bars on the plot for drift rate and nondecision time.
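
For reference, the textbook form of the BIC makes the crudeness visible: the penalty depends only on the number of free parameters and the number of data points, not on how the parameters interact. The sketch below uses the standard definition. Whether the authors or the toolbox use exactly this form is an assumption on my side, and the numbers in the example call are made up.

```python
import math


def bic(log_likelihood, n_free_params, n_observations):
    """Standard BIC: k * ln(n) - 2 * ln(L_max). Lower values indicate the preferred model."""
    return n_free_params * math.log(n_observations) - 2.0 * log_likelihood


# Made-up example: a model with one extra free parameter must gain enough likelihood
# to offset the additional ln(n) penalty.
simple = bic(log_likelihood=-1210.0, n_free_params=4, n_observations=800)
richer = bic(log_likelihood=-1208.5, n_free_params=5, n_observations=800)
print(simple < richer)
```

Two strongly correlated parameters are counted as two full parameters here, which is where the concern about the rank of the Hessian comes in.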

Anyway, the BIC analysis conveniently indicates that drift rate and nondecision time best explain the variations in response data across conditions. However, it has to be kept in mind that these results have been obtained by (presumably) assuming that the diffusion is fixed across conditions which is the standard when fitting a DDM [private correspondence with Rafal Bogacz, 09/2012], because drift rate, diffusion and threshold are redundant (a change in one of them can be reverted by a suitable change in the others). The interpretation of the BIC analysis probably should be that drift rate and nondecision time are the smallest set of parameters which still allow a good fit of the data given that diffusion is fixed.
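
The redundancy can be checked with the closed-form predictions of the simplest DDM. I assume an unbiased start at 0, symmetric bounds at ±threshold and no across-trial variability, which is a simplification of the model fitted in the paper. In this case accuracy is 1/(1+exp(-2*v*a/s^2)) and mean decision time is (a/v)*tanh(v*a/s^2), so scaling drift v, threshold a and diffusion s by the same factor leaves both unchanged.

```python
import math


def ddm_predictions(drift, threshold, diffusion):
    """Accuracy and mean decision time of a simple DDM (bounds at +/-threshold, start at 0).

    Assumes drift != 0 and no across-trial parameter variability.
    """
    k = drift * threshold / diffusion ** 2
    p_correct = 1.0 / (1.0 + math.exp(-2.0 * k))
    mean_decision_time = (threshold / drift) * math.tanh(k)
    return p_correct, mean_decision_time


# Arbitrary illustrative values: doubling all three parameters gives the same predictions.
base = ddm_predictions(drift=1.5, threshold=0.8, diffusion=1.0)
scaled = ddm_predictions(drift=3.0, threshold=1.6, diffusion=2.0)
print(base, scaled)
```

Fixing the diffusion removes this degree of freedom, which is why fitted drift rates and thresholds are only meaningful relative to the chosen diffusion value.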

You need to be careful when interpreting the fitted parameter values in the different conditions. In particular, fitting a DDM to data assumes that the evidence accumulation still works like a DDM, just with different parameters. However, it is not clear what TMS does to the affected processes in the brain. Hence, we can only say from the fitting results that TMS has an effect which is equivalent to a reduction of the drift rate (no clear effect on nondecision time) in a normally functioning DDM.

Similarly, the interpretation of the results for nondecision time is not straightforward. There, the main finding is that nondecision time decreases for high-evidence stimuli which the authors interpret as a reduced time of low-level sensory processing which provides input to evidence accumulation. However, it should be kept in mind that the total amount of time necessary to make a decision is also reduced for high-evidence stimuli. Also, part of the processes which are collected under ‘nondecision time’ may actually work in parallel to evidence accumulation, e.g., movement preparation. If you look at the percentage of RT that is explained by the nondecision time, then the picture is reversed: for high-evidence stimuli nondecision time explains about 82% of RTs while for low-evidence stimuli it explains only about 75% which is consistent with the basic idea that evidence accumulation takes longer for noisier stimuli. In general, these percentages are surprisingly high. Does the evidence accumulation really only account for about 25% of total RTs? But it’s good that we have a number to compare now.

So what do these findings mean for the DLPFC? We cannot draw any definite conclusions. The hypothesis that TMS over DLPFC affects drift rate is somewhat built into the analysis, because the authors use a DDM to fit the responses. Of course, other parameters could have been affected more strongly such that the finding of the BIC analysis that drift rate explains the changes best can indeed be taken as evidence for the drift rate hypothesis. However, it is not possible to exclude other explanations which lie outside the parameter space of the DDM. What, for example, if the DLPFC has indeed a somewhat attentional effect on evidence accumulation in the sense that it not only accumulates evidence, but also modulates how big the individual pieces of evidence are by modulating lower-level sensory processing? Then, interrupting the DLPFC may still have a similar effect as observed here, but the interpretation of the role of the DLPFC would be slightly different. Actually, the authors argue against a role of the DLPFC (at least the part of DLPFC they found) in attentional processing, but I’m not entirely convinced. Their main argument is based on the assumption that a top-down attentional effect of the DLPFC on low-level sensory processing would increase the nondecision time, but this is not necessarily true. A) there is the previously mentioned issue of parallel processing and the general problems of fitting a standard model to a disturbed process which makes me doubt the reliability of the fitted nondecision times and B) I can easily conceive a system in which attentional modulation would not delay low-level sensory processing.

Perceptions as hypotheses: saccades as experiments.

Friston, K., Adams, R. A., Perrinet, L., and Breakspear, M.
Front Psychol, 3:151, 2012
DOI, Google Scholar

Abstract

If perception corresponds to hypothesis testing (Gregory, 1980); then visual searches might be construed as experiments that generate sensory data. In this work, we explore the idea that saccadic eye movements are optimal experiments, in which data are gathered to test hypotheses or beliefs about how those data are caused. This provides a plausible model of visual search that can be motivated from the basic principles of self-organized behavior: namely, the imperative to minimize the entropy of hidden states of the world and their sensory consequences. This imperative is met if agents sample hidden states of the world efficiently. This efficient sampling of salient information can be derived in a fairly straightforward way, using approximate Bayesian inference and variational free-energy minimization. Simulations of the resulting active inference scheme reproduce sequential eye movements that are reminiscent of empirically observed saccades and provide some counterintuitive insights into the way that sensory evidence is accumulated or assimilated into beliefs about the world.

Review

In this paper Friston et al. introduce the notion that an agent (such as the brain) minimizes uncertainty about its state in the world by actively sampling those states which minimise the uncertainty of the agent’s posterior beliefs, when visited some time in the future. The presented ideas can also be seen as a reply to the commonly formulated dark-room-critique of Friston’s free energy principle which states that under the free energy principle an agent would try to find a dark, stimulus-free room in which sensory input can be perfectly predicted. Here, I review these ideas together with the technical background (see also a related post about Friston et al., 2011). Although I find the presented theoretical argument very interesting and sound (and compatible with other proposals for the origin of autonomous behaviour), I do not think that the presented simulations conclusively show that the extended free energy principle as instantiated by the particular model chosen in the paper leads to the desired exploratory behaviour.

Introduction: free energy principle and the dark room

Friston’s free energy principle has gained considerable momentum in the field of cognitive neuroscience as a unifying framework under which many cognitive phenomena may be understood. Its main axiom is that an agent tries to minimise the long-term uncertainty about its state in the world by executing actions which make prediction of changes in the agent’s world more precise, i.e., which minimise surprises. In other words, the agent tries to maintain a sort of homeostasis with its environment.

While homeostasis is a concept which most people happily associate with bodily functions, it is harder to reconcile with cognitive functions which produce behaviour. Typically, the counter-argument for the free energy principle is the dark-room-problem: changes in a dark room can be perfectly predicted (= no changes), so shouldn’t we all just try to lock ourselves into dark rooms instead of frequently exploring our environment for new things?

The shortcoming of the dark-room-problem is that an agent cannot maintain homeostasis in a dark room, because, for example, its bodily functions will stop working properly after some time without water. There may be many more environmental factors which may disturb the agent’s dark room pleasure. An experienced agent knows this and has developed a corresponding model about its world which tells it that the state of its world becomes increasingly uncertain as long as the agent only samples a small fraction of the state space of the world, as it is the case when you are in a dark room and don’t notice what happens outside of the room.

The present paper formalises this idea. It assumes that an agent only observes a small part of the world in its local surroundings, but also maintains a more comprehensive model of its world. To decrease uncertainty about the global state of the world, the agent then explores other parts of the state space which it believes to be informative according to its current estimate of the global world state. In the remainder I will present the technical argument in more detail, discuss the supporting experiments and conclude with my opinion about the presented approach.

Review of theoretical argument

In previous publications Friston postulated that agents try to minimise the entropy of the world states which they encounter in their life and that this minimisation is equivalent to minimising the entropy of their sensory observations (by essentially assuming that the state-observation mapping is linear). The sensory entropy can be estimated by the average of sensory surprise (negative model evidence) across (a very long) time. So the argument goes that an agent should minimise sensory surprise at all times. Because sensory surprise cannot usually be computed directly, Friston suggests a variational approximation in which the posterior distribution over world states (posterior beliefs) and model parameters is separated. Further, the posterior distributions are approximated with Gaussian distributions (Laplace approximation). Then, minimisation of surprise is approximated by minimisation of Friston’s free energy. This minimisation is done with respect to the posterior over world states and with respect to action. The former corresponds to perception and ensures that the agent maintains a good estimate of the state of the world and the latter implements how the agent manipulates its environment, i.e., produces behaviour. While the former is a particular instantiation of the Bayesian brain hypothesis, and hence not necessarily a new idea, the latter had not previously been proposed and subsequently spurred some controversy (cf. above).

At this point it is important to note that the action variables are defined on the level of primitive reflex arcs, i.e., they directly control muscles in response to unexpected basic sensations. Yet, the agent can produce arbitrary complex actions by suitably setting sensory expectations which can be done via priors in the model of the agent. In comparison with reinforcement learning, the priors of the agent about states of the world (the probability mass attributed by the prior to the states), therefore, replace values or costs. But how does the agent choose its priors? This is the main question addressed by the present paper, however, only in the context of a freely exploring (i.e., task-free) agent.

In this paper, Friston et al. postulate that an agent minimises the joint entropy of world states and sensory observations instead of only the entropy of world states. Because the joint entropy is the sum of sensory entropy and conditional entropy (world states conditioned on sensory observations), the agent needs to implement two minimisations. The minimisation of sensory entropy is exactly the same as before implementing perception and action. However, conditional entropy is minimised with respect to the priors of the agent’s model, implementing higher-level action selection.
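
The decomposition used here is the chain rule for entropy, H(X,S) = H(S) + H(X|S). The paper works with continuous densities, but the identity can be checked numerically on a small discrete example. The joint distribution below is made up purely for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Made-up joint distribution p(x, s): rows are world states x, columns are observations s.
joint = [[0.30, 0.10],
         [0.05, 0.55]]

n_states, n_obs = len(joint), len(joint[0])
p_s = [sum(joint[i][j] for i in range(n_states)) for j in range(n_obs)]

h_joint = entropy([p for row in joint for p in row])  # H(X, S)
h_s = entropy(p_s)                                    # sensory entropy H(S)
h_x_given_s = sum(                                    # conditional entropy H(X|S)
    p_s[j] * entropy([joint[i][j] / p_s[j] for i in range(n_states)])
    for j in range(n_obs)
)
print(h_joint, h_s + h_x_given_s)
```

Minimising the joint entropy therefore splits into the two minimisations described above, one per term.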

In Friston’s dynamic free energy framework (and other filters) priors correspond to predictive distributions, i.e., distributions over the world states some time in the future given their current estimate. Friston also assumes that the prior densities are Gaussian. Hence, priors are parameterised by their mean and covariance. To manipulate the probability mass attributed by the prior to the states he, thus, has to change prior mean or covariance of the world states. In the present paper the authors use a fixed covariance (as far as I can tell) and implement changes in the prior by manipulating its mean. They do this indirectly by introducing new, independent control variables (“controls” from here on) which parameterise the dynamics of the world states without having a dynamics associated with themselves. The controls are treated like the other hidden variables in the agent model and their values are inferred from the sensory observations via free energy minimisation. However, I guess that the idea is to more or less fix the controls to their prior means, because the second entropy minimisation, i.e., minimisation of the conditional entropy, is with respect to these prior means. Note that the controls are pretty arbitrary and can only be interpreted once a particular model is considered (as is the case for the remaining variables mentioned so far).

As with the sensory entropy, the agent has no direct access to the conditional entropy. However, it can use the posterior over world states given by the variational approximation to approximate the conditional entropy, too. In particular, Friston et al. suggest approximating the conditional entropy using a predictive density which looks ahead in time from the current posterior and which they call counterfactual density. The entropy of this counterfactual density tells the agent how much uncertainty about the global state of the world it can expect in the future based on its current estimate of the world state. The authors do not specify how far into the future the counterfactual density looks. They use the notational trick of calling negative conditional entropy ‘saliency’ to make the correspondence between the suggested framework and experimental variables in their example more intuitive, i.e., minimisation of conditional entropy becomes maximisation of saliency. The actual implementation of this nonlinear optimisation is computationally demanding. In particular, it will be very hard to find global optima using gradient-based approaches. In this paper Friston et al. bypass this problem by discretising the space spanned by the controls (which are the variables with respect to which they optimise), computing conditional entropy at each discrete location and simply selecting the location with minimal entropy, i.e., they do grid search.
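
The grid search can be sketched as follows. This is only a schematic of the selection step under my own assumptions: the counterfactual density is taken to be a univariate Gaussian whose variance depends on the control, and toy_variance is a made-up stand-in for the model-based prediction that the paper would compute.

```python
import math


def gaussian_entropy(variance):
    """Differential entropy of a univariate Gaussian with the given variance."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)


def most_salient_control(controls, predicted_variance):
    """Grid search: pick the control with minimal entropy, i.e., maximal saliency."""
    return min(controls, key=lambda u: gaussian_entropy(predicted_variance(u)))


def toy_variance(u):
    """Hypothetical predicted uncertainty after fixating location u (lowest at (0.25, -0.5))."""
    return 0.1 + (u[0] - 0.25) ** 2 + (u[1] + 0.5) ** 2


# Discretise a 2D control space (candidate fixation locations) on a regular 9 x 9 grid.
grid = [(x / 4, y / 4) for x in range(-4, 5) for y in range(-4, 5)]
best = most_salient_control(grid, toy_variance)
print(best)
```

The cost of this approach grows with the number of grid points, and the result can only be as fine as the grid, so it trades the local-optimum problem for a resolution problem.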

In summary, the present paper extends previous versions of Friston’s free energy principle by adding prior selection, or, say, high-level action, to perception and action. This is done by adding new control variables representing high-level actions and setting these variables using a new optimisation which minimises future uncertainty about the state of the world. The descriptions in the paper implicitly suggest that the three processes happen sequentially: first the agent perceives to get the best estimate of the current world state, then it produces action to take the world state closer to its expectations and then it reevaluates expectations and thus sets high-level actions (goals). However, Friston’s formulations are in continuous time such that all these processes supposedly happen in parallel. For perception and action alone this leads to unexpected interactions. (Do you rather perceive the true state of the world as it is, or change it such that it corresponds to your expectations?) Adding control variables certainly doesn’t reduce this problem, if their values are inferred (perceived), too, but if perception cannot change them, only action can reduce the part of free energy contributed by them, thereby disentangling perception and action again. Therefore, the new control variables may be a necessary extension, if used properly. To me, it does not seem plausible that high-level actions are reevaluated continuously. Shouldn’t you wait until, e.g., a goal is reached? Such a mechanism is still missing in the present proposal. Instead the authors simply reevaluate high-level actions (minimise conditional entropy with respect to control variable priors) at fixed, ad-hoc intervals spanning sufficiently large amounts of time.

Review of presented experiments (saccade model)

To illustrate the theoretical points, Friston et al. present a model for saccadic eye movements. This model is very basic and is only supposed to show in principle that the new minimisation of conditional entropy can provide sensible high-level action. The model consists of two main parts: 1) the world, which defines how sensory input changes based on the true underlying state of the world and 2) the agent, which defines how the agent believes the world behaves. In this case, the state of the world is the position in a viewed image which is currently fixated by the eye of the agent. This position, hence, determines what input the visual sensors of the agent currently get (the field of view around the fixation position is restricted), but additionally there are proprioceptive sensors which give direct feedback about the position. Action changes the fixation position. The agent has a similar, but extended model of the world. In it, the fixation position depends on the hidden controls. Additionally, the model of the agent contains several images such that the agent has to infer what image it sees based on its sensory input.

In Friston’s framework, inference results heavily depend on the setting of prior uncertainties of the agent. Here, the agent is assumed to have certain proprioception, but uncertain vision such that it tends to update its beliefs of what it sees (which image) rather than trying to update its beliefs of where it looks. [I guess, this refers to the uncertainties of the hidden states and not the uncertainties of the actual sensory input which was probably chosen to be quite certain. The text does not differentiate between these and, unfortunately, the code was not yet available within the SPM toolbox at the time of writing (08.09.2012).]

As mentioned above, every 16 time steps the prior for the hidden controls of the agent is recomputed by minimising the conditional entropy of the hidden states given sensory input (minimising the uncertainty over future states given the sensory observations up to that time point). This is implemented by defining a grid of fixation positions and computing the entropy of the counterfactual density (uncertainty of future states) while setting the mean of the prior to one of the positions. In effect, this translates for the agent into: ‘Use your internal model of the world to simulate how your estimate of the world will change when you execute a particular high-level action. (What will be your beliefs about what image you see, when fixating a particular position?) Then choose the high-level action which reduces your uncertainty about the world most. (Which position gives you most information about what image you see?)’ Up to here, the theoretical ideas were self-contained and derived from first principles, but then Friston et al. introduce inhibition of return to make their results ‘more realistic’. In particular, they introduce an inhibition of return map which is a kind of fading memory of which positions were previously chosen as saccade targets and which is subtracted from the computed conditional entropy values. [The particular form of the inhibition of return computations, especially the initial subtraction of the minimal conditional entropy value, is not motivated by the authors.]
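To make the effect of such a fading memory concrete, here is a minimal sketch of target selection with and without inhibition of return. It is written in terms of saliency (negative conditional entropy), so that subtracting the map makes recently visited positions less attractive. The decay factor, the unit bump added at each chosen target and the one-dimensional toy saliency map are all assumptions for illustration; the paper's exact update is not reproduced here.

```python
def select_targets(saliency, n_saccades, strength=1.0, decay=0.5):
    """Greedily pick saccade targets from a fixed saliency map.

    `strength` and `decay` are hypothetical parameters: the size of the
    penalty added at a chosen target and how fast old penalties fade.
    """
    ior = [0.0] * len(saliency)   # inhibition of return map
    base = min(saliency)          # shift so that the smallest saliency is 0
    targets = []
    for _ in range(n_saccades):
        adjusted = [s - base - r for s, r in zip(saliency, ior)]
        target = max(range(len(adjusted)), key=adjusted.__getitem__)
        targets.append(target)
        ior = [decay * r for r in ior]   # fading memory of earlier targets
        ior[target] += strength          # penalise the position just chosen
    return targets

saliency = [0.0, 1.0, 0.25, 0.75, 0.375]
with_ior = select_targets(saliency, 4)
without_ior = select_targets(saliency, 4, strength=0.0)
print(with_ior, without_ior)
```

With a fixed saliency map and no inhibition of return the same position is selected every time, which is the behaviour I would expect in experiment 1 once the image has been identified.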

For the presented experiments the authors use an agent model which contains three images as hypotheses of what the agent observes: a face and its 90° and 180° rotated versions. The first experiment is supposed to show that the agent can correctly infer which image it observes by making saccades to low conditional entropy (‘salient’) positions. The second experiment is supposed to show that, when an image is observed which is unknown to the agent, the agent cannot be certain of which of the three images it observes. The third experiment is supposed to show that the uncertainty of the agent increases when high entropy high-level actions are chosen instead of low entropy ones (when the agent chooses positions which contain very little information). I’ll discuss them in turn.

In the first experiment, the presented posterior beliefs of the agent about the identity of the observed image show that the agent indeed identifies the correct image and becomes more certain about it. Figure 5 of the paper also shows us the fixated positions and inhibition of return adapted conditional entropy maps. The presented ‘saccadic eye movements’ are misleading: the points only show the stabilised fixated positions and the lines only connect these without showing the large overshoots which occur according to the plot of ‘hidden (oculomotor) states’. Most critically, however, it appears that the agent already had identified the right image with relative certainty before any saccade was made (time until about 200ms). The results, therefore, do not clearly show that the saccade selection is beneficial for identifying the observed image, also because the presented example is only a single trial with a particular initial fixation point and with a noiseless observed image. Also, because the image was clearly identified very quickly, my guess is that the conditional entropy maps would be very similar after each saccade without inhibition of return, i.e., always the same fixation position would be chosen and no exploratory behaviour (saccades) would be seen, but this needs to be confirmed by running the experiment without inhibition of return. My overall impression of this experiment is that it presents a single, trivial example which does not allow me to draw general conclusions about the suggested theoretical framework.

The second experiment acts like a sanity check: the agent shouldn’t be able to identify one of its three images, when it observes a fourth one. Whether the experiment shows that, depends on the interpretation of the inferred hidden states. The way these states were defined their values can be directly interpreted as the probability of observing one of the three images. If only these are considered the agent appears to be very certain at times (it doesn’t help that the scale of the posterior belief plot in Figure 6 is 4 times larger than that of the same plot in Figure 5). However, the posterior uncertainty directly associated with the hidden states appears to be indeed considerably larger than in experiment 1, but, again, this is only a single example. Something that is rather strange: the sequence of fixation positions is almost exactly the same as in experiment 1 even though the observed image and the resulting posterior beliefs were completely different. Why?

Finally, experiment three is more like a thought experiment: what would happen, if an agent chooses high-level actions which maximise future uncertainty instead of minimising it. Well, the uncertainty of the agent’s posterior beliefs increases as shown in Figure 7, which is the expected behaviour. One thing that I wonder, though, and it applies to the presented results of all experiments: In Friston’s Bayesian filtering framework the uncertainty of the posterior hidden states is a direct function of their mean values. Hence, as long as the mean values do not change, the posterior uncertainty should stay constant, too. However, we see in Figure 7 that the posterior uncertainty increases even though the posterior means stay more or less constant. So there must be an additional (unexplained) mechanism at work, or we are not shown the distribution of posterior hidden states, but something slightly different. In both cases, it would be important to know what exactly resulted in the presented plots to be able to interpret the experiments in the correct way.

Conclusion

The paper presents an important theoretical extension to Friston’s free energy framework. This extension consists of adding a new layer of computations which can be interpreted as a mechanism for how an agent (autonomously) chooses its high-level actions. These high-level actions are defined in terms of desired future states encoded by the probability mass which is assigned to these states by the prior state distribution. Conceptually, these ideas translate into choosing maximally informative actions given the agent’s model of the world and its current state estimate. As discussed by Friston et al. such approaches to action selection are not new (see also Tishby and Polani, 2011). So, the authors’ contribution is to show that these ideas are compatible with Friston’s free energy framework. Hence, on the abstract, theoretical level this paper makes sense. It also provides a sound theoretical argument for why an agent would not seek sensory deprivation in a dark room, as feared by critics of the free energy principle. However, the presented framework heavily relies on the agent’s model of the world and it leaves open how the agent has attained this model. Although the free energy principle also provides a way for the agent to learn parameters of its model, I still, for example, haven’t seen a convincing application in which the agent actually learnt the dynamics of an unknown process in the world. Probably Friston would here also refer to evolution as providing a good initialisation for process dynamics, but I find that too cheap a way out.

From a technical point of view the paper leaves a few questions open, for example: How far does the counterfactual distribution look into the future? What does it mean for high-level actions to change how far the agent looks into its subjective future? How well does the presented approach scale? Is it important to choose the global minimum of the conditional entropy (this would be bad, as it’s probably extremely hard to find in a general setting)? When, or how often, does the agent minimise conditional entropy to set high-level actions? What happens with more than one control variable (several possible high-level actions)? How can you model discrete high-level actions in Friston’s continuous Gaussian framework? How do results depend on the setting of prior covariances / uncertainties? And many more.

Finally, I have to say that I find the presented experiments quite poor. Although providing the agent with a limited field of view such that it has to explore different regions of a presented image is a suitable setting to test the proposed ideas, the trivial example and introduction of ad-hoc inhibition of return make it impossible to judge whether the underlying principle is successfully at work, or the simulations have been engineered to work in this particular case.

Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks.

Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C., and Gerstner, W.
Science, 334:1569–1573, 2011
DOI, Google Scholar

Abstract

Cortical neurons receive balanced excitatory and inhibitory synaptic currents. Such a balance could be established and maintained in an experience-dependent manner by synaptic plasticity at inhibitory synapses. We show that this mechanism provides an explanation for the sparse firing patterns observed in response to natural stimuli and fits well with a recently observed interaction of excitatory and inhibitory receptive field plasticity. The introduction of inhibitory plasticity in suitable recurrent networks provides a homeostatic mechanism that leads to asynchronous irregular network states. Further, it can accommodate synaptic memories with activity patterns that become indiscernible from the background state but can be reactivated by external stimuli. Our results suggest an essential role of inhibitory plasticity in the formation and maintenance of functional cortical circuitry.

Review

The authors show that, if the same input to an output neuron arrives through an excitatory and a delayed inhibitory channel, synaptic plasticity (a symmetric STDP rule) at the inhibitory synapses leads to “detailed balance”, i.e., to cancellation of excitatory and inhibitory input currents. Then, the output neuron fires sparsely and irregularly (as observed for real neurons) only when an excitatory input was not predicted by the implicit model encoded by the synaptic weights of the inhibitory inputs. The adaptation of the inhibitory synapses also matches potential changes in the excitatory synapses, although here they only present simulations in which excitatory synapses changed abruptly and stayed constant afterwards. (What happens when excitatory and inhibitory synapses change concurrently?) Finally, the authors show that similar results apply to recurrently connected networks of neurons with dedicated inhibitory neurons (balanced networks). Arbitrary activity patterns can be encoded by the excitatory connections, activity in these patterns is then suppressed by the inhibitory neurons, while partial activation of the patterns through external input reactivates the whole patterns (cf. recall of memory) without suppressing potential reactivation of other patterns in the network.
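The balancing effect can be illustrated with a rate-based caricature of such a rule, in which an inhibitory weight grows when pre- and postsynaptic activity coincide and shrinks when the presynaptic input is active while the output stays below a target rate. This is a sketch under my own simplifying assumptions (a rectified linear output neuron, one input channel active at a time, made-up constants `rho0` and `eta`), not the spiking simulation of the paper.

```python
def balance_inhibition(w_exc, rho0=0.05, eta=0.1, n_sweeps=500):
    """Adapt inhibitory weights until each channel is balanced.

    Each input channel i drives the output through an excitatory weight
    w_exc[i] and an inhibitory weight w_inh[i]. Update (assumption):
        dw_inh = eta * pre * (post - rho0)
    """
    w_inh = [0.0] * len(w_exc)
    for _ in range(n_sweeps):
        for i, w_e in enumerate(w_exc):
            pre = 1.0                                    # channel i is active
            post = max(0.0, pre * (w_e - w_inh[i]))      # rectified output rate
            w_inh[i] = max(0.0, w_inh[i] + eta * pre * (post - rho0))
    return w_inh

w_exc = [0.3, 1.2, 0.7, 2.0]
w_inh = balance_inhibition(w_exc)
residual = [e - i for e, i in zip(w_exc, w_inh)]
print(residual)
```

In this toy setting each inhibitory weight tracks its excitatory partner up to the small target rate, so the output responds only weakly to any input that the inhibitory weights already 'expect'. Changing `w_exc` and running the function again mimics the abrupt change of excitatory synapses mentioned above.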

These are interesting ideas, clearly presented and with very detailed supplementary information. The large number of inhibitory neurons in cortex makes the assumed pairing of excitatory and inhibitory input at least possible, but I don’t know how prevalent this really is. Another important assumption here is that the inhibitory input is a bit slower than the excitatory input. This makes intuitive sense, if you assume that the inhibitory input needs to be relayed through an additional inhibitory neuron, but I’ve seen the opposite assumption before, too.

Representational switching by dynamical reorganization of attractor structure in a network model of the prefrontal cortex.

Katori, Y., Sakamoto, K., Saito, N., Tanji, J., Mushiake, H., and Aihara, K.
PLoS Comput Biol, 7:e1002266, 2011
DOI, Google Scholar

Abstract

The prefrontal cortex (PFC) plays a crucial role in flexible cognitive behavior by representing task relevant information with its working memory. The working memory with sustained neural activity is described as a neural dynamical system composed of multiple attractors, each attractor of which corresponds to an active state of a cell assembly, representing a fragment of information. Recent studies have revealed that the PFC not only represents multiple sets of information but also switches multiple representations and transforms a set of information to another set depending on a given task context. This representational switching between different sets of information is possibly generated endogenously by flexible network dynamics but details of underlying mechanisms are unclear. Here we propose a dynamically reorganizable attractor network model based on certain internal changes in synaptic connectivity, or short-term plasticity. We construct a network model based on a spiking neuron model with dynamical synapses, which can qualitatively reproduce experimentally demonstrated representational switching in the PFC when a monkey was performing a goal-oriented action-planning task. The model holds multiple sets of information that are required for action planning before and after representational switching by reconfiguration of functional cell assemblies. Furthermore, we analyzed population dynamics of this model with a mean field model and show that the changes in cell assemblies’ configuration correspond to those in attractor structure that can be viewed as a bifurcation process of the dynamical system. This dynamical reorganization of a neural network could be a key to uncovering the mechanism of flexible information processing in the PFC.

Review

Based on firing properties of certain prefrontal cortex neurons the authors suggest a network model in which short-term plasticity implements switches of what the neurons in the network represent. In particular, neurons in prefrontal cortex have been found which switch from representing goals to representing actions (first, their firing varies depending on which goal is shown, then it varies depending on which action is executed afterwards while firing equally for all goals). The authors call these representational switches and they assume that these are implemented via changes in the connection strengths of neurons in a recurrently connected neural network. The network is set up such that network activity always converges to one of several fixed point attractors. A suitable change in connection strengths then leads to a change in the attractor landscape which may be interpreted as a change in what the network represents. The main contribution of the authors is to suggest a particular pattern of short-term plasticity at synapses in the network such that the network exhibits the desired representational switching. Another important aspect of this model is its structure: the network consists of separate cell assemblies, different subsets of which are assumed to be active when either goals or actions are represented and the goal and action subsets are partially overlapping. For example, in their model they have four cell assemblies (A,B,C,D) and the subsets (A,B) and (C,D) are associated with goals while subsets (A,D) and (B,C) are associated with actions. Initially the network is assumed to be in the goal state in which the connection strengths A-B and C-D are large. The presentation of one of two goals then makes the network activity converge to strong activation of (A,B) or (C,D).
Synaptic depression of connections A-B (assuming that this is the active subset) with simultaneous facilitation of connections A-D and B-C then leads to the desired change of connection strengths which implements the representational switch and then makes either subset (A-D), or subset (B-C) the active subset. It is not entirely clear to me why only one action subset becomes active. Maybe this is what the inhibitory units in the model are for (their function is not explained by the authors). In further analysis and experiments the authors confirm the attractor landscape of the model (and how it changes), show that the timing of the representational switch can be influenced by input to the network and show that the probability of changing from a particular goal to a particular action can be manipulated by changing the number of prior connections between the corresponding cell assemblies.

The authors show a nice qualitative correspondence between experimental findings and simulated network behaviour (although some qualitative differences are left, too, e.g., a general increase of firing also for the non-preferred goal and action in the experimental findings). In essence, the authors present a mechanism which could implement the (seemingly) autonomous switching of representations in prefrontal cortex neurons. Whether this mechanism is used by the brain is an entirely different question. I don’t know of evidence backing the chosen special wiring of neurons and distribution of short-term plasticity, but this might just reflect my lack of knowledge of the field. Additionally, I wouldn’t exclude the possibility of a hierarchical model. The authors argue against this by presuming that prefrontal cortex already should be the top of the hierarchy, but nothing prevents us from making hierarchical models of prefrontal cortex itself. This points to the mixing of levels of description in the paper: On the one hand, the main contributions of the paper are on the algorithmic level describing the necessary wiring in a network of a few units and how it needs to change to reproduce the behaviour observed in experiments. On the other hand, the main model is on an implementational level showing how these ideas could be implemented in a network of leaky integrate and fire (LIF) neurons. In my opinion, the LIF neuron network doesn’t add anything interesting to the paper apart from the proof that the algorithmic ideas can be implemented by such a network. On the contrary, it masks a bit the main points of the paper by introducing an abundance of additional parameters which needed to be chosen by the authors, but for which we don’t know which of these settings are important. Finally, I wonder how the described network is reset in order to be ready for the next trial.
The problem is the following: the authors initialise the network such that the goal subsets have a high synaptic efficacy at the start of the trial. The short-term plasticity then reduces these synaptic efficacies while simultaneously increasing those of the action subsets. At the end of a trial they all end up in a similar range (see Fig. 3A bottom). In order for the network to work as expected in the next trial, it somehow needs to reset to the initial synaptic efficacies.

Probabilistic population codes for Bayesian decision making.

Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., Shadlen, M. N., Latham, P. E., and Pouget, A.
Neuron, 60:1142–1152, 2008
DOI, Google Scholar

Abstract

When making a decision, one must first accumulate evidence, often over time, and then select the appropriate action. Here, we present a neural model of decision making that can perform both evidence accumulation and action selection optimally. More specifically, we show that, given a Poisson-like distribution of spike counts, biological neural networks can accumulate evidence without loss of information through linear integration of neural activity and can select the most likely action through attractor dynamics. This holds for arbitrary correlations, any tuning curves, continuous and discrete variables, and sensory evidence whose reliability varies over time. Our model predicts that the neurons in the lateral intraparietal cortex involved in evidence accumulation encode, on every trial, a probability distribution which predicts the animal’s performance. We present experimental evidence consistent with this prediction and discuss other predictions applicable to more general settings.

Review

In this article the authors apply probabilistic population coding as presented in Ma et al. (2006) to perceptual decision making. In particular, they suggest a hierarchical network with an MT and an LIP layer in which the firing rates of MT neurons encode the current evidence for a stimulus while the firing rates of LIP neurons encode the evidence accumulated over time. Under the assumptions made it turns out that the accumulated evidence is independent of nuisance parameters of the stimuli (when they can be interpreted as contrasts) and that LIP neurons only need to sum (integrate) the activity of MT neurons in order to represent the correct posterior of the stimulus given the history of evidence. They also suggest a readout layer implementing a line attractor which reads out the maximum of the posterior under some conditions.

Details

Probabilistic population coding is based on the definition of the likelihood of stimulus features p(r|s,c) as an exponential family distribution of firing rates r. A crucial requirement for the central result of the paper (that LIP only needs to integrate the activity of MT) is that nuisance parameters c of the stimulus s do not occur in the exponential itself while the actual parameters of s only occur in the exponential. This restricts the exponential family distribution to the “Poisson-like family”, as they call it, which requires that the tuning curves of the neurons and their covariance are proportional to the nuisance parameters c (details for this need to be read up in Ma et al., 2006). The point is that this is the case when c corresponds to contrast, or gain, of the stimulus. For the considered random dot stimuli the coherence of the dots may indeed be interpreted as the contrast of the motion in the sense that I can imagine that the tuning curves of the MT neurons are multiplicatively related to the coherence of the dots.
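In symbols, the restriction just described can be written as below. This is a sketch reconstructed from the description above, with φ and h as placeholder names for the stimulus-independent factor and the stimulus-dependent kernel; Ma et al. (2006) should be consulted for the exact conditions on tuning curves and covariances.

```latex
% Poisson-like family: the nuisance parameter c appears only outside the
% exponential, the stimulus s only inside it.
p(\mathbf{r} \mid s, c) = \phi(\mathbf{r}, c)\, \exp\!\big(\mathbf{h}(s)^{\top} \mathbf{r}\big)

% With a flat prior over s, the factor \phi cancels on normalisation,
% so the posterior depends on c only through the firing rates r:
p(s \mid \mathbf{r}) \propto \exp\!\big(\mathbf{h}(s)^{\top} \mathbf{r}\big)
```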

The probabilistic model of the network activities is set up such that the firing of neurons in the network is an indirect, noisy observation of the underlying stimulus, but what we are really interested in is the posterior of the stimulus. So the question is how you can estimate this posterior from the network firing rates. The trick is that under the Poisson-like distribution the likelihood and posterior share the same exponential such that the posterior becomes proportional to this exponential, because the other parts of the likelihood do not depend on the stimulus s (they assume a flat prior of s such that you don’t need to consider it when computing the posterior). Thus, the probability of firing in the network is determined from the likelihood while the resulting firing rates simultaneously encode the posterior. Mind-boggling. The main contribution from the authors then is to show, assuming that firing rates of MT neurons are driven from the stimulus via the corresponding Poisson-like likelihood, that LIP neurons only need to integrate the spikes of MT neurons in order to correctly represent the posterior of the stimulus given all previous evidence (firing of MT neurons). Notice that they also assume that LIP neurons have the same tuning curves with respect to the stimulus as MT neurons and that each LIP neuron sums the activity of the MT neuron with which it shares a tuning curve. They note that a naive procedure like that, i.e. a single neuron integrating MT firing over time, would quickly saturate its activity. So they show, and that is really cool, that global inhibition in the LIP network does not affect the representation of the posterior, allowing them to prevent saturation of firing while maintaining the probabilistic interpretation.
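The summation result can be checked numerically in the simplest special case. The sketch below assumes independent Poisson neurons whose tuning curves are shifted copies of one another (so that their sum does not depend on the stimulus), a discrete stimulus with a flat prior, and a gain that changes from time step to time step. All names and numbers are made up for illustration; none of this is the network simulated in the paper.

```python
import math
import random

N = 8  # number of neurons = number of discrete motion directions (assumption)
BASE = [math.exp(2.0 * math.cos(2 * math.pi * k / N)) for k in range(N)]

def tuning(i, s):
    """Mean response of neuron i to direction s at unit gain (shifted copies of BASE)."""
    return BASE[(i - s) % N]

def poisson(lam, rng):
    """Knuth's algorithm for a Poisson sample (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def normalise(log_p):
    """Turn unnormalised log probabilities into a probability vector."""
    m = max(log_p)
    w = [math.exp(v - m) for v in log_p]
    z = sum(w)
    return [v / z for v in w]

def full_log_likelihood(r, s, gain):
    """Independent-Poisson log likelihood including all gain-dependent terms."""
    return sum(r[i] * math.log(gain * tuning(i, s)) - gain * tuning(i, s)
               - math.lgamma(r[i] + 1) for i in range(N))

rng = random.Random(0)
true_s = 3
gains = [0.2, 0.5, 0.1, 0.8, 0.3]  # 'coherence' varying over time (assumption)

log_post = [0.0] * N   # flat prior over directions
summed = [0] * N       # LIP-like running sum of MT spike counts
for g in gains:
    r = [poisson(g * tuning(i, true_s), rng) for i in range(N)]
    # (a) explicit Bayesian update that knows the gain at every step
    log_post = [log_post[s] + full_log_likelihood(r, s, g) for s in range(N)]
    # (b) plain summation that never sees the gain
    summed = [a + b for a, b in zip(summed, r)]

posterior_bayes = normalise(log_post)
posterior_sum = normalise([sum(summed[i] * math.log(tuning(i, s)) for i in range(N))
                           for s in range(N)])
```

Both routes give the same posterior up to floating point error: every term that involves the gain is constant across stimulus values and drops out on normalisation, so the summed counts alone carry the posterior.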

So much for the theory. In practice, i.e. experiments, the authors do something entirely different, because “these results are important, but they are based on assumptions that are not necessarily exactly true in vivo. […] It is therefore essential that we test our theory in biologically realistic networks.” Now, this is a noble aim, but what exactly do we learn about this theory, if all results are obtained using methods which violate the assumptions of the theory? For example, neither the probability of firing in MT nor LIP is Poisson-like, LIP neurons do not just integrate MT activity, but are also recurrently connected, LIP neurons have local inhibition (they are leaky integrators, inhibition between LIP neurons depending on tuning properties) instead of global inhibition and LIP neurons have an otherwise completely unmotivated “urgency signal” whose contribution increases with time (this stems from experimental observations). Without any concrete links between the two models in theory (I guess, the main ideas are similar, but the details are very different) it has to be shown that they are similar using experimental results. In any case, it is hard to differentiate between contributions from the probabilistic theory and the network implementation, i.e., how much of the fit between experimental findings in monkeys and the behaviour of the model is due to the chosen implementation and how much is due to the probabilistic interpretation?

Results

The overall aim of the experiments / simulations in the paper is to show that the proposed probabilistic interpretation is compatible with the experimental findings in monkey LIP. The hypothesis is that LIP neurons encode the posterior of the stimulus as suggested in the theory. This hypothesis is false from the start, because some assumptions of the theory apparently don’t apply to neurons (as acknowledged by the authors). So the new hypothesis is that LIP neurons approximately encode some posterior of the stimulus. The requirement for this posterior is that updates of the posterior should take the uncertainty of the evidence and the uncertainty of the previous estimate of the posterior into account which the authors measure as a linear increase of the log odds of making a correct choice, log[ p(correct) / (1-p(correct)) ], with time together with the dependence of the slope of this linear increase on the coherence (contrast) of the stimulus. I did not follow up why the previous requirement is related to the log odds in this way, but it sounds ok. The question remains how to estimate the log odds from simulated and real neurons. For the simulated neurons the authors approximate the likelihood with a Poisson-like distribution whose kernel (parameters) was estimated from the simulated firing rates. They argue that it is a good approximation, because linear estimates of the Fisher information appear to be sufficient (I can’t comment on the validity of this argument). A similar approximation of the posterior cannot be done for real LIP neurons, because of a lack of multi-unit recordings which estimate the response of the whole LIP population. Instead, the authors approximate the log odds from measured firing rates of neurons tuned to motion in direction 0 and 180 degrees via a linear regression approach described in the supplemental data.

The authors show that the log-odds computed from the simulated network exhibit the desired properties, i.e., the log-odds linearly increase with time (although there’s a kink at 50ms which supposedly is due to the discretisation) and depend on the coherence of the motion such that the slope of the log-odds increases also when coherence is increased within a trial. The corresponding log-odds of real LIP neurons are far noisier and, thus, do not allow definite judgements about linearity. Also, we don’t know whether their slopes would actually change after a change in motion coherence during a trial, as this was never tested (it’s likely, though).

In order to test whether the proposed line attractor network is sufficient to read out the maximum of the posterior in all conditions (readout time and motion coherence) the authors compare a single (global) readout with local readouts adapted for a particular pair of readout time and motion coherence. However, the authors don’t actually use attractor networks in these experiments, but note that these are equivalent to local linear estimators and so use these. Instead of comparing the readouts from these estimators with the actual maximum of the posterior, they only compare the variance of the estimators (Fisher information) which they show to be roughly the same for the local and global estimators. From this they conclude that a single, global attractor network could read out the maximum of the (approximated) posterior. However, this is only true, if there’s no additional bias of the global estimator which we cannot see from these results.

In an additional analysis the authors show that the model qualitatively replicates the behavioural variables (probability correct and reaction time). However, these are determined from the LIP activities in a surprisingly ad-hoc way: the decision time is determined as the time when any one of the simulated LIP neurons reaches a threshold defined on the probability of firing and the decision is determined as the preferred direction of the neuron hitting the threshold (for 2 and 4 choice tasks the response is determined as the quadrant of the motion direction in which the preferred direction of the neuron falls). Why do the authors not use the attractor network to read out the response here? Also, the authors use a lower threshold for the 4-choice task than for the 2-choice task. This is strange, because one of the main findings of the Churchland et al. (2008) paper was that the decision in both 2- and 4-choice tasks appears to be determined by a common decision threshold while the initial firing rates of LIP neurons were lower for 4-choice tasks. Here, they also initialise with lower firing rates in the 4-choice task, but additionally choose a lower threshold. They don’t motivate this. Maybe it was necessary to fit the data from Churchland et al. (2008). This discrepancy between data and model is even more striking as the authors of the two papers partially overlap. So, do they deem the corresponding findings of Churchland et al. (2008) not important enough to be modelled, is it impossible to model them within their framework, or did they simply forget?
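
The ad-hoc readout described above can be sketched in a few lines. This is my reading of the rule, not the authors' code: the function names, the toy traces and the mapping of a preferred direction onto the nearest of `n_choices` equally spaced targets are assumptions made for illustration.

```python
def first_threshold_crossing(traces, preferred_dirs, threshold):
    """Return (time step, preferred direction) of the first neuron whose
    activity reaches the threshold, or None if no neuron ever does.

    traces[i][t] is the activity (e.g. firing probability) of neuron i at
    time step t; preferred_dirs[i] is its preferred direction in degrees.
    """
    n_steps = len(traces[0])
    for t in range(n_steps):
        # Most active neuron at this time step.
        best = max(range(len(traces)), key=lambda i: traces[i][t])
        if traces[best][t] >= threshold:
            return t, preferred_dirs[best]
    return None

def nearest_target(direction_deg, n_choices):
    """Map a direction onto the nearest of n_choices equally spaced targets
    (0 and 180 degrees for 2 choices; 0, 90, 180 and 270 for 4 choices)."""
    sector = 360.0 / n_choices
    return (round(direction_deg / sector) % n_choices) * sector

# Toy example: two neurons, three time steps.
traces = [[0.1, 0.3, 0.5], [0.2, 0.6, 0.9]]
decision = first_threshold_crossing(traces, [10.0, 170.0], threshold=0.55)
print(decision, nearest_target(decision[1], n_choices=2))
```

Note that the threshold is a free argument here, which is exactly the degree of freedom the authors use when they lower it for the 4-choice task.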

Finally, also the build-up rates of LIP neurons seem to be qualitatively similar in the simulation and the data, although they are consistently lower in the model. The build-up rates for the model are estimated from the first 50ms within each trial. However, the log-odds ratio had this kink at 50ms after which its slope was larger. So, if this effect is also seen directly in the firing rates, the fit of the build-up rates to the data may even be better, if probability of firing after 50ms is used. In Fig. 2C no such kink can be seen in the firing rates, but this is only data for 2 neurons in the model.

Conclusion

Overall the paper is very interesting and stimulating. It is well written and full of sound theoretical results which originate from previous work of the authors. Unfortunately, biological nature does not completely fit the beautiful theory. Consequently, the authors run experiments with more plausible neural networks which only approximately implement the theory. So what conclusions can we draw from the presented results? As long as the firing of MT neurons reflects the likelihood of a stimulus (their MT network is set up in this way), probably a wide range of networks which accumulate this firing will show responses similar to real LIP neurons. Because the assumptions of the theory are violated in more realistic networks, it is not easy to say whether this is a consequence of the theory, which states that MT firing rates should be simply summed over time in order to get the right posterior. It could also be that more complicated forms of accumulation are necessary such that LIP firing represents the correct posterior. Simple summing then just represents a simple approximation. Also, I don’t believe that the presented results can rule out the possibility of sampling based coding of probabilities (see Fiser et al., 2010) for decision making as long as also the sampling approach would implement some kind of accumulation procedure (think of particle filters – the implementation in a recurrent neural network would probably be quite similar).

Nevertheless, the main point of the paper is that the activity in LIP represents the full posterior and not only MAP estimates or log-odds. Consequently, the model very easily extends to the case of continuous directions of motion which is in contrast to previous, e.g., attractor-based, neural models. I like this idea. However, I cannot determine from the experiments whether their network actually implements the correct posterior, because all their tests yield only indirect measures based on approximated analyses. Even so, it is pretty much impossible to verify that the firing of LIP neurons fits to the simulated results as long as we cannot measure firing of a large part of the corresponding neural population in LIP.

Action understanding and active inference.

Friston, K., Mattout, J., and Kilner, J.
Biol Cybern, 104:137–160, 2011
DOI, Google Scholar

Abstract

We have suggested that the mirror-neuron system might be usefully understood as implementing Bayes-optimal perception of actions emitted by oneself or others. To substantiate this claim, we present neuronal simulations that show the same representations can prescribe motor behavior and encode motor intentions during action-observation. These simulations are based on the free-energy formulation of active inference, which is formally related to predictive coding. In this scheme, (generalised) states of the world are represented as trajectories. When these states include motor trajectories they implicitly entail intentions (future motor states). Optimizing the representation of these intentions enables predictive coding in a prospective sense. Crucially, the same generative models used to make predictions can be deployed to predict the actions of self or others by simply changing the bias or precision (i.e. attention) afforded to proprioceptive signals. We illustrate these points using simulations of handwriting to illustrate neuronally plausible generation and recognition of itinerant (wandering) motor trajectories. We then use the same simulations to produce synthetic electrophysiological responses to violations of intentional expectations. Our results affirm that a Bayes-optimal approach provides a principled framework, which accommodates current thinking about the mirror-neuron system. Furthermore, it endorses the general formulation of action as active inference.

Review

In this paper the authors try to convince the reader that the function of the mirror neuron system may be to provide amodal expectations for how an agent’s body will change, or interact with the world. In other words, they propose that the mirror neuron system represents, more or less abstract, intentions of an agent. This interpretation results from identifying the mirror neuron system with hidden states in a dynamic model within Friston’s active inference framework. I will first comment on the active inference framework and the particular model used and will then discuss the biological interpretation.

Active inference framework:

Active inference has been described by Friston elsewhere (Friston et al. PLoS One, 2009; Friston et al. Biol Cyb, 2010). Note that all variables are continuous. The main idea is that an agent maximises the likelihood of its internal model of the world as experienced by its sensors by (1) updating the hidden states of this model and (2) producing actions on the world. Under the Gaussian assumptions made by Friston both ways to maximise the likelihood of the model are equivalent to minimising the precision-weighted prediction errors defined in the model. Potentially the models are hierarchical, but here only a single layer is used which consists of sensory states and hidden states. The prediction errors on sensory states are simply defined as the difference between sensory observations and sensory predictions from the model as you would intuitively do. The model also defines prediction errors on hidden states (*). Both types of prediction errors are used to infer hidden states (1) which explain sensory observations, but action is only produced (2) from sensory state prediction errors, because action is not part of the agent’s model and only affects sensory observations produced by the world.

Well, actually the agent needs a whole other model for action which implements the gradient of sensory observations with respect to action, i.e., which tells the agent how sensory observations change when it exerts action. However, Friston restricts sensory observations in this context to proprioceptive observations, i.e., muscle feedback, and argues that the corresponding gradient may be sufficiently simple to learn and represent so that we don’t have to worry about it (in the simulation he just provides the gradient to the agent). Therefore, action solely tries to implement proprioceptive predictions. On the other hand, proprioceptive predictions may be coupled to predictions in other modalities (e.g. vision) through the agent’s model which allows the agent to execute (seemingly) higher-level actions. For example, if an agent sees its hand move from a cup to a glass on a table in front of it, its generative model must also represent the corresponding proprioceptive signals. If then the agent predicts this movement of its hand in visual space, the generative model must automatically predict the corresponding proprioceptive signals, because they always accompanied the seen movement. Action then minimises the resulting precision-weighted proprioceptive prediction error and so implements the hand movement from cup to glass.

Notice that the agent minimises the *precision-weighted* prediction errors. Precision here means the inverse *prior* covariance, i.e., it is a measure for how certain the agent *expects* to be about its observations. By changing the precisions, qualitatively very different results can be obtained within the active inference framework. Indeed, here they implement the switch from action generation to action observation by heavily reducing the precision of the proprioceptive observations. This makes the agent ignore any proprioceptive prediction errors when both updating hidden states (1) and generating action (2). This leads to an interesting prediction: when you observe an action by somebody else, you shouldn’t notice when the corresponding body part is moved externally, or alternatively, when you observe somebody else’s movement, you shouldn’t be able to move the corresponding body part yourself (in a different way than the observed). In this strict formulation the prediction appears very unlikely to hold, but in a softer formulation, namely that you should see interference effects in these situations, you may be able to find evidence for it.
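
The role of the precisions can be illustrated with a deliberately tiny toy model. This is a sketch under strong assumptions, not Friston's scheme: everything is one-dimensional and static (no generalised coordinates, no SHC), the world state `w` is sensed identically by proprioception and vision, the gradient of proprioception with respect to action is taken to be 1, and all names and parameter values are hypothetical. Perception (1) moves the expectation `mu` down the gradient of the precision-weighted squared errors; action (2) moves the world using only the proprioceptive error.

```python
def simulate(pi_proprio, pi_visual, pi_prior=1.0, prior=1.0,
             w0=0.0, dt=0.01, steps=5000):
    """Euler-integrate a 1-D toy agent and return (mu, w).

    mu: the agent's expectation of the hidden state
    w:  the true state of the world (e.g. a joint angle)
    """
    mu, w = 0.0, w0
    for _ in range(steps):
        eps_p = w - mu        # proprioceptive prediction error
        eps_v = w - mu        # visual prediction error
        eps_x = mu - prior    # prediction error on the hidden state
        # (1) perception: descend the precision-weighted squared errors
        d_mu = pi_proprio * eps_p + pi_visual * eps_v - pi_prior * eps_x
        # (2) action: change the world to cancel the proprioceptive error
        d_w = -pi_proprio * eps_p
        mu += dt * d_mu
        w += dt * d_w
    return mu, w

print("acting:   ", simulate(pi_proprio=4.0, pi_visual=1.0))
print("observing:", simulate(pi_proprio=0.0, pi_visual=1.0))
```

With high proprioceptive precision the world is dragged to the prior expectation (the agent "acts"). With proprioceptive precision set to zero the world stays where it is and the expectation settles on a compromise between the prior and what vision reports (the agent only "observes").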

This thought also points to the general problem of finding suitable precisions: How do you strike a balance between action (2) and perception (1)? Because they are both trying to reduce the same prediction errors, the agent has to trade off recognising the world as it is (1) against changing it so that it corresponds to its expectations (2). This dichotomy is not easily resolved. When asked about it, Friston usually points to empirical priors, i.e., that the agent has learnt to choose suitable precisions based on its past experience (not very helpful, if you want to know how they are chosen). I guess, it’s really a question about how strongly the agent expects (wants) a certain outcome. A useful practical consideration also is that action is constrained, e.g., an agent can’t move infinitely fast, which means that enough prediction error should be left over for perceiving changes in the world (1), in particular those that are not within reach of the agent’s actions on the expected time scale.

I do not discuss the most common reservation against Friston’s free-energy principle / active inference framework (that people seem to have an intrinsic curiosity towards new things as well), because it has been covered elsewhere (John Langford’s blog, Nature Neuroscience).

Handwriting model:

In this paper the particular model used is interpreted as a model for handwriting although neither a hand is modeled, nor actual writing. Rather, a two-joint system (arm) is used where the movement of the end-effector position (tip) is designed such that it is qualitatively similar to hand-writing without actually producing common letters. The dynamic model of the agent consists of two parts: (a) a stable heteroclinic channel (SHC) which produces a periodic sequence of 6 continuously changing states and (b) linear attractor dynamics in joint angle space of the arm which is attracted to a rest position, but modulated by the distance of the tip to a desired point in Cartesian space which is determined by the SHC state. Thus, the agent expects that the tip of its arm moves along a sequence of 6 desired points where the dynamics of the arm movement is determined by the linear attractor. The agent observes the joint angle positions and velocities (proprioceptive) and the Cartesian positions of the elbow joint and tip (visual). The dynamic model of the world (so to say implementing the underlying physics) lacks the SHC dynamics and only defines the linear attractor in joint space which is modulated by action and some (unspecified) external variables which can be used to perturb the system. Interestingly, the arm is more strongly attracted to its rest position in the world model than in the agent model. The reason for this is not clear to me, but it might not be important, because action could correct for this.
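
The mapping from the proprioceptive variables (joint angles) to the visual ones (Cartesian elbow and tip positions) is standard planar forward kinematics. The sketch below is a generic version of it and is not taken from the paper's code: unit link lengths and the convention that the second angle is measured relative to the first link are assumptions.

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """Cartesian positions of elbow and tip of a planar two-joint arm.

    theta1: shoulder angle (radians, relative to the x-axis)
    theta2: elbow angle (radians, relative to the upper arm; an assumption)
    l1, l2: link lengths (hypothetical unit lengths by default)
    """
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    tip = (elbow[0] + l2 * math.cos(theta1 + theta2),
           elbow[1] + l2 * math.sin(theta1 + theta2))
    return elbow, tip

def distance_to_target(tip, target):
    """Euclidean distance between the tip and a desired point; in the model
    this distance modulates the attractor dynamics in joint space."""
    return math.hypot(tip[0] - target[0], tip[1] - target[1])

elbow, tip = forward_kinematics(math.pi / 2, 0.0)
print(elbow, tip, distance_to_target(tip, (0.0, 1.0)))
```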

Biological interpretation:

The system is set up such that the agent model contains additional hidden states compared to the world which may be interpreted as intentions of the agent, because they determine the order of the points that the tip moves to. In simulations the authors show that the described models within the active inference framework indeed lead to actions of the agent which implement a “writing” movement even though the world model did not know anything about “writing” at all. This effect has already been shown in the previously mentioned publications.

What is new here is that they show that the same model can be used to observe an action without generating action at the same time. As mentioned before, they simply reduce the precision of the proprioceptive observations to achieve this. They then replay the previously recorded actions of the agent in the world by providing them via the external variables. This produces an equivalent movement of the arm in the world without any action being exerted by the agent. Instead of generating its own movement the agent then has the task to recognise a movement executed by somebody/something else. This works, because the precision of the visual observations was kept high such that the hidden SHC states can be inferred correctly (1). The authors mention a delay before the SHC states catch up with the equivalent trajectory under action. This should not be over-interpreted, because, contrary to what is stated in the text, the initial conditions for the two simulations were not the same (see figures and code). The important argument the authors try to make here is that the same set of variables (SHC states) are equally active during action as well as action observation and, therefore, provide a potential functional explanation for activity in the mirror neuron system.

Furthermore, the authors argue that SHC states represent the intentions of the agent, or, equivalently, the intentions of the agent which is observed, by noting that the desired tip positions as specified by the SHC states are only (approximately) reached at a later point in time in the world. This probably results from the inertia built into the joint angle dynamics. Probably there are dynamic models for which this effect disappears, but it sounds plausible to me to assume that when one dynamic system d1 influences the parameters of another dynamic system d2 (as here), that d2 first needs to catch up with its state to the new parameter setting. So these delays would be expected for most hierarchical dynamic systems.

Another line of argument of the authors is to relate prediction errors in the model with electrophysiological (EEG) findings. This is based on Friston’s previous suggestion that superficial pyramidal cells are likely candidates for implementing prediction error units. At the same time, activity of these cells is thought to dominate EEG signals. I cannot judge the validity of both hypotheses, although the former seems to have less experimental support than the latter. In any case, I find the corresponding arguments in this paper quite weak. The problem is that results from exactly one run with one particular setting of parameters of one particular model are used to make very general statements based on a mere qualitative fit of parts of the data to general experimental findings. In other words, I’m not confident that similar (desired) patterns would be seen in the prediction errors, if other settings of precisions, or parameters of the dynamical systems would be chosen.

Conclusion:

The authors suggest how the mirror neuron system can be understood within Friston’s active inference framework. These conceptual considerations make sense. In general, the active inference framework provides large explanatory power and many phenomena may be understood in its context. However, in my point of view, it is an entirely open question how the functional considerations of the active inference framework may be implemented in neurobiological substrate. The superficial arguments based on prediction errors generated by the model, which are presented in the paper, are not convincing. More evidence needs to be found which robustly links variables in an active inference model with neuroscientific measurements.

But also conceptually it is not clear whether the active inference solution correctly describes the computations of the brain. On the one hand, it potentially explains many important and otherwise disparate phenomena under a common principle (e.g. perception, action, learning, computing with noise, dynamics, internal models, prediction; this paper adds action understanding). On the other hand, we don’t know whether all brain functions actually follow a common principle and whether functionally equivalent solutions for subsets of phenomena may be better descriptions of the underlying computations.

An important issue for future studies which aim to discern these possibilities is that active inference is a general framework which needs to be instantiated with a particular model before its properties can be compared to experimental data. However, little is known about the kind of hierarchical, dynamic, functional models itself, which must serve as generative models for active inference. As in this paper, it then is hard to discern the properties of the chosen model from the properties imposed by the active inference framework. Therefore, great care has to be taken in the interpretation of corresponding results, but it would be exciting to learn about which properties of the active inference framework are crucial in brain function and which would need to be added, adapted, or dropped in a faithful description of (subsets of) brain function.

(*) Hidden state prediction errors result from Friston’s special treatment of dynamical systems by extending states by their temporal derivatives to obtain generalised states which represent a local trajectory of the states through time. The hidden state prediction errors, thus, can be seen, intuitively, as the difference between the velocity of the (previously inferred) hidden states as represented by the trajectory in generalised coordinates and the velocity predicted by the dynamic model.
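
Written out to first order, and as far as I understand the scheme, the footnote amounts to the following. The symbols are my shorthand rather than a quotation from the paper: x is the hidden state, x' its velocity as carried in generalised coordinates, v the causes, f the dynamics of the agent's model and Π_x the precision of the hidden state.

```latex
\tilde{x} = (x, x', x'', \dots), \qquad
\varepsilon_x = x' - f(x, v), \qquad
\xi_x = \Pi_x \, \varepsilon_x
```

Here ε_x is the hidden state prediction error and ξ_x its precision-weighted version, which is the quantity that is actually minimised.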

Information Theory of Decisions and Actions.

Tishby, N. and Polani, D.
in: Perception-Action Cycle, Springer New York, pp. 601–636, 2011
URL, Google Scholar

Abstract

The perception–action cycle is often defined as “the circular flow of information between an organism and its environment in the course of a sensory guided sequence of actions towards a goal” (Fuster, Neuron 30:319–333, 2001; International Journal of Psychophysiology 60(2):125–132, 2006). The question we address in this chapter is in what sense this “flow of information” can be described by Shannon’s measures of information introduced in his mathematical theory of communication. We provide an affirmative answer to this question using an intriguing analogy between Shannon’s classical model of communication and the perception–action cycle. In particular, decision and action sequences turn out to be directly analogous to codes in communication, and their complexity – the minimal number of (binary) decisions required for reaching a goal – directly bounded by information measures, as in communication. This analogy allows us to extend the standard reinforcement learning framework. The latter considers the future expected reward in the course of a behaviour sequence towards a goal (value-to-go). Here, we additionally incorporate a measure of information associated with this sequence: the cumulated information processing cost or bandwidth required to specify the future decision and action sequence (information-to-go). Using a graphical model, we derive a recursive Bellman optimality equation for information measures, in analogy to reinforcement learning; from this, we obtain new algorithms for calculating the optimal trade-off between the value-to-go and the required information-to-go, unifying the ideas behind the Bellman and the Blahut–Arimoto iterations. This trade-off between value-to-go and information-to-go provides a complete analogy with the compression–distortion trade-off in source coding. The present new formulation connects seemingly unrelated optimization problems. The algorithm is demonstrated on grid world examples.

Review

Peter Dayan pointed me to this paper (which is actually a book chapter) when I told him that I find the continuous interaction between perception and action important and that Friston’s free energy framework is one of the few which covers this case. Now, this paper covers only discrete time (and states and actions), but certainly it addresses the issue that perception and action influence each other.

The main idea of the paper is to take the informational effort (they call it information-to-go) into account when finding a policy for a Markov decision process. A central finding is a recursive equation analogous to the (Bellman) equation for the Q-function in reinforcement learning which captures the expected (over all possible future state-action trajectories) informational effort of a certain state-action pair. Informational effort is defined as the KL-divergence between a factorising prior distribution over future states and actions (making them independent across time) and their true distribution. This means that the informational effort is the expected number of bits of information that you have to consider in addition to your prior when moving through the future. They then propose a free energy (also a recursive equation) which combines the informational effort with the Q-function of the underlying MDP and thus allows simultaneous optimisation of informational effort and reward where the two are traded off against each other.

Practically, this leads to “soft vs. sharp policies”: sharp policies which always choose the action with highest expected reward and soft policies which choose actions probabilistically with an associated penalty on reward compared to sharp policies. The softness of the resulting policy is controlled by the tradeoff parameter between informational effort and reward which can be interpreted as the informational capacity of the system under consideration. I understand it this way: the tradeoff parameter stands for the informational complexity/capacity of the distributions representing the internal model of the world in the agent and the optimal policy with a particular setting of tradeoff parameter is the optimal policy with respect to reward alone that a corresponding agent can achieve. This is easily seen when considering that informational effort depends on the prior for future state-action trajectories. For a given prior, tradeoff parameter and resulting policy you can find the corresponding more complex prior for which the same policy can be found for 0 informational effort. The prior here obviously corresponds to the internal model of the agent. Consequently, the authors present a general framework with which you can ask questions such as: “How much informational capacity does my agent need to solve a given task with a desired level of performance?” Or, in other words: “How complex does my agent need to be in order to solve the given task?” Or: “How well can my agent solve the given task?” Although this latter question is the standard question in RL. In particular, my intuition tells me that for every setting of the tradeoff parameter there probably is an equivalent POMDP formulation (which makes the corresponding difference between world and agent model explicit).
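
The soft vs. sharp distinction can be made concrete with a single-step simplification, which is an assumption of mine and not the recursive information-to-go of the chapter. For one decision, the policy that minimises KL(policy || prior) minus beta times the expected reward is the prior reweighted by exp(beta * Q). The Q-values, the uniform prior and the function names below are hypothetical.

```python
import math

def soft_policy(q_values, prior, beta):
    """Policy proportional to prior(a) * exp(beta * Q(a)).

    beta is the tradeoff parameter: beta = 0 returns the prior (no
    informational effort), large beta approaches the sharp (greedy) policy.
    """
    weights = [p * math.exp(beta * q) for p, q in zip(prior, q_values)]
    z = sum(weights)
    return [w / z for w in weights]

def kl_bits(p, q):
    """KL divergence KL(p || q) in bits: the informational effort here."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_reward(policy, q_values):
    return sum(p * q for p, q in zip(policy, q_values))

q = [1.0, 0.0, 0.0]            # hypothetical Q-values of three actions
prior = [1 / 3, 1 / 3, 1 / 3]  # hypothetical uniform prior over actions
for beta in (0.0, 1.0, 5.0, 50.0):
    pol = soft_policy(q, prior, beta)
    print(beta, round(kl_bits(pol, prior), 3), round(expected_reward(pol, q), 3))
```

Raising beta buys expected reward at the price of more bits relative to the prior, with the effort saturating at log2(3) bits once the policy is sharp.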

A particularly interesting discussion is that about “perfectly adapted environments” which seems to be directed towards Friston, though without mentioning him. The discussion results from the ability to optimise their free energy combined from informational effort and reward not only with respect to the policy, but also with respect to the (true) transition probabilities. The outcome of such an optimisation is an environment in which transition probabilities are directly related to rewards, or, in other words, an environment in which informational effort is equal to something like negative reward. In such an environment “minimizing the statistical surprise or maximizing the predictive information is equivalent to maximizing reward” which is what Friston argues (see also the associated discussion on hunch.net). Needless to say that they consider this as a very special case while in most other cases the environment contains information that is irrelevant in terms of reward. Nevertheless, they consider the possibility that the environments of living organisms are indeed perfectly or at least well adapted through millions of years of coevolution and they suggest to direct future research towards this issue. The question really is what is reward in this general sense? What is it that living organisms try to achieve? The more concrete reward is, for example, reward for a particular task, the less relevant most information in the environment will be. I’m tempted to say that the combined optimisation of informational effort and rewards, as presented here, will then lead to policies which particularly seek out relevant information, but I’m not sure whether this is a correct interpretation.

To sum up, Tishby and Polani present a new theoretical framework which generalises reinforcement learning by incorporating ideas from information theory. They provide an interesting new perspective which is presented in a pleasingly accessible way. I do not think that they solved any particular problem in reinforcement learning, but they broadened the view by postulating that agents trade off informational effort (capacity?) and reward. Practically, computations derived from their framework may not be feasible in most cases, because original reinforcement learning is already hard and here a few expectations have been added. Or, maybe it’s not so bad, because you can do them together.

Flexible vowel recognition by the generation of dynamic coherence in oscillator neural networks: speaker-independent vowel recognition.

Liu, F., Yamaguchi, Y., and Shimizu, H.
Biol Cybern, 71:105–114, 1994
DOI, Google Scholar

Abstract

We propose a new model for speaker-independent vowel recognition which uses the flexibility of the dynamic linking that results from the synchronization of oscillating neural units. The system consists of an input layer and three neural layers, which are referred to as the A-, B- and C-centers. The input signals are a time series of linear prediction (LPC) spectrum envelopes of auditory signals. At each time-window within the series, the A-center receives input signals and extracts local peaks of the spectrum envelope, i.e., formants, and encodes them into local groups of independent oscillations. Speaker-independent vowel characteristics are embedded as a connection matrix in the B-center according to statistical data of Japanese vowels. The associative interaction in the B-center and reciprocal interaction between the A- and B-centers selectively activate a vowel as a global synchronized pattern over two centers. The C-center evaluates the synchronized activities among the three formant regions to give the selective output of the category among the five Japanese vowels. Thus, a flexible ability of dynamical linking among features is achieved over the three centers. The capability in the present system was investigated for speaker-independent recognition of Japanese vowels. The system demonstrated a remarkable ability for the recognition of vowels very similar to that of human listeners, including misleading vowels. In addition, it showed stable recognition for unsteady input signals and robustness against background noise. The optimum condition of the frequency of oscillation is discussed in comparison with stimulus-dependent synchronizations observed in neurophysiological experiments of the cortex.

Review

The authors present an oscillating recurrent neural network model for the recognition of Japanese vowels. The model consists of 4 layers: 1) an input layer which gives pre-processed frequency information, 2) an oscillatory hidden layer with local inhibition, 3) another oscillatory hidden layer with long-range inhibition and 4) a readout layer implementing the classification of vowels using a winner-takes-all mechanism. Layers 1-3 each contain 32 units where each unit is associated to one input frequency. The output layer contains one unit for each of the 5 vowels and the readout mechanism is based on multiplication of weighted sums of layer 3 activities such that the output is also oscillatory. The oscillatory units in layers 2 and 3 consist of an excitatory element coupled with an inhibitory element which oscillate, or become silent, depending on the input. The long-range connections in layer 3 are determined manually based on known correlations between formants (characteristic frequencies) of the individual vowels.
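
The readout in layer 4 can be sketched as follows. This is only an illustration of "multiplication of weighted sums followed by winner-takes-all" on a static snapshot of layer 3 activity. The number of units, the weights, the three formant regions per vowel and the function name are hypothetical, and the oscillatory time course of the real model is ignored.

```python
def readout(activity, region_weights):
    """Winner-takes-all over products of weighted sums.

    activity: list of layer 3 unit activities (one snapshot in time)
    region_weights: dict mapping each vowel to a list of weight vectors,
        one per formant region (hypothetical values, not from the paper)
    """
    scores = {}
    for vowel, regions in region_weights.items():
        score = 1.0
        for weights in regions:
            # Weighted sum of activity within one formant region.
            score *= sum(w * a for w, a in zip(weights, activity))
        scores[vowel] = score
    winner = max(scores, key=scores.get)
    return winner, scores

# Toy snapshot with 6 units instead of 32.
activity = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
region_weights = {
    "a": [[1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0]],
    "i": [[0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 1]],
}
print(readout(activity, region_weights))
```

Because the regional sums are multiplied, a vowel only scores when all of its formant regions are active at the same time, which is how the product acts as a coincidence (synchrony) detector.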

In experiments the authors show that the classification of their network is robust against different speakers (14 men, 5 women, 5 girls, 5 boys): only 6 out of 145 trials were incorrectly classified. However, they do not report what exactly their criterion for classification performance was (remember that the output was oscillatory, and in the examples shown alternative vowels sometimes show bumps during the time course of a vowel). They also report robustness to imperfect stimuli (formants varying within a vowel) and noise (superposition of 12 different conversations), but only single examples are shown.

Without being able to tell what the state of the art in neural networks in 1994 was, I guess the main contribution of the paper is that it shows that vowel recognition may be robustly implemented using oscillatory networks. At least from today’s perspective the suggested network is a bad solution to the technical problem of vowel recognition, but even alternative algorithms at the time were probably better at that (there’s a hint in one of the paragraphs in the discussion). The paper is a good example for what was wrong with neural network research at the time: the models give the feeling that they are pretty arbitrary. Are the units in the network only defined and connected like they are, because these were the parameters that worked? Most probably. At least here the connectivity is partly determined through some knowledge of how frequencies produced by vowels relate, but many other parameters appear to be chosen arbitrarily. Respect to the person who made it work. However, the results section is rather weak. They only tested one example of a spoken vowel per person and they don’t define classification performance clearly. I guess, you could argue that it is a proof-of-concept of a possible biological implementation, but then again it is still unclear how this can be properly related to real networks in the brain.