Probabilistic vs. non-probabilistic approaches to the neurobiology of perceptual decision-making.

Drugowitsch, J. and Pouget, A.
Curr Opin Neurobiol, 22:963–969, 2012
DOI, Google Scholar

Abstract

Optimal binary perceptual decision making requires accumulation of evidence in the form of a probability distribution that specifies the probability of the choices being correct given the evidence so far. Reward rates can then be maximized by stopping the accumulation when the confidence about either option reaches a threshold. Behavioral and neuronal evidence suggests that humans and animals follow such a probabilitistic decision strategy, although its neural implementation has yet to be fully characterized. Here we show that that diffusion decision models and attractor network models provide an approximation to the optimal strategy only under certain circumstances. In particular, neither model type is sufficiently flexible to encode the reliability of both the momentary and the accumulated evidence, which is a pre-requisite to accumulate evidence of time-varying reliability. Probabilistic population codes, by contrast, can encode these quantities and, as a consequence, have the potential to implement the optimal strategy accurately.

Review

It’s essentially an advertisement for probabilistic population codes (PPCs) for modelling perceptual decisions. In particular, they contrast PPCs with diffusion models and attractor models without going into details. The main argument against attractor models is that they don’t encode decision confidence in the attractor state. The main argument against diffusion models is that they are not fit to represent varying evidence reliability, but it’s not fully clear to me what they mean by that. The closest I get is that “[…] the drift is a representation of the reliability of the momentary evidence” and they argue that for a varying drift rate the diffusion model becomes suboptimal. Of course, if the diffusion model assumes a constant drift rate, it is suboptimal when the drift rate changes, but I’m not sure whether this is the point they are making. The authors mention one potential weak point of PPCs: PPCs predict that the decision bound is defined on a linear combination of integrated momentary evidence, but the firing of neurons in area LIP indicates that the bound is on the estimated correctness of single decisions, i.e., there is a bound for each decision alternative, as in a race model. I interpret this as evidence for a decision model where the bound is defined on the posterior probability of the decision alternatives.

The paper is a bit sloppily written (frequent, easily avoidable language errors).

Representation of confidence associated with a decision by neurons in the parietal cortex.

Kiani, R. and Shadlen, M. N.
Science, 324:759–764, 2009
DOI, Google Scholar

Abstract

The degree of confidence in a decision provides a graded and probabilistic assessment of expected outcome. Although neural mechanisms of perceptual decisions have been studied extensively in primates, little is known about the mechanisms underlying choice certainty. We have shown that the same neurons that represent formation of a decision encode certainty about the decision. Rhesus monkeys made decisions about the direction of moving random dots, spanning a range of difficulties. They were rewarded for correct decisions. On some trials, after viewing the stimulus, the monkeys could opt out of the direction decision for a small but certain reward. Monkeys exercised this option in a manner that revealed their degree of certainty. Neurons in parietal cortex represented formation of the direction decision and the degree of certainty underlying the decision to opt out.

Review

The authors used a 2AFC task with an option to waive the decision in favour of a choice which provides a low, but certain reward (the sure option) to investigate the representation of confidence in LIP neurons. Behaviourally the sure option had the expected effect: it was chosen more often the harder the decisions were, i.e., the more likely a false response was. Trials in which the sure option was chosen may thus be interpreted as those in which the subject had little confidence in the upcoming decision. It is important to note that task difficulty here was manipulated by providing limited amounts of information for a limited amount of time, i.e., this was not a reaction time task.

The firing rates of the recorded LIP neurons indicate that selection of the sure option is associated with an intermediate level of activity compared to that of subsequent choices of the actual decision options. For individual trials the authors found that firing rates closer to the mean firing rate (in a short time period before the sure option became available) more frequently led to selection of the sure option than firing rates further away from the mean, but in absolute terms the activity in this time window could predict choice of the sure option only weakly (probability of 0.4). From these results the authors conclude that the LIP neurons which have previously been found to represent evidence accumulation also encode confidence in a decision. They suggest a simple drift-diffusion model with a fixed diffusion parameter to explain the results. In addition to standard diffusion models, they define confidence in terms of the log-posterior odds which they compute from the state of the drift-diffusion model. They define the posterior as p(S_i|v), the probability that decision option i is correct given that the drift-diffusion state (the decision variable) is v. They compute it from the corresponding likelihood p(v|S_i), but don’t state how they obtained that likelihood. Anyway, the sure option is chosen in the model when the log-posterior odds are below a certain level. I don’t see why the detour via the log-posterior odds is necessary. You could directly define v as the posterior for decision option i and still be consistent with all the findings in the paper. Of course, then v could not be governed by a linear drift anymore, but why should it be in the first place? The authors keenly promote the Bayesian brain, but stop just before the finishing line. Why?
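
To make the step from p(v|S_i) to the log-posterior odds concrete, here is a minimal sketch of one way the likelihood could be obtained. It is my own assumption, not the authors' stated method: after viewing time t the decision variable is Gaussian with mean +k*t (under S_1) or -k*t (under S_2) and variance sigma^2 * t, the drift magnitude k is drawn with equal probability from a set of difficulty levels, and both options have equal prior probability. The names `log_posterior_odds`, `choose` and `criterion` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_posterior_odds(v, t, drifts, sigma=1.0):
    """log[p(S_1|v) / p(S_2|v)] for equal priors.

    Assumption: v ~ N(+k*t, sigma^2*t) under S_1 and N(-k*t, sigma^2*t)
    under S_2, with k drawn uniformly from `drifts` (difficulty levels).
    """
    var = sigma ** 2 * t
    like_1 = sum(normal_pdf(v, +k * t, var) for k in drifts) / len(drifts)
    like_2 = sum(normal_pdf(v, -k * t, var) for k in drifts) / len(drifts)
    return math.log(like_1 / like_2)


def choose(v, t, drifts, criterion, sigma=1.0):
    """Pick the sure option when the magnitude of the odds is below criterion."""
    lpo = log_posterior_odds(v, t, drifts, sigma)
    if abs(lpo) < criterion:
        return "sure"
    return "S1" if lpo > 0 else "S2"
```

With a single drift magnitude the odds reduce to 2*k*v/sigma^2, i.e., they are just a rescaled v, which is one way to see why I consider the detour unnecessary in the simplest case. With several difficulty levels the mapping from v to confidence also depends on t.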

Robust averaging during perceptual judgment.

de Gardelle, V. and Summerfield, C.
Proc Natl Acad Sci U S A, 108:13341–13346, 2011
DOI, Google Scholar

Abstract

An optimal agent will base judgments on the strength and reliability of decision-relevant evidence. However, previous investigations of the computational mechanisms of perceptual judgments have focused on integration of the evidence mean (i.e., strength), and overlooked the contribution of evidence variance (i.e., reliability). Here, using a multielement averaging task, we show that human observers process heterogeneous decision-relevant evidence more slowly and less accurately, even when signal strength, signal-to-noise ratio, category uncertainty, and low-level perceptual variability are controlled for. Moreover, observers tend to exclude or downweight extreme samples of perceptual evidence, as a statistician might exclude an outlying data point. These phenomena are captured by a probabilistic optimal model in which observers integrate the log odds of each choice option. Robust averaging may have evolved to mitigate the influence of untrustworthy evidence in perceptual judgments.

Review

The authors investigate what influence the variance of evidence has on perceptual decisions. A bit counterintuitively, they implement varying evidence by simultaneously presenting elements with different feature values (e.g. color) to subjects instead of presenting only one element which changes its feature value over time (would be my naive approach). Perhaps they did this to be able to assume constant evidence over time such that the standard drift diffusion model applies. My intuition is that subjects anyway implement a more sequential sampling of the stimulus display by varying attention to individual elements.

The behavioural results show that subjects take both the mean of the presented evidence and its variance into account when making a decision: for larger mean evidence and smaller variance of evidence subjects are faster and make fewer mistakes. The results are attention dependent: mean and variance in a task-irrelevant feature dimension had no effect on responses.

The behavioural results can be explained by a drift diffusion model with a drift rate which takes the variance of the evidence into account. The authors present two such drift rates. 1) SNR drift = mean / standard deviation (as computed from trial-specific feature values). 2) LPR drift = mean log posterior ratio (also computed from trial-specific feature values). The two cannot be differentiated based on the measured mean RTs and error rates in the different conditions. So the authors provide an additional analysis which estimates the influence of the different presented elements, that is, the influence of the different feature values presented by them, on the given responses. This is done via a generalised linear regression by fitting a model which predicts response probabilities from presented feature values for individual trials. The fitted linear weights suggest that extreme (outlying) feature values have little influence on the final responses compared to the influence that (inlying) feature values close to the categorisation boundary have. Only the LPR model (2) replicates this effect.

Why do inlying feature values have greater influence on responses than outlying ones in the LPR model, but not in the other models? The LPR model alone would not predict this, because for more extreme posterior values you get more extreme LPR values which then have a greater influence on the mean LPR value, i.e., the drift rate. Therefore, it is not entirely clear to me yet why they find a greater importance of inlying feature values in the generalised linear regression from feature values to responses. The best explanation I currently have is the influence of the estimated posterior values: Fig. S5 shows that the posterior values are constant for sufficiently outlying feature values and only change for inlying feature values, where the greatest change is at the feature value defining the categorisation boundary. When mapped through the LPR, the posterior values lead to LPR values following the same sigmoidal form, setting low and high feature values to constants. These constant high and low values may cancel each other out when, on average, there are equally many of each. Then, only the inlying feature values may have a lasting contribution to the LPR mean, especially those close to the categorisation boundary, because they tend to lead to larger variation in LPR values which may tip the LPR mean (drift rate) towards one of the two responses. This explanation means that the results depend on the estimated posterior values, in particular on these being set to values of about 0.2 or 0.8, respectively, for a large range of extreme feature values.
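
The saturation argument can be illustrated with a small sketch. Everything specific in it is an assumption of mine: the posterior is modelled as a logistic function of the feature value that saturates at 0.2 and 0.8 (the approximate values mentioned above), and the boundary at 0 and the slope of 4 are arbitrary illustrative choices, not taken from the paper. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def posterior(x, boundary=0.0, slope=4.0, floor=0.2, ceil=0.8):
    """Hypothetical posterior p(category A | feature value x).

    Assumption: sigmoidal in x and saturating at `floor` and `ceil`.
    """
    s = 1.0 / (1.0 + math.exp(-slope * (x - boundary)))
    return floor + (ceil - floor) * s


def lpr(x):
    """Log posterior ratio of a single element with feature value x."""
    p = posterior(x)
    return math.log(p / (1.0 - p))


def snr_drift(xs):
    """SNR drift: mean of the feature values divided by their standard deviation."""
    return mean(xs) / pstdev(xs)


def lpr_drift(xs):
    """LPR drift: mean log posterior ratio across the presented elements."""
    return mean([lpr(x) for x in xs])
```

Under these assumptions, moving an outlying element from 1.5 to 2.5 hardly changes its LPR, whereas moving an inlying element from -0.1 to 0.1 changes it a lot, so only inlying elements can shift `lpr_drift` noticeably. In `snr_drift` all elements enter through the mean and standard deviation of the raw feature values, without such saturation.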

I am unsure what conclusions can be drawn from the results. Although the basic behavioural results are clear, it is not surprising that the responses of subjects depend on the variance of the presented evidence. You can define the feature values varying around the mean as noise. More variance then just means more noise and it is a basic result that people become slower and more error prone when presented with more noise. Perhaps surprisingly, it is shown here that this also works when noisy features are presented simultaneously on the screen instead of sequentially over time.

The DDM analysis shows that the drift rate of subjects decreases with increasing variance of evidence. This makes sense and means that subjects become more cautious in their judgements when confronted with larger variance (more noise). But I find the LPR model rather strange. It’s like pressing a Bayesian model into a mechanistic corset. The posterior ratio is an ad-hoc construct. OK, it’s equivalent to the log-likelihood ratio, but why turn it into a posterior ratio then? The vagueness arises already because of how the task is defined: all information is presented at once, but you want to describe accumulation of evidence over time. Consequently, you have to define some approximate, ad-hoc construct (mean LPR) which you can use to define the temporal integration. That the model based on that construct replicates an aspect of the behavioural data may be an artefact of the particular approximation used (apparently it is important that the estimated posterior values are constant for extreme feature values). So, it remains unclear to me whether an LPR-DDM is a good explanation for the involved processes in this case.

Actually, a large part of the paper (cf. title) concerns the finding that extreme feature values appear to have smaller influence on subject responses than feature values close to the categorisation boundary. This is surprising to me. Although it makes intuitive sense in terms of ‘robust averaging’, I wouldn’t predict it for optimal probabilistic integration of evidence, at least not without making further assumptions. Such assumptions are also implicit in the LPR-DDM and I’m a bit skeptical about it anyway. Thus, a good explanation is still needed, in my opinion. Finally, I wonder how reliable the generalised linear regression analysis, which led to these results, is. On the one hand, the authors report using two different generalised linear models and obtaining equivalent results. On the other hand, they estimate 9 parameters from only one binary response variable and I wonder how the optimisation landscape looks in this case.

A supramodal accumulation-to-bound signal that determines perceptual decisions in humans.

O’Connell, R. G., Dockree, P. M., and Kelly, S. P.
Nat Neurosci, 15:1729–1735, 2012
DOI, Google Scholar

Abstract

In theoretical accounts of perceptual decision-making, a decision variable integrates noisy sensory evidence and determines action through a boundary-crossing criterion. Signals bearing these very properties have been characterized in single neurons in monkeys, but have yet to be directly identified in humans. Using a gradual target detection task, we isolated a freely evolving decision variable signal in human subjects that exhibited every aspect of the dynamics observed in its single-neuron counterparts. This signal could be continuously tracked in parallel with fully dissociable sensory encoding and motor preparation signals, and could be systematically perturbed mid-flight during decision formation. Furthermore, we found that the signal was completely domain general: it exhibited the same decision-predictive dynamics regardless of sensory modality and stimulus features and tracked cumulative evidence even in the absence of overt action. These findings provide a uniquely clear view on the neural determinants of simple perceptual decisions in humans.

Review

The authors report EEG signals which may represent 1) instantaneous evidence and 2) accumulated evidence (decision variable) during perceptual decision making. The result promises a big leap for experiments in perceptual decision making with humans, because it is the first time that we can directly observe the decision process as it accumulates evidence with reasonable temporal resolution without sticking needles in participants’ brains. Furthermore, one of the found signals appears to be independent of sensory and response modality, i.e., it appears to reflect the decision process alone, which is something that has not been clearly found in species other than humans. But let’s discuss the study in more detail.

The current belief about the perceptual decision making process is formalised in accumulation to bound models: When presented with a stimulus, the decision maker determines at each time point of the presentation the current amount of evidence for all possible alternatives. This estimate of “instantaneous evidence” is noisy, because of either the noise within the stimulus itself, or because of internal processing noise. Therefore, the decision maker does not immediately make a decision between alternatives, but accumulates evidence over time until the accumulated evidence for one of the alternatives reaches a threshold which is internally set by the decision maker itself and indicates a certain level of certainty, or response urgency. The alternative, for which the threshold was crossed, is the decision outcome and the time the threshold was crossed is the decision time (potentially including an additional delay). The authors argue that they have found signals in the EEG of humans which can be associated with the instantaneous and accumulated evidence variables of these kinds of models.
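
As a reference point for the discussion below, this is a minimal sketch of such an accumulation-to-bound process for two alternatives, written as a discretised drift-diffusion simulation. It is a generic textbook-style illustration, not the authors' model. The function name and all parameter values (time step, noise level, nondecision delay) are my own assumptions.

```python
import random


def simulate_trial(drift, bound, noise_sd=1.0, dt=0.001,
                   nondecision=0.3, max_t=10.0, rng=random):
    """Simulate one accumulation-to-bound trial.

    Noisy instantaneous evidence (mean `drift`) is summed over time until
    the accumulated evidence reaches +bound or -bound. Returns the choice
    (+1 or -1) and the response time (decision time plus a fixed delay).
    """
    x, t = 0.0, 0.0
    while abs(x) < bound and t < max_t:
        # instantaneous evidence for this time step: signal plus noise
        x += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    choice = 1 if x > 0 else -1
    return choice, t + nondecision
```

Running many trials with a strong and a weak drift reproduces the basic behavioural signature of these models: weaker evidence leads to slower and less accurate responses, while the accumulated evidence at decision time is the same (the bound) in both cases.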

The paradigm used in this study was different from the perceptual decision making paradigm popular in monkeys (random dot stimuli). Here the authors used stimuli which did not move, but rather gradually changed their intensity or contrast: In the experiments with visual stimuli, participants were continuously viewing a flickering disk which from time to time gradually changed its contrast with the background (the contrast gradually went back to base level after 1.6s). So the participants had to decide whether they observe a contrast different from baseline at the current time. Note that this setup is slightly different from usual trial-based perceptual decision making experiments where a formally new trial begins after a participant’s response. The disk also had a pattern, but it’s unclear why the pattern was necessary. On the other hand, using the other stimulus properties seems reasonable: The flickering induced something like continuous evoked potentials in the EEG ensuring that something stimulus-related could be measured at all times, but the gradual change of contrast “successfully eliminated sensory-evoked deflections from the ERP trace” such that the more subtle accumulated evidence signals were not masked by large deflections solely due to stimulus onsets. In the experiments with sounds, equivalent stimulus changes were implemented by either gradually changing the volume of a presented, envelope-modulated tone or its frequency.

The authors report 4 EEG signals related to perceptual decision making. They argue that the occipital steady-state visual-evoked potential (SSVEP) indicated the estimated instantaneous evidence when visual stimuli were used, because its trajectories directly reflected the changes in contrast. For auditory stimuli, the authors found a corresponding steady-state auditory-evoked potential (SSAEP) which was located at more central EEG electrodes and at 40Hz instead of 20Hz (SSVEP). Further, the authors argue that a left-hemisphere beta (LHB, 22-30Hz) and a centro-parietal potential (CPP, direct electrode measurements) could be interpreted as evidence accumulation signals, because the time of their peaks tightly predicted reaction times and their time courses were better predicted by the cumulative SSVEP than by the original SSVEP. LHB and CPP also (roughly) showed the expected dependency on whether the participant correctly identified the target, or missed it (lower signals for misses). Furthermore, they reacted as expected when contrast varied in more complex ways than just a linear decrease (decrease followed by short increase followed by decrease). CPP differed from LHB by also showing the expected changes when the task did not require an overt response at target detection time, whereas LHB showed no relation to the present evidence in this task, indicating that it may have something to do with motor preparation of the response while CPP is a more abstract decision signal. Additionally, the CPP showed the characteristic changes measured with visual stimuli also with auditory stimuli and it depended on attentional focus: In one experimental condition the task of the participants was altered (‘detect a transient size change of a central fixation square’), but the original disk stimulus was still presented including the gradual contrast changes. In this ‘non-attend’ condition the SSVEP decreased with contrast as before, but the CPP showed no response, reinforcing the idea that the CPP is an abstract decision signal. On a final note, the authors speculate that the CPP could be equal to the standard P300 signal, when transient stimuli need to be detected instead of gradual stimulus changes. This connection, if true, would be a nice functional explanation of the P300.

Open Questions

Despite the generally intriguing results presented in the paper a few questions remain. These predominantly regard details.

1) omission of data

In Figs. 2 and 3 the SSVEP is not shown anymore, presumably because of space restrictions. Similarly, the LHB is not presented in Fig. 4. I can believe that the SSVEP behaved as expected in the different conditions of Figs. 2 and 3 such that not much information would have been added by providing the plots, but it would at least be interesting to know whether the accumulated SSVEP still predicted the LHB and CPP better than the original SSVEP in these conditions. Likewise, the authors do not report the equivalent analysis for the SSAEP in the auditory conditions. Regarding the omission of the LHB in Fig. 4, I’m not so certain about the behaviour of the LHB in the auditory conditions. It seems possible that the LHB shows different behaviour with different modalities. There is no mention of this in the text, though.

2) Is there a common threshold level?

The authors argue that the LHB and CPP reached a common threshold level just before response initiation (a prediction of accumulation to bound models, Fig. 1c), but the test they used does not entirely convince me: They compared the variance just before response initiation with the variance of measurements across different time points (they randomly assigned the RT of one trial to another trial and computed the variance of measurements at the shuffled time points). For a strongly varying function of time, it is no surprise that the measurements at a consistent time point vary less than the measurements made across many different time points as long as the measurement noise is small enough. Based on this argument, it is strange that they did not find a significant difference for the SSVEP which also varies strongly across time (this fits into their interpretation, though), but this lack of difference could be explained by larger measurement noise associated with the SSVEP.

Furthermore, the authors themselves report that they found a significant difference between the size of CPP peaks around decision time for varying contrast levels (Fig. 2c). In particular, the CPP peak for false alarms (no contrast change, but participant response) was lower than the other peaks. If the CPP really is the decision variable predicted by the models, then these differences should not have occurred. So where do they come from? The authors provide arguments that I cannot follow without further explanations.

3) timing of peaks

It appears that the mean reaction time precedes the peaks of the mean signals slightly. The effect is particularly clear in Fig. 3b (CPP), Fig. 4d (CPP) and Fig. 5a, but is also slightly visible in the averages centred at the time of response in Figs. 1c and 2c. Presuming a delay from internal decision time to actual response, the time of the peak of the decision variable should precede the reaction time, especially when reaction time is measured from button presses (here) compared to saccade initiation (typical monkey experiments). So why does it here appear to be the other way round?

4) variance of SSVEP baseline

The SSVEP in Fig. 4a is in a different range (1.0-1.3) than the SSVEP in Fig. 4d (1.7-2.5) even though the two plots should each contain a time course for the same experimental condition. Where does the difference come from?

5) multiple alternatives

The CPP, as described by the authors, is a single, global signal of a decision variable. If the decision problem is composed of only two decision alternatives, a single decision variable is indeed sufficient for decision making, but if more alternatives are considered, several evidence accumulating variables are needed. What would the CPP then signal? One of the decision variables? The total amount of certainty of the upcoming decision?

Conclusion

I do like the results in the paper. If they hold up, the CPP may provide a high temporal resolution window into the decision processes of humans. As a result, it may allow us to investigate decision processes for more complex situations than those which animals can master, but maybe it’s only a signal for the simple, perceptual decisions investigated here. Based on the above open questions I also guess that the reported signals were noisier than the plots make us believe and the correspondence of the CPP with theoretical decision variables should be further examined.

Causal role of dorsolateral prefrontal cortex in human perceptual decision making.

Philiastides, M. G., Auksztulewicz, R., Heekeren, H. R., and Blankenburg, F.
Curr Biol, 21:980–983, 2011
DOI, Google Scholar

Abstract

The way that we interpret and interact with the world entails making decisions on the basis of available sensory evidence. Recent primate neurophysiology [1-6], human neuroimaging [7-13], and modeling experiments [14-19] have demonstrated that perceptual decisions are based on an integrative process in which sensory evidence accumulates over time until an internal decision bound is reached. Here we used repetitive transcranial magnetic stimulation (rTMS) to provide causal support for the role of the dorsolateral prefrontal cortex (DLPFC) in this integrative process. Specifically, we used a speeded perceptual categorization task designed to induce a time-dependent accumulation of sensory evidence through rapidly updating dynamic stimuli and found that disruption of the left DLPFC with low-frequency rTMS reduced accuracy and increased response times relative to a sham condition. Importantly, using the drift-diffusion model, we show that these behavioral effects correspond to a decrease in drift rate, a parameter describing the rate and thereby the efficiency of the sensory evidence integration in the decision process. These results provide causal evidence linking the DLPFC to the mechanism of evidence accumulation during perceptual decision making.

Review

They apply repetitive TMS to the dorsolateral prefrontal cortex (DLPFC) assuming that this inhibits the decision making ability of subjects, because DLPFC has been shown to be involved in perceptual decision making. Indeed, they find a significant effect of TMS vs. SHAM on the responses of subjects (after TMS responses of subjects are less accurate and take longer). They also argue that the effect is particular to TMS, because it reduces over time, but I wonder why they did not compute the corresponding interaction (they just report that the effect of TMS vs. SHAM is significant earlier, but not significant later).

Furthermore, they hypothesised that TMS disrupted the accumulation process of noisy evidence over time by decreasing the rate of evidence increase. This is based on the previous finding that the DLPFC has higher BOLD activation for less noisy stimuli which suggests that, when DLPFC is disrupted, the evidence coming from less noisy stimuli cannot be optimally processed anymore.

They investigated the evidence accumulation hypothesis by fitting a drift-diffusion model (DDM) to response data. The DDM has more parameters than are necessary to explain the variations of response data for the different experimental conditions. Hence, they use the Bayesian information criterion (BIC) to select parameters which should be fitted for each experimental condition separately, i.e., to be able to say which parameters are affected by the experimental manipulations. The other parameters are still fitted, but to all data across experimental conditions. The problem is that the BIC is a very crude approximation just taking the number of freely varying parameters into account. For example, an assumption underlying the BIC is that the Hessian of the likelihood evaluated at the fitted parameter values has full rank (Bishop, 2006, p. 217), but for highly correlated parameters this may not be the case. The DMAT fitting toolbox they used actually approximates the Hessian matrix, checks whether a local minimum has been found (instead of a valley) and computes confidence intervals from the approximated Hessian, but the authors report no results for this apart from error bars on the plot for drift rate and nondecision time.
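
For reference, the standard form of the BIC is k * ln(n) - 2 * ln(L), with k free parameters, n data points and maximised likelihood L. A minimal sketch follows. The function name is hypothetical and the numbers in the usage note are made up purely for illustration, not taken from the paper.

```python
import math


def bic(log_likelihood, n_params, n_data):
    """Bayesian information criterion in its standard form (lower is better).

    The only model property entering the penalty is the number of free
    parameters, regardless of how strongly those parameters are correlated.
    """
    return n_params * math.log(n_data) - 2.0 * log_likelihood
```

For example, with 100 data points an extra free parameter only pays off if it raises the maximised log-likelihood by more than 0.5 * ln(100), which is about 2.3: `bic(-100.0, 3, 100)` is larger than `bic(-100.0, 2, 100)`, while `bic(-97.0, 3, 100)` is smaller.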

Anyway, the BIC analysis conveniently indicates that drift rate and nondecision time best explain the variations in response data across conditions. However, it has to be kept in mind that these results have been obtained by (presumably) assuming that the diffusion is fixed across conditions which is the standard when fitting a DDM [private correspondence with Rafal Bogacz, 09/2012], because drift rate, diffusion and threshold are redundant (a change in one of them can be reverted by a suitable change in the others). The interpretation of the BIC analysis probably should be that drift rate and nondecision time are the smallest set of parameters which still allow a good fit of the data given that diffusion is fixed.
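
The redundancy can be checked with the well-known closed-form predictions of the simplest DDM. The sketch below assumes a symmetric model (start at 0, bounds at plus and minus the threshold, no nondecision time and no across-trial variability), which is simpler than the model fitted in the paper. "Diffusion" here means the standard deviation of the noise, and the function name is hypothetical.

```python
import math


def ddm_predictions(drift, diffusion, threshold):
    """Accuracy and mean decision time of a symmetric DDM.

    Assumptions: start at 0, bounds at +/- threshold, positive drift,
    `diffusion` is the noise standard deviation.
    """
    k = drift * threshold / diffusion ** 2
    accuracy = 1.0 / (1.0 + math.exp(-2.0 * k))
    mean_decision_time = (threshold / drift) * math.tanh(k)
    return accuracy, mean_decision_time
```

Multiplying drift, diffusion and threshold by the same factor leaves both predictions unchanged, e.g. `ddm_predictions(1.0, 1.0, 1.0)` equals `ddm_predictions(2.0, 2.0, 2.0)`. This is why one of the three (usually the diffusion) has to be fixed before the others can be identified from behaviour.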

You need to be careful when interpreting the fitted parameter values in the different conditions. In particular, fitting a DDM to data assumes that the evidence accumulation still works like a DDM, just with different parameters. However, it is not clear what TMS does to the affected processes in the brain. Hence, we can only say from the fitting results that TMS has an effect which is equivalent to a reduction of the drift rate (no clear effect on nondecision time) in a normally functioning DDM.

Similarly, the interpretation of the results for nondecision time is not straightforward. There, the main finding is that nondecision time decreases for high-evidence stimuli which the authors interpret as a reduced time of low-level sensory processing which provides input to evidence accumulation. However, it should be kept in mind that the total amount of time necessary to make a decision is also reduced for high-evidence stimuli. Also, part of the processes which are collected under ‘nondecision time’ may actually work in parallel to evidence accumulation, e.g., movement preparation. If you look at the percentage of RT that is explained by the nondecision time, then the picture is reversed: for high-evidence stimuli nondecision time explains about 82% of RTs while for low-evidence stimuli it explains only about 75% which is consistent with the basic idea that evidence accumulation takes longer for noisier stimuli. In general, these percentages are surprisingly high. Does the evidence accumulation really only account for about 25% of total RTs? But it’s good that we have a number to compare now.

So what do these findings mean for the DLPFC? We cannot draw any definite conclusions. The hypothesis that TMS over DLPFC affects drift rate is somewhat built into the analysis, because the authors use a DDM to fit the responses. Of course, other parameters could have been affected more strongly, such that the finding of the BIC analysis that drift rate explains the changes best can indeed be taken as evidence for the drift rate hypothesis. However, it is not possible to exclude other explanations which lie outside the parameter space of the DDM. What, for example, if the DLPFC indeed has a somewhat attentional effect on evidence accumulation in the sense that it not only accumulates evidence, but also modulates how big the individual pieces of evidence are by modulating lower-level sensory processing? Then, interrupting the DLPFC may still have a similar effect as observed here, but the interpretation of the role of the DLPFC would be slightly different. Actually, the authors argue against a role of the DLPFC (at least the part of DLPFC they found) in attentional processing, but I’m not entirely convinced. Their main argument is based on the assumption that a top-down attentional effect of the DLPFC on low-level sensory processing would increase the nondecision time, but this is not necessarily true. A) There is the previously mentioned issue of parallel processing and the general problems of fitting a standard model to a disturbed process, which make me doubt the reliability of the fitted nondecision times, and B) I can easily conceive a system in which attentional modulation would not delay low-level sensory processing.

Perceptions as hypotheses: saccades as experiments.

Friston, K., Adams, R. A., Perrinet, L., and Breakspear, M.
Front Psychol, 3:151, 2012
DOI, Google Scholar

Abstract

If perception corresponds to hypothesis testing (Gregory, 1980); then visual searches might be construed as experiments that generate sensory data. In this work, we explore the idea that saccadic eye movements are optimal experiments, in which data are gathered to test hypotheses or beliefs about how those data are caused. This provides a plausible model of visual search that can be motivated from the basic principles of self-organized behavior: namely, the imperative to minimize the entropy of hidden states of the world and their sensory consequences. This imperative is met if agents sample hidden states of the world efficiently. This efficient sampling of salient information can be derived in a fairly straightforward way, using approximate Bayesian inference and variational free-energy minimization. Simulations of the resulting active inference scheme reproduce sequential eye movements that are reminiscent of empirically observed saccades and provide some counterintuitive insights into the way that sensory evidence is accumulated or assimilated into beliefs about the world.

Review

In this paper Friston et al. introduce the notion that an agent (such as the brain) minimizes uncertainty about its state in the world by actively sampling those states which minimise the uncertainty of the agent’s posterior beliefs, when visited some time in the future. The presented ideas can also be seen as a reply to the commonly formulated dark-room-critique of Friston’s free energy principle which states that under the free energy principle an agent would try to find a dark, stimulus-free room in which sensory input can be perfectly predicted. Here, I review these ideas together with the technical background (see also a related post about Friston et al., 2011). Although I find the presented theoretical argument very interesting and sound (and compatible with other proposals for the origin of autonomous behaviour), I do not think that the presented simulations conclusively show that the extended free energy principle as instantiated by the particular model chosen in the paper leads to the desired exploratory behaviour.

Introduction: free energy principle and the dark room

Friston’s free energy principle has gained considerable momentum in the field of cognitive neuroscience as a unifying framework under which many cognitive phenomena may be understood. Its main axiom is that an agent tries to minimise the long-term uncertainty about its state in the world by executing actions which make prediction of changes in the agent’s world more precise, i.e., which minimise surprises. In other words, the agent tries to maintain a sort of homeostasis with its environment.

While homeostasis is a concept which most people happily associate with bodily functions, it is harder to reconcile with cognitive functions which produce behaviour. Typically, the counter-argument for the free energy principle is the dark-room-problem: changes in a dark room can be perfectly predicted (= no changes), so shouldn’t we all just try to lock ourselves into dark rooms instead of frequently exploring our environment for new things?

The shortcoming of the dark-room-problem is that an agent cannot maintain homeostasis in a dark room, because, for example, its bodily functions will stop working properly after some time without water. There may be many more environmental factors which may disturb the agent’s dark room pleasure. An experienced agent knows this and has developed a corresponding model about its world which tells it that the state of its world becomes increasingly uncertain as long as the agent only samples a small fraction of the state space of the world, as it is the case when you are in a dark room and don’t notice what happens outside of the room.

The present paper formalises this idea. It assumes that an agent only observes a small part of the world in its local surroundings, but also maintains a more comprehensive model of its world. To decrease uncertainty about the global state of the world, the agent then explores other parts of the state space which it believes to be informative according to its current estimate of the global world state. In the remainder I will present the technical argument in more detail, discuss the supporting experiments and conclude with my opinion about the presented approach.

Review of theoretical argument

In previous publications Friston postulated that agents try to minimise the entropy of the world states which they encounter in their life and that this minimisation is equivalent to minimising the entropy of their sensory observations (by essentially assuming that the state-observation mapping is linear). The sensory entropy can be estimated by the average of sensory surprise (negative model evidence) across (a very long) time. So the argument goes that an agent should minimise sensory surprise at all times. Because sensory surprise cannot usually be computed directly, Friston suggests a variational approximation in which the posterior distribution over world states (posterior beliefs) and model parameters is separated. Further, the posterior distributions are approximated with Gaussian distributions (Laplace approximation). Then, minimisation of surprise is approximated by minimisation of Friston’s free energy. This minimisation is done with respect to the posterior over world states and with respect to action. The former corresponds to perception and ensures that the agent maintains a good estimate of the state of the world and the latter implements how the agent manipulates its environment, i.e., produces behaviour. While the former is a particular instantiation of the Bayesian brain hypothesis, and hence not necessarily a new idea, the latter had not previously been proposed and subsequently spurred some controversy (cf. above).

At this point it is important to note that the action variables are defined on the level of primitive reflex arcs, i.e., they directly control muscles in response to unexpected basic sensations. Yet, the agent can produce arbitrarily complex actions by suitably setting sensory expectations which can be done via priors in the model of the agent. In comparison with reinforcement learning, the priors of the agent about states of the world (the probability mass attributed by the prior to the states), therefore, replace values or costs. But how does the agent choose its priors? This is the main question addressed by the present paper, however, only in the context of a freely exploring (i.e., task-free) agent.

In this paper, Friston et al. postulate that an agent minimises the joint entropy of world states and sensory observations instead of only the entropy of world states. Because the joint entropy is the sum of sensory entropy and conditional entropy (world states conditioned on sensory observations), the agent needs to implement two minimisations. The minimisation of sensory entropy is exactly the same as before implementing perception and action. However, conditional entropy is minimised with respect to the priors of the agent’s model, implementing higher-level action selection.
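The decomposition used in this argument is the chain rule for entropy. As a sketch in my own notation (not necessarily that of the paper), let X denote the world states and S the sensory observations:

```latex
H(X, S) \;=\; \underbrace{H(S)}_{\text{sensory entropy}} \;+\; \underbrace{H(X \mid S)}_{\text{conditional entropy}}
```

In the reading given above, the first term is the one addressed by perception and action, while the second term is the one minimised with respect to the priors of the agent’s model.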

In Friston’s dynamic free energy framework (and other filters) priors correspond to predictive distributions, i.e., distributions over the world states some time in the future given their current estimate. Friston also assumes that the prior densities are Gaussian. Hence, priors are parameterised by their mean and covariance. To manipulate the probability mass attributed by the prior to the states he, thus, has to change prior mean or covariance of the world states. In the present paper the authors use a fixed covariance (as far as I can tell) and implement changes in the prior by manipulating its mean. They do this indirectly by introducing new, independent control variables (“controls” from here on) which parameterise the dynamics of the world states without having a dynamics associated with themselves. The controls are treated like the other hidden variables in the agent model and their values are inferred from the sensory observations via free energy minimisation. However, I guess, that the idea is to more or less fix the controls to their prior means, because the second entropy minimisation, i.e., minimisation of the conditional entropy, is with respect to these prior means. Note that the controls are pretty arbitrary and can only be interpreted once a particular model is considered (as is the case for the remaining variables mentioned so far).

As with the sensory entropy, the agent has no direct access to the conditional entropy. However, it can use the posterior over world states given by the variational approximation to approximate the conditional entropy, too. In particular, Friston et al. suggest approximating the conditional entropy using a predictive density which looks ahead in time from the current posterior and which they call counterfactual density. The entropy of this counterfactual density tells the agent how much uncertainty about the global state of the world it can expect in the future based on its current estimate of the world state. The authors do not specify how far in the future the counterfactual density looks. They here use the denotational trick to call negative conditional entropy ‘saliency’ to make the correspondence between the suggested framework and experimental variables in their example more intuitive, i.e., minimisation of conditional entropy becomes maximisation of saliency. The actual implementation of this nonlinear optimisation is computationally demanding. In particular, it will be very hard to find global optima using gradient-based approaches. In this paper Friston et al. bypass this problem by discretising the space spanned by the controls (which are the variables with respect to which they optimise), computing conditional entropy at each discrete location and simply selecting the location with minimal entropy, i.e., they do grid search.
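The grid search idea can be sketched in a few lines. This is a deliberately simplified, discrete stand-in and not Friston et al.'s continuous Gaussian scheme: the three tiny binary images, the pixel-flip noise model and the function names are all hypothetical. For each candidate fixation position the agent simulates the possible observations, computes the expected entropy of its posterior belief over images, and picks the position with the smallest value.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_posterior_entropy(belief, images, pos, flip=0.1):
    """Expected entropy of the belief over images after observing pixel `pos`.

    The observation model (a binary pixel flipped with probability `flip`)
    is an assumption made for this sketch.
    """
    # probability of observing a 1 at `pos` under each image hypothesis
    p_one = np.array([img[pos] * (1 - flip) + (1 - img[pos]) * flip
                      for img in images])
    total = 0.0
    for lik in (p_one, 1.0 - p_one):            # observation = 1, then = 0
        p_obs = float((belief * lik).sum())
        if p_obs > 0:
            total += p_obs * entropy(belief * lik / p_obs)
    return total

# Three hypotheses: a toy "image" and its 90 and 180 degree rotations.
base = np.array([[1, 1, 0],
                 [0, 0, 0],
                 [0, 0, 0]])
images = [base, np.rot90(base, 1), np.rot90(base, 2)]
belief = np.full(3, 1.0 / 3.0)                  # flat belief over the images

# Grid search: evaluate every candidate position, keep the minimum.
grid = [(i, j) for i in range(3) for j in range(3)]
scores = {pos: expected_posterior_entropy(belief, images, pos) for pos in grid}
best = min(scores, key=scores.get)
print("most informative position:", best)
```

Positions at which all three images agree leave the belief unchanged and therefore score the full prior entropy, whereas positions at which the images differ score lower. Exhaustive evaluation like this only works because the grid is tiny, which is exactly why scaling is a concern.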

In summary, the present paper extends previous versions of Friston’s free energy principle by adding prior selection, or, say, high-level action, to perception and action. This is done by adding new control variables representing high-level actions and setting these variables using a new optimisation which minimises future uncertainty about the state of the world. The descriptions in the paper implicitly suggest that the three processes happen sequentially: first the agent perceives to get the best estimate of the current world state, then it produces action to take the world state closer to its expectations and then it reevaluates expectations and thus sets high-level actions (goals). However, Friston’s formulations are in continuous time such that all these processes supposedly happen in parallel. For perception and action alone this leads to unexpected interactions. (Do you rather perceive the true state of the world as it is, or change it such that it corresponds to your expectations?) Adding control variables certainly doesn’t reduce this problem, if their values are inferred (perceived), too, but if perception cannot change them, only action can reduce the part of free energy contributed by them, thereby disentangling perception and action again. Therefore, the new control variables may be a necessary extension, if used properly. To me, it does not seem plausible that high-level actions are reevaluated continuously. Shouldn’t you wait until, e.g., a goal is reached? Such a mechanism is still missing in the present proposal. Instead the authors simply reevaluate high-level actions (minimise conditional entropy with respect to control variable priors) at fixed, ad-hoc intervals spanning sufficiently large amounts of time.

Review of presented experiments (saccade model)

To illustrate the theoretical points, Friston et al. present a model for saccadic eye movements. This model is very basic and is only supposed to show in principle that the new minimisation of conditional entropy can provide sensible high-level action. The model consists of two main parts: 1) the world, which defines how sensory input changes based on the true underlying state of the world and 2) the agent, which defines how the agent believes the world behaves. In this case, the state of the world is the position in a viewed image which is currently fixated by the eye of the agent. This position, hence, determines what input the visual sensors of the agent currently get (the field of view around the fixation position is restricted), but additionally there are proprioceptive sensors which give direct feedback about the position. Action changes the fixation position. The agent has a similar, but extended model of the world. In it, the fixation position depends on the hidden controls. Additionally, the model of the agent contains several images such that the agent has to infer what image it sees based on its sensory input.

In Friston’s framework, inference results heavily depend on the setting of prior uncertainties of the agent. Here, the agent is assumed to have certain proprioception, but uncertain vision such that it tends to update its beliefs of what it sees (which image) rather than trying to update its beliefs of where it looks. [I guess, this refers to the uncertainties of the hidden states and not the uncertainties of the actual sensory input which was probably chosen to be quite certain. The text does not differentiate between these and, unfortunately, the code was not yet available within the SPM toolbox at the time of writing (08.09.2012).]

As mentioned above, every 16 time steps the prior for the hidden controls of the agent is recomputed by minimising the conditional entropy of the hidden states given sensory input (minimising the uncertainty over future states given the sensory observations up to that time point). This is implemented by defining a grid of fixation positions and computing the entropy of the counterfactual density (uncertainty of future states) while setting the mean of the prior to one of the positions. In effect, this translates for the agent into: ‘Use your internal model of the world to simulate how your estimate of the world will change when you execute a particular high-level action. (What will be your beliefs about what image you see, when fixating a particular position?) Then choose the high-level action which reduces your uncertainty about the world most. (Which position gives you most information about what image you see?)’ Up to here, the theoretical ideas were self-contained and derived from first principles, but then Friston et al. introduce inhibition of return to make their results ‘more realistic’. In particular, they introduce an inhibition of return map which is a kind of fading memory of which positions were previously chosen as saccade targets and which is subtracted from the computed conditional entropy values. [The particular form of the inhibition of return computations, especially the initial subtraction of the minimal conditional entropy value, is not motivated by the authors.]
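To make the role of inhibition of return concrete, here is a minimal sketch of how a fading memory map can change which position is selected. It is not the authors' implementation: I work with saliency (negative conditional entropy) so that subtracting the map makes revisits less attractive, and the function name, the decay factor and the strength of the inhibition are hypothetical choices.

```python
import numpy as np

def select_targets(saliency, n_saccades=3, decay=0.5, strength=1.0):
    """Pick saccade targets from a fixed saliency map with inhibition of return.

    `decay` and `strength` are illustrative parameters, not values from the paper.
    """
    ior = np.zeros_like(saliency, dtype=float)  # fading memory of past targets
    targets = []
    for _ in range(n_saccades):
        # shift the map so its minimum is zero, then subtract the IOR map
        adjusted = saliency - saliency.min() - ior
        idx = np.unravel_index(np.argmax(adjusted), saliency.shape)
        idx = tuple(int(i) for i in idx)
        targets.append(idx)
        ior *= decay                            # older targets fade
        ior[idx] += strength                    # inhibit the new target
    return targets

saliency = np.array([[1.0, 0.2],
                     [0.9, 0.1]])
with_ior = select_targets(saliency)
without_ior = select_targets(saliency, strength=0.0)
print("with inhibition of return:   ", with_ior)
print("without inhibition of return:", without_ior)
```

With a saliency map that does not change between saccades, the version without inhibition of return selects the same position every time, so in this toy setting any sequence of distinct fixations is produced by the inhibition map alone.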

For the presented experiments the authors use an agent model which contains three images as hypotheses of what the agent observes: a face and its 90° and 180° rotated versions. The first experiment is supposed to show that the agent can correctly infer which image it observes by making saccades to low conditional entropy (‘salient’) positions. The second experiment is supposed to show that, when an image is observed which is unknown to the agent, the agent cannot be certain of which of the three images it observes. The third experiment is supposed to show that the uncertainty of the agent increases when high entropy high-level actions are chosen instead of low entropy ones (when the agent chooses positions which contain very little information). I’ll discuss them in turn.

In the first experiment, the presented posterior beliefs of the agent about the identity of the observed image show that the agent indeed identifies the correct image and becomes more certain about it. Figure 5 of the paper also shows us the fixated positions and inhibition of return adapted conditional entropy maps. The presented ‘saccadic eye movements’ are misleading: the points only show the stabilised fixated positions and the lines only connect these without showing the large overshoots which occur according to the plot of ‘hidden (oculomotor) states’. Most critically, however, it appears that the agent already had identified the right image with relative certainty before any saccade was made (time until about 200ms). The results, therefore, do not clearly show that the saccade selection is beneficial for identifying the observed image, also because the presented example is only a single trial with a particular initial fixation point and with a noiseless observed image. Also, because the image was clearly identified very quickly, my guess is that the conditional entropy maps would be very similar after each saccade without inhibition of return, i.e., always the same fixation position would be chosen and no exploratory behaviour (saccades) would be seen, but this needs to be confirmed by running the experiment without inhibition of return. My overall impression of this experiment is that it presents a single, trivial example which does not allow me to draw general conclusions about the suggested theoretical framework.

The second experiment acts like a sanity check: the agent shouldn’t be able to identify one of its three images, when it observes a fourth one. Whether the experiment shows that, depends on the interpretation of the inferred hidden states. The way these states were defined their values can be directly interpreted as the probability of observing one of the three images. If only these are considered the agent appears to be very certain at times (it doesn’t help that the scale of the posterior belief plot in Figure 6 is 4 times larger than that of the same plot in Figure 5). However, the posterior uncertainty directly associated with the hidden states appears to be indeed considerably larger than in experiment 1, but, again, this is only a single example. Something that is rather strange: the sequence of fixation positions is almost exactly the same as in experiment 1 even though the observed image and the resulting posterior beliefs were completely different. Why?

Finally, experiment three is more like a thought experiment: what would happen, if an agent chooses high-level actions which maximise future uncertainty instead of minimising it. Well, the uncertainty of the agent’s posterior beliefs increases as shown in Figure 7, which is the expected behaviour. One thing that I wonder, though, and it applies to the presented results of all experiments: In Friston’s Bayesian filtering framework the uncertainty of the posterior hidden states is a direct function of their mean values. Hence, as long as the mean values do not change, the posterior uncertainty should stay constant, too. However, we see in Figure 7 that the posterior uncertainty increases even though the posterior means stay more or less constant. So there must be an additional (unexplained) mechanism at work, or we are not shown the distribution of posterior hidden states, but something slightly different. In both cases, it would be important to know what exactly resulted in the presented plots to be able to interpret the experiments in the correct way.

Conclusion

The paper presents an important theoretical extension to Friston’s free energy framework. This extension consists of adding a new layer of computations which can be interpreted as a mechanism for how an agent (autonomously) chooses its high-level actions. These high-level actions are defined in terms of desired future states encoded by the probability mass which is assigned to these states by the prior state distribution. Conceptually, these ideas translate into choosing maximally informative actions given the agent’s model of the world and its current state estimate. As discussed by Friston et al. such approaches to action selection are not new (see also Tishby and Polani, 2011). So, the authors’ contribution is to show that these ideas are compatible with Friston’s free energy framework. Hence, on the abstract, theoretical level this paper makes sense. It also provides a sound theoretical argument for why an agent would not seek sensory deprivation in a dark room, as feared by critics of the free energy principle. However, the presented framework heavily relies on the agent’s model of the world and it leaves open how the agent has attained this model. Although the free energy principle also provides a way for the agent to learn parameters of its model, I still, for example, haven’t seen a convincing application in which the agent actually learnt the dynamics of an unknown process in the world. Probably Friston would here also refer to evolution as providing a good initialisation for process dynamics, but I find that too cheap a way out.

From a technical point of view the paper leaves a few questions open, for example: How far does the counterfactual distribution look into the future? What does it mean for high-level actions to change how far the agent looks into its subjective future? How well does the presented approach scale? Is it important to choose the global minimum of the conditional entropy (this would be bad, as it’s probably extremely hard to find in a general setting)? When, or how often, does the agent minimise conditional entropy to set high-level actions? What happens with more than one control variable (several possible high-level actions)? How can you model discrete high-level actions in Friston’s continuous Gaussian framework? How do results depend on the setting of prior covariances / uncertainties? And many more.

Finally, I have to say that I find the presented experiments quite poor. Although providing the agent with a limited field of view such that it has to explore different regions of a presented image is a suitable setting to test the proposed ideas, the trivial example and introduction of ad-hoc inhibition of return make it impossible to judge whether the underlying principle is successfully at work, or the simulations have been engineered to work in this particular case.

Representational switching by dynamical reorganization of attractor structure in a network model of the prefrontal cortex.

Katori, Y., Sakamoto, K., Saito, N., Tanji, J., Mushiake, H., and Aihara, K.
PLoS Comput Biol, 7:e1002266, 2011
DOI, Google Scholar

Abstract

The prefrontal cortex (PFC) plays a crucial role in flexible cognitive behavior by representing task relevant information with its working memory. The working memory with sustained neural activity is described as a neural dynamical system composed of multiple attractors, each attractor of which corresponds to an active state of a cell assembly, representing a fragment of information. Recent studies have revealed that the PFC not only represents multiple sets of information but also switches multiple representations and transforms a set of information to another set depending on a given task context. This representational switching between different sets of information is possibly generated endogenously by flexible network dynamics but details of underlying mechanisms are unclear. Here we propose a dynamically reorganizable attractor network model based on certain internal changes in synaptic connectivity, or short-term plasticity. We construct a network model based on a spiking neuron model with dynamical synapses, which can qualitatively reproduce experimentally demonstrated representational switching in the PFC when a monkey was performing a goal-oriented action-planning task. The model holds multiple sets of information that are required for action planning before and after representational switching by reconfiguration of functional cell assemblies. Furthermore, we analyzed population dynamics of this model with a mean field model and show that the changes in cell assemblies’ configuration correspond to those in attractor structure that can be viewed as a bifurcation process of the dynamical system. This dynamical reorganization of a neural network could be a key to uncovering the mechanism of flexible information processing in the PFC.

Review

Based on firing properties of certain prefrontal cortex neurons the authors suggest a network model in which short-term plasticity implements switches of what the neurons in the network represent. In particular, neurons in prefrontal cortex have been found which switch from representing goals to representing actions (first, their firing varies depending on which goal is shown, then it varies depending on which action is executed afterwards while firing equally for all goals). The authors call these representational switches and they assume that these are implemented via changes in the connection strengths of neurons in a recurrently connected neural network. The network is set up such that network activity always converges to one of several fixed point attractors. A suitable change in connection strengths then leads to a change in the attractor landscape which may be interpreted as a change in what the network represents. The main contribution of the authors is to suggest a particular pattern of short-term plasticity at synapses in the network such that the network exhibits the desired representational switching. Another important aspect of this model is its structure: the network consists of separate cell assemblies, different subsets of which are assumed to be active when either goals or actions are represented and the goal and action subsets are partially overlapping. For example, in their model they have four cell assemblies (A,B,C,D) and the subsets (A,B) and (C,D) are associated with goals while subsets (A,D) and (B,C) are associated with actions. Initially the network is assumed to be in the goal state in which the connection strengths A-B and C-D are large. The presentation of one of two goals then makes the network activity converge to strong activation of (A,B) or (C,D). Synaptic depression of connections A-B (assuming that this is the active subset) with simultaneous facilitation of connections A-D and B-C then leads to the desired change of connection strengths which implements the representational switch and then makes either subset (A,D) or subset (B,C) the active subset. It is not entirely clear to me why only one action subset becomes active. Maybe this is what the inhibitory units in the model are for (their function is not explained by the authors). In further analysis and experiments the authors confirm the attractor landscape of the model (and how it changes), show that the timing of the representational switch can be influenced by input to the network and show that the probability of changing from a particular goal to a particular action can be manipulated by changing the number of prior connections between the corresponding cell assemblies.

The authors show a nice qualitative correspondence between experimental findings and simulated network behaviour (although some qualitative differences are left, too, e.g., a general increase of firing also for the non-preferred goal and action in the experimental findings). In essence, the authors present a mechanism which could implement the (seemingly) autonomous switching of representations in prefrontal cortex neurons. Whether this mechanism is used by the brain is an entirely different question. I don’t know of evidence backing the chosen special wiring of neurons and distribution of short-term plasticity, but this might just reflect my lack of knowledge of the field. Additionally, I wouldn’t exclude the possibility of a hierarchical model. The authors argue against this by presuming that prefrontal cortex already should be the top of the hierarchy, but nothing prevents us from making hierarchical models of prefrontal cortex itself. This points to the mixing of levels of description in the paper: On the one hand, the main contributions of the paper are on the algorithmic level describing the necessary wiring in a network of a few units and how it needs to change to reproduce the behaviour observed in experiments. On the other hand, the main model is on an implementational level showing how these ideas could be implemented in a network of leaky integrate and fire (LIF) neurons. In my opinion, the LIF neuron network doesn’t add anything interesting to the paper apart from the proof that the algorithmic ideas can be implemented by such a network. On the contrary, it masks a bit the main points of the paper by introducing an abundance of additional parameters which needed to be chosen by the authors, but for which we don’t know which of these settings are important. Finally, I wonder how the described network is reset in order to be ready for the next trial. The problem is the following: the authors initialise the network such that the goal subsets have a high synaptic efficacy at the start of the trial. The short-term plasticity then reduces these synaptic efficacies while simultaneously increasing those of the action subsets. At the end of a trial they all end up in a similar range (see Fig. 3A bottom). In order for the network to work as expected in the next trial, it somehow needs to reset to the initial synaptic efficacies.

Probabilistic population codes for Bayesian decision making.

Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., Shadlen, M. N., Latham, P. E., and Pouget, A.
Neuron, 60:1142–1152, 2008
DOI, Google Scholar

Abstract

When making a decision, one must first accumulate evidence, often over time, and then select the appropriate action. Here, we present a neural model of decision making that can perform both evidence accumulation and action selection optimally. More specifically, we show that, given a Poisson-like distribution of spike counts, biological neural networks can accumulate evidence without loss of information through linear integration of neural activity and can select the most likely action through attractor dynamics. This holds for arbitrary correlations, any tuning curves, continuous and discrete variables, and sensory evidence whose reliability varies over time. Our model predicts that the neurons in the lateral intraparietal cortex involved in evidence accumulation encode, on every trial, a probability distribution which predicts the animal’s performance. We present experimental evidence consistent with this prediction and discuss other predictions applicable to more general settings.

Review

In this article the authors apply probabilistic population coding as presented in Ma et al. (2006) to perceptual decision making. In particular, they suggest a hierarchical network with an MT and an LIP layer in which the firing rates of MT neurons encode the current evidence for a stimulus while the firing rates of LIP neurons encode the evidence accumulated over time. Under the assumptions made, it turns out that the accumulated evidence is independent of nuisance parameters of the stimuli (when they can be interpreted as contrasts) and that LIP neurons only need to sum (integrate) the activity of MT neurons in order to represent the correct posterior of the stimulus given the history of evidence. They also suggest a readout layer implementing a line attractor which reads out the maximum of the posterior under some conditions.

Details

Probabilistic population coding is based on the definition of the likelihood of stimulus features p(r|s,c) as an exponential family distribution of firing rates r. A crucial requirement for the central result of the paper (that LIP only needs to integrate the activity of MT) is that nuisance parameters c of the stimulus s do not occur in the exponential itself while the actual parameters of s only occur in the exponential. This restricts the exponential family distribution to the “Poisson-like family”, as they call it, which requires that the tuning curves of the neurons and their covariance are proportional to the nuisance parameters c (details for this need to be read up in Ma et al., 2006). The point is that this is the case when c corresponds to contrast, or gain, of the stimulus. For the considered random dot stimuli the coherence of the dots may indeed be interpreted as the contrast of the motion in the sense that I can imagine that the tuning curves of the MT neurons are multiplicatively related to the coherence of the dots.
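Written out, the restriction described above amounts to a likelihood of the following form. The symbols φ and h are my labels for the stimulus-independent factor and the stimulus-dependent kernel, so treat the notation as an assumption rather than a quote from the paper:

```latex
p(\mathbf{r} \mid s, c) \;=\; \phi(\mathbf{r}, c)\, \exp\!\big(\mathbf{h}(s)^{\top} \mathbf{r}\big)
```

The nuisance parameter c only enters through the factor φ, which does not depend on s, while s only enters through the exponential.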

The probabilistic model of the network activities is set up such that the firing of neurons in the network is an indirect, noisy observation of the underlying stimulus, but what we are really interested in is the posterior of the stimulus. So the question is how you can estimate this posterior from the network firing rates. The trick is that under the Poisson-like distribution the likelihood and posterior share the same exponential such that the posterior becomes proportional to this exponential, because the other parts of the likelihood do not depend on the stimulus s (they assume a flat prior of s such that you don’t need to consider it when computing the posterior). Thus, the probability of firing in the network is determined from the likelihood while the resulting firing rates simultaneously encode the posterior. Mind-boggling. The main contribution from the authors then is to show, assuming that firing rates of MT neurons are driven from the stimulus via the corresponding Poisson-like likelihood, that LIP neurons only need to integrate the spikes of MT neurons in order to correctly represent the posterior of the stimulus given all previous evidence (firing of MT neurons). Note that they also assume that LIP neurons have the same tuning curves with respect to the stimulus as MT neurons and that each LIP neuron sums the activity of the MT neuron with which it shares a tuning curve. They note that a naive procedure like that, i.e. a single neuron integrating MT firing over time, would quickly saturate its activity. So they show, and that is really cool, that global inhibition in the LIP network does not affect the representation of the posterior, allowing them to prevent saturation of firing while maintaining the probabilistic interpretation.
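The summation argument can be checked numerically in a toy setting. The sketch below is not the authors' network. It assumes independent Poisson neurons with von Mises-shaped tuning curves on a circular stimulus, for which the kernel is h_i(s) = kappa * cos(s - s_i), and all names (kernel, log_posterior, gains and so on) are made up for illustration. It shows that the posterior decoded from summed spike counts equals the normalised product of the per-time-step posteriors, even when the gain changes from step to step, and that subtracting the same amount from every summed count leaves the posterior unchanged because this particular kernel sums to a constant across neurons.

```python
import numpy as np

def kernel(s_grid, preferred, kappa):
    """Assumed toy kernel: h[k, i] = kappa * cos(s_k - preferred_i)."""
    return kappa * np.cos(s_grid[:, None] - preferred[None, :])

def normalise_log(log_p):
    """Subtract the log normaliser so that exp(result) sums to one."""
    m = log_p.max()
    return log_p - (m + np.log(np.exp(log_p - m).sum()))

def log_posterior(h, r):
    """Log posterior on the grid under a flat prior: h(s).r minus log Z."""
    return normalise_log(h @ r)

rng = np.random.default_rng(0)
n_neurons, n_steps, kappa = 32, 20, 2.0
preferred = np.linspace(0.0, 2.0 * np.pi, n_neurons, endpoint=False)
s_grid = np.linspace(0.0, 2.0 * np.pi, 180, endpoint=False)
h = kernel(s_grid, preferred, kappa)

# "MT" layer: Poisson counts whose gain (reliability) varies over time.
s_true = np.pi
gains = rng.uniform(0.5, 2.0, size=n_steps)
tuning = np.exp(kappa * np.cos(s_true - preferred))
mt_counts = rng.poisson(gains[:, None] * tuning[None, :])  # (n_steps, n_neurons)

# "LIP" layer: each unit simply sums the counts of its MT partner.
lip = mt_counts.sum(axis=0)
post_sum = np.exp(log_posterior(h, lip))

# Reference: multiply the per-step posteriors and renormalise.
log_prod = sum(log_posterior(h, r_t) for r_t in mt_counts)
post_prod = np.exp(normalise_log(log_prod))

# Global inhibition: subtract the same amount from every LIP unit.
post_inhib = np.exp(log_posterior(h, lip - lip.mean()))

s_map = s_grid[np.argmax(post_sum)]
print(np.allclose(post_sum, post_prod), np.allclose(post_sum, post_inhib), s_map)
```

In this toy model the invariance to global inhibition is exact only because the cosine kernel sums to zero over equally spaced preferred directions. For other kernels it would hold only if the kernel sums to a stimulus-independent constant.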

So much for the theory. In practice, i.e. in the experiments, the authors do something entirely different, because “these results are important, but they are based on assumptions that are not necessarily exactly true in vivo. […] It is therefore essential that we test our theory in biologically realistic networks.” Now, this is a noble aim, but what exactly do we learn about this theory if all results are obtained using methods which violate the assumptions of the theory? For example, the probability of firing is Poisson-like in neither MT nor LIP; LIP neurons do not just integrate MT activity, but are also recurrently connected; LIP neurons have local inhibition (they are leaky integrators, and inhibition between LIP neurons depends on their tuning properties) instead of global inhibition; and LIP neurons receive an “urgency signal” whose contribution increases with time, which stems from experimental observations but is otherwise completely unmotivated. Without any concrete theoretical links between the two models (I guess the main ideas are similar, but the details are very different) their similarity has to be shown using experimental results. In any case, it is hard to differentiate between contributions from the probabilistic theory and the network implementation, i.e., how much of the fit between experimental findings in monkeys and the behaviour of the model is due to the chosen implementation and how much is due to the probabilistic interpretation?

Results

The overall aim of the experiments / simulations in the paper is to show that the proposed probabilistic interpretation is compatible with the experimental findings in monkey LIP. The hypothesis is that LIP neurons encode the posterior of the stimulus as suggested in the theory. This hypothesis is false from the start, because some assumptions of the theory apparently don’t apply to neurons (as acknowledged by the authors). So the new hypothesis is that LIP neurons approximately encode some posterior of the stimulus. The requirement for this posterior is that its updates should take into account both the uncertainty of the evidence and the uncertainty of the previous estimate of the posterior. The authors test this by checking that the log odds of making a correct choice, log[ p(correct) / (1-p(correct)) ], increase linearly with time and that the slope of this increase depends on the coherence (contrast) of the stimulus. I did not follow up why the requirement is related to the log odds in this way, but it sounds ok. The question remains how to estimate the log odds from simulated and real neurons. For the simulated neurons the authors approximate the likelihood with a Poisson-like distribution whose kernel (parameters) was estimated from the simulated firing rates. They argue that it is a good approximation, because linear estimates of the Fisher information appear to be sufficient (I can’t comment on the validity of this argument). A similar approximation of the posterior cannot be done for real LIP neurons, because of a lack of multi-unit recordings which estimate the response of the whole LIP population. Instead, the authors approximate the log odds from measured firing rates of neurons tuned to motion in direction 0 and 180 degrees via a linear regression approach described in the supplemental data.
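To get a feeling for why linearly growing log odds are a natural signature of proper evidence accumulation, here is a minimal sketch. It is a toy model of my own and not the regression approach from the supplemental data: two alternatives, a flat prior, and Gaussian momentary evidence whose mean is mu * coherence (the function names and the parameters mu and sigma are assumptions). Under these assumptions the log posterior odds in favour of the true direction are a running sum of log likelihood ratios, so on average they grow linearly in time with a slope set by the coherence.

```python
import numpy as np

def log_odds(p):
    """Log odds log[p / (1 - p)] of a probability p in (0, 1)."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log1p(-p)

def mean_llr_slope(coherence, n_trials=2000, n_steps=50,
                   mu=1.0, sigma=1.0, seed=0):
    """Slope of the trial-averaged log odds for the true direction.

    Toy assumption: evidence x_t ~ N(+drift, sigma^2) for the true
    direction versus N(-drift, sigma^2) for the alternative.
    """
    rng = np.random.default_rng(seed)
    drift = mu * coherence
    x = rng.normal(drift, sigma, size=(n_trials, n_steps))
    # Log likelihood ratio of one sample is 2 * drift * x / sigma^2;
    # with a flat prior the cumulative sum is the log posterior odds.
    llr = np.cumsum(2.0 * drift * x / sigma**2, axis=1)
    t = np.arange(1, n_steps + 1)
    slope, _ = np.polyfit(t, llr.mean(axis=0), 1)
    return slope

slopes = {c: mean_llr_slope(c) for c in (0.25, 0.5, 1.0)}
for c, s in slopes.items():
    # Analytic expectation in this toy model: 2 * (mu * c)^2 / sigma^2.
    print(c, round(s, 3), 2.0 * c**2)
```

In this toy model the slope grows quadratically with coherence. That is a property of the Gaussian assumption and not a statement about the paper's network.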

The authors show that the log-odds computed from the simulated network exhibit the desired properties, i.e., the log-odds increase linearly with time (although there’s a kink at 50ms which supposedly is due to the discretisation) and depend on the coherence of the motion such that the slope of the log-odds also increases when coherence is increased within a trial. The corresponding log-odds of real LIP neurons are far noisier and, thus, do not allow definite judgements about linearity. Also, we don’t know whether their slopes would actually change after a change in motion coherence during a trial, as this was never tested (it’s likely, though).

In order to test whether the proposed line attractor network is sufficient to read out the maximum of the posterior in all conditions (readout time and motion coherence) the authors compare a single (global) readout with local readouts adapted for a particular pair of readout time and motion coherence. However, the authors don’t actually use attractor networks in these experiments, but note that these are equivalent to local linear estimators and so use those instead. Rather than comparing the readouts from these estimators with the actual maximum of the posterior, they only compare the variance of the estimators (Fisher information), which they show to be roughly the same for the local and global estimators. From this they conclude that a single, global attractor network could read out the maximum of the (approximated) posterior. However, this is only true if the global estimator has no additional bias, which we cannot see from these results.

In an additional analysis the authors show that the model qualitatively replicates the behavioural variables (probability correct and reaction time). However, these are determined from the LIP activities in a surprisingly ad-hoc way: the decision time is determined as the time when any one of the simulated LIP neurons reaches a threshold defined on the probability of firing and the decision is determined as the preferred direction of the neuron hitting the threshold (for 2 and 4 choice tasks the response is determined as the quadrant of the motion direction in which the preferred direction of the neuron falls). Why do the authors not use the attractor network to read out the response here? Also, the authors use a lower threshold for the 4-choice task than for the 2-choice task. This is strange, because one of the main findings of the Churchland et al. (2008) paper was that the decision in both 2- and 4-choice tasks appears to be determined by a common decision threshold while the initial firing rates of LIP neurons were lower for 4-choice tasks. Here, they also initialise with lower firing rates in the 4-choice task, but additionally choose a lower threshold. They don’t motivate this. Maybe it was necessary to fit the data from Churchland et al. (2008). This discrepancy between data and model is even more striking as the authors of the two papers partially overlap. So, do they deem the corresponding findings of Churchland et al. (2008) not important enough to be modelled, is it impossible to model them within their framework, or did they simply forget?

Finally, the build-up rates of LIP neurons also seem to be qualitatively similar in the simulation and the data, although they are consistently lower in the model. The build-up rates for the model are estimated from the first 50ms within each trial. However, the log-odds ratio had this kink at 50ms after which its slope was larger. So, if this effect is also seen directly in the firing rates, the fit of the build-up rates to the data may even be better if the probability of firing after 50ms is used. In Fig. 2C no such kink can be seen in the firing rates, but this is only data for 2 neurons in the model.

Conclusion

Overall the paper is very interesting and stimulating. It is well written and full of sound theoretical results which originate from previous work of the authors. Unfortunately, biological nature does not completely fit the beautiful theory. Consequently, the authors run experiments with more plausible neural networks which only approximately implement the theory. So what conclusions can we draw from the presented results? As long as the firing of MT neurons reflects the likelihood of a stimulus (their MT network is set up in this way), probably a wide range of networks which accumulate this firing will show responses similar to real LIP neurons. Because the more realistic networks violate the assumptions of the theory, it is not easy to say whether this is a consequence of the theory, which states that MT firing rates should simply be summed over time in order to get the right posterior. It could also be that more complicated forms of accumulation are necessary for LIP firing to represent the correct posterior. Simple summing would then just be a simple approximation. Also, I don’t believe that the presented results can rule out the possibility of sampling-based coding of probabilities (see Fiser et al., 2010) for decision making, as long as the sampling approach also implements some kind of accumulation procedure (think of particle filters: the implementation in a recurrent neural network would probably be quite similar).

Nevertheless, the main point of the paper is that the activity in LIP represents the full posterior and not only MAP estimates or log-odds. Consequently, the model very easily extends to the case of continuous directions of motion, which is in contrast to previous, e.g., attractor-based, neural models. I like this idea. However, I cannot determine from the experiments whether their network actually implements the correct posterior, because all their tests yield only indirect measures based on approximate analyses. Even so, it is pretty much impossible to verify that the firing of LIP neurons fits the simulated results as long as we cannot measure the firing of a large part of the corresponding neural population in LIP.

The Neural Costs of Optimal Control.

Gershman, S. J. and Wilson, R. C.
in: Advances in Neural Information Processing Systems 23, 2010
Google Scholar

Abstract

Optimal control entails combining probabilities and utilities. However, for most practical problems, probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a neural population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields, GABAergic effects on saccadic movements, and risk aversion in decisions under uncertainty.

Review

Within the area of decision theory they consider the problem of evaluating the expected utility of an action given a posterior. They propose a variational framework analogous to the one used for the EM algorithm, in which the utility replaces the likelihood and the posterior replaces the prior. Their main contribution is to include a cost penalising the complexity of the approximation of the posterior and to use the resulting lower bound on the expected utility to simultaneously optimise the density used to approximate the posterior. As the utility does not only contain the cost of the approximation, but also the actual utility of an action in the considered states, this model predicts that the approximating density should also reflect what is behaviourally relevant instead of only trying to represent the natural posterior distribution optimally.
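The EM analogy can be made explicit with the usual Jensen-inequality argument. The following is only a sketch in my own notation (x for the observations, a for the action, q for the approximating density), assuming a positive utility U; the exact form of the bound and of the cost term should be taken from the paper.

```latex
\log \mathbb{E}_{p(s \mid x)}\!\left[ U(s, a) \right]
  \;=\; \log \int q(s)\, \frac{p(s \mid x)\, U(s, a)}{q(s)}\, \mathrm{d}s
  \;\ge\; \int q(s)\, \log \frac{p(s \mid x)\, U(s, a)}{q(s)}\, \mathrm{d}s
```

Maximising the right-hand side over q pulls q towards the posterior reweighted by the utility, which is why the approximating density ends up reflecting what is behaviourally relevant.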

In the results section they then show that under this model the approximated posterior can (and will) indeed put more probability mass on states with larger utility, something which has apparently been found in grasshoppers. Additionally they show that increasing the cost of spikes results in smaller firing rates which, as they argue, leads to response latencies as seen in experiments. Finally, they show that under the assumption that high utility or very costly states are rare, the model will automatically account for the nonlinear weighting of probabilities in risky choice observed in humans. The model therefore explains this irrational behaviour by noting that “under neural resource constraints, the approximate density will be biased towards high reward regions of the state space.”

I don’t know enough to judge the correspondence of the model behaviour/predictions with the experimental results, or whether the model contradicts some other results. However, the paper is quite inspiring in the sense that it presents an intuitive idea which potentially has big implications for how populations of neurons code probability distributions, namely that the neuronal codes are influenced as much by expected rewards as by the natural distribution. Of course, the paper leaves many questions open. The authors only show results for when the approximate distributions are optimised alone, but what happens when actions and distribution are optimised simultaneously? What are the timescales of the distribution optimisation? Is it really instantaneous (on the same timescale as action selection) as the authors indicate, or is it rather a slower process? Their proposal also has the potential to explain how the dimensionality of the state space can be reduced by only considering states which are behaviourally relevant. However, it remains unclear to what extent this specialisation should be implemented. In other words, is the posterior dependent, e.g., on the precise goal in the task, or is it rather only dependent on the selected task? The cost of spiking example in particular suggests a connection between the proposed mechanism and attention. Can attention be explained by this low-level description of biased representations of the posterior distribution?

The paper is quite inspiring and you kind of wonder why nobody has made these ideas explicit before, or maybe somebody has? Actually, Maneesh Sahani had a NIPS paper in 2004 which they cite but do not comment on, and which, judging from its abstract, looks very similar to what they do here.