## Normative evidence accumulation in unpredictable environments.

Glaze, C. M., Kable, J. W., and Gold, J. I.
Elife, 4, 2015

### Abstract

In our dynamic world, decisions about noisy stimuli can require temporal accumulation of evidence to identify steady signals; differentiation to detect unpredictable changes in those signals; or both. Normative models can account for learning in these environments but have not yet been applied to faster decision processes. We present a novel, normative formulation of adaptive learning models that forms decisions by acting as a leaky accumulator with non-absorbing bounds. These dynamics, derived for both discrete and continuous cases, depend on the expected rate of change of the statistics of the evidence and balance signal identification and change detection. We found that, for two different tasks, human subjects learned these expectations, albeit imperfectly, then used them to make decisions in accordance with the normative model. The results represent a unified, empirically supported account of decision-making in unpredictable environments that provides new insights into the expectation-driven dynamics of the underlying neural signals.

### Review

The authors suggest a model of sequential information processing that is aware of possible switches in the underlying source of information. They further show that the model fits responses of people in two perceptual decision making tasks and consequently argue that behaviour, which was previously considered to be suboptimal, may follow the normative, i.e., optimal, mechanism of the model. This mechanism postulates that typical evidence accumulation mechanisms in perceptual decision making are altered by the expected switch rate of the stimulus. Specifically, evidence accumulation becomes more leaky and a non-absorbing bound becomes lower when the expected switch rate increases. The paper is generally well-written (although there are some convoluted bits in the results section) and convincing. I was a bit surprised, though, that only choices, but not their timing, are considered in the analysis with the model. In the following I’ll go through some more details of the model and discuss limitations of the presented models and their relation to other models in the field, but first I describe the experiments reported in the paper.

The paper reports two experiments. In the first (triangles task) people saw two triangles on the screen and had to judge whether a single dot was more likely to originate from the one triangle or the other. There was one dot and corresponding response per trial. In each trial the position of the dot was redrawn from a Gaussian distribution centred around one of the two triangles. There were also change point trials in which the triangle from which the dot was drawn switched (and then remained the same until the next change point). The authors analysed the proportion correct in relation to whether a trial was a change point. Trials were grouped into blocks which were defined by a constant rate of switches (hazard rate) in the true originating triangle. In the second experiment (dots-reversal task), a random dot stimulus repeatedly switched (reversed) direction within a trial. In each trial people had to tell in which direction the dots moved before they vanished. The authors analysed the proportion correct in relation to the time between the last switch and the end of stimulus presentation. There were no blocks. Each trial had one of two hazard rates and one of two difficulty levels. The two difficulty levels were determined for each subject individually such that the more difficult one led to correct identification of motion direction of a 500 ms long stimulus in 65% of cases.

The authors present two normative models, one discrete and one continuous, which they apply across and within trial in the triangles and dots-reversal tasks, respectively. The discrete model is a simple hidden Markov model in which the hidden state can take one of two values and there is a common transition probability between these two values which they call hazard ‘rate’ (H). Observations were implicitly assumed Gaussian. They only enter during fitting as log-likelihood ratios in the form $$\beta*x_n$$ where beta is a scaling relating to the internal / sensory uncertainty associated with the generative model of observations and $$x_n$$ is the observed dot position (x-coordinate) in the triangles task. In methods, the authors derive the update equation for the log posterior odds ($$L_n$$) of the hidden state values given in Eqs. (1) and (2).
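The update described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a symmetric two-state hidden Markov model in which the previous log posterior odds are first propagated through the switch probability H and the new log-likelihood ratio is then added. The function names `psi` and `update`, and the treatment of `beta` as a plain multiplier on the observation, are illustrative choices of mine and not taken from the paper's code.

```python
import math

def psi(L, H):
    """Log prior odds for the next observation, given log posterior odds L.

    Assumes a symmetric two-state Markov chain with switch probability H.
    """
    # Odds after one transition: [(1-H)*e^L + H] / [(1-H) + H*e^L]
    return math.log((1 - H) * math.exp(L) + H) - math.log((1 - H) + H * math.exp(L))

def update(L_prev, x, H, beta=1.0):
    """One step of the discrete model: propagate the belief, then add the LLR."""
    llr = beta * x  # log-likelihood ratio, assumed linear in the observation x
    return psi(L_prev, H) + llr

# Made-up observations, only to show how the belief is carried across trials
L = 0.0
for x in [0.5, 0.8, -0.2, 1.0]:
    L = update(L, x, H=0.1)
```

With H=0 the function `psi` returns L unchanged (perfect accumulation), and with H=0.5 it returns 0, so that every observation is judged on its own.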

The continuous model is based on a Markov jump process with two states which is the continuous equivalent of the hidden Markov model above. Using Itô calculus the authors again derive an update equation for the log posterior odds of the two states (Eq. 4), but during fitting they actually approximate Eq. (4) with the discrete Eq. (1), because it is supposedly the most efficient discrete-time approximation of Eq. (4) (no explanation for why this is the case was given). They just replace the log-likelihood ratio placeholder (LLR) with a coherence-dependent term applicable to the random dot motion stimulus. Notably, in contrast to standard drift-diffusion modelling of random dot motion tasks, the authors used coherence-dependent noise. I’d be interested in the reason for this choice.
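One way to see that the discrete update is at least a consistent discretisation of the continuous one is a first-order expansion. This is my own sketch, assuming the discrete update has the two-state form $$L_n = \psi(L_{n-1}, H) + LLR_n$$ with $$\psi(L, H) = L + \log[(1-H)/H + e^{-L}] - \log[(1-H)/H + e^{L}]$$, and assuming $$H = \lambda\,dt$$ for a small time step $$dt$$:

$$
\begin{aligned}
\psi(L, \lambda\,dt) - L &= \log\left[1 - \lambda\,dt + \lambda\,dt\,e^{-L}\right] - \log\left[1 - \lambda\,dt + \lambda\,dt\,e^{L}\right] \\
&\approx \lambda\,dt\,(e^{-L} - 1) - \lambda\,dt\,(e^{L} - 1) \\
&= -2\lambda \sinh(L)\,dt
\end{aligned}
$$

Under these assumptions the intrinsic dynamics in continuous time become $$dL = -2\lambda\sinh(L)\,dt + dLLR$$. The expansion only shows that the two formulations agree to first order in $$dt$$. It says nothing about whether the discrete form is the most efficient approximation.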

There is an apparent fundamental difference between the discrete and continuous models which can be seen in Fig. 1 B vs C. In the discrete model, for H>0.5, the log posterior odds may actually switch sign from one observation to the next whereas this cannot happen in the continuous model. Conceptually, this means that the log posterior odds in the discrete model, when the LLR is 0, i.e., when there is no evidence in either direction, would oscillate between decreasing positive and increasing negative values until converging to 0. This oscillation can be seen in Fig. 2G, red line for |LLR|>0. In the continuous model such an oscillation cannot happen, because the infinitely many, tiny time steps allow the model to converge to 0 before switching the sign. Another way to see this is through the discrete hazard ‘rate’ H which is the probability of a sign reversal within one time step of size dt. When you want to decrease dt in the model, but want to maintain a given rate of sign reversals in, e.g., 1 second, H would also have to decrease. Consequently, when dt approaches 0, the probability of a sign reversal approaches 0, too, which means that H is a useless parameter in continuous time which, in turn, is the reason why it is replaced by a real rate parameter ($$\lambda$$) representing the expected number of reversals per second. In conclusion, the fundamental difference between discrete and continuous models is only an apparent one. They are very similar models, just expressed in different resolutions of time. In that sense it would have perhaps been better to present results in the paper consistently in terms of a real hazard rate ($$\lambda$$) which could be obtained in the triangles task by dividing H by the average duration of a trial in seconds. Notice that the discrete model represents all hazard rates $$\lambda>1/dt$$ as H=1, i.e., it cannot represent hazard rates which would lead to more than 1 expected sign reversal per $$dt$$. 
There may be more subtle differences between the models when the exact distributions of sign reversals are considered instead of only the expected rates.
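The difference between the two regimes can be illustrated numerically. The following is a minimal sketch under two assumptions: the discrete prior update has the symmetric two-state form used in `psi` below, and the continuous model without evidence follows $$dL/dt = -2\lambda\sinh(L)$$, integrated here with simple Euler steps. The function names and parameter values are hypothetical.

```python
import math

def psi(L, H):
    """Discrete prior update for a symmetric two-state chain (assumed form)."""
    return math.log((1 - H) * math.exp(L) + H) - math.log((1 - H) + H * math.exp(L))

def discrete_decay(L0, H, n_steps):
    """Iterate the discrete update with LLR = 0 at every step."""
    traj = [L0]
    for _ in range(n_steps):
        traj.append(psi(traj[-1], H))
    return traj

def continuous_decay(L0, lam, dt, n_steps):
    """Euler steps of dL/dt = -2*lam*sinh(L), i.e. the assumed drift without evidence."""
    traj = [L0]
    for _ in range(n_steps):
        L = traj[-1]
        traj.append(L - 2.0 * lam * math.sinh(L) * dt)
    return traj

# H > 0.5: the sign flips on every step while the magnitude shrinks
osc = discrete_decay(2.0, H=0.8, n_steps=6)

# Small dt: the log odds decay towards 0 without ever changing sign
smooth = continuous_decay(2.0, lam=1.0, dt=0.001, n_steps=5000)
```

Here `osc` alternates in sign with shrinking magnitude, whereas `smooth` stays positive throughout and only approaches 0.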

Using first order approximations of the two models the authors identify two components in the dynamics of the log posterior odds L: a leak and a bias. [Side remark: there is a small sign mistake in the definition of leak k of the continuous model in the Methods section.] Both depend on hazard rate and the authors show that the leak dominates the dynamics for small L whereas the bias dominates for large L. I find this denomination a bit misleading, because both leak and bias effectively result in a leak of log posterior odds L by reducing L in every time step (cf. Fig. 1B,C). The change from a multiplicative leak to one based on a bias just means that the effective amount of leak in L increases nonlinearly with L as the bias takes over.

To test whether this special form of leak underlies decision making the authors compared the full model to two versions which only had a multiplicative leak, or one based on bias. In the former the leak stayed constant for increasing L, i.e., $$L' = \gamma L$$. In the latter there was perfect accumulation without leak up to the bias and then a bias-based leak which corresponds to a multiplicative leak where the leak rate increased with L such that $$L' = \gamma(L) L$$ with $$\gamma(L) = bias / L$$. The authors report evidence that in both tasks both alternative models do not describe choice behaviour as well as the full, normative model. In Fig. 9 they provide a reason by estimating the effective leak rate in the data and the models as a function of the strength of sensory evidence (coherence in the dots-reversal task). They do this by fitting the model with multiplicative leak separately to trials with low and high coherence (fitting to choices in the data or predicted by the different fitted models). In both data and normative model the effective leak rates depended on coherence. This dependence arises, because high sensory evidence leads to large values of L and I have argued above that larger L has larger effective leak rate due to the bias. It is, therefore, not surprising that the alternative model with multiplicative leak shows no dependence of effective leak on coherence. But it is also not surprising that the alternative model with bias-based leak has a larger dependence of effective leak on coherence than the data, because this model jumps from no leak to very large leak when coherence jumps from low to high. The full, normative model lies in between, because it smoothly transitions between the two alternative models.
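The three variants can be compared through their effective multiplicative factor, i.e., the ratio of propagated to previous log odds. The sketch below assumes the symmetric two-state form for the full update, sets the constant leak factor to 1 - 2H (the slope of the full update at L=0, my choice for illustration and not necessarily the value fitted in the paper), and implements the bias-only variant as clipping at the bias log((1-H)/H). All function names are hypothetical.

```python
import math

def psi_full(L, H):
    """Full prior update for a symmetric two-state chain (assumed form)."""
    return math.log((1 - H) * math.exp(L) + H) - math.log((1 - H) + H * math.exp(L))

def psi_leak(L, H):
    """Leak-only variant: constant multiplicative factor, here set to 1 - 2H."""
    return (1 - 2 * H) * L

def psi_bias(L, H):
    """Bias-only variant: perfect accumulation, clipped at +/- log((1-H)/H).

    Requires 0 < H < 0.5 so that the bias is positive and finite.
    """
    b = math.log((1 - H) / H)
    return max(-b, min(b, L))

def effective_gamma(f, L, H):
    """Effective multiplicative factor L'/L of an update rule f at log odds L != 0."""
    return f(L, H) / L

H = 0.1
table = [
    (L, effective_gamma(psi_full, L, H),
        effective_gamma(psi_leak, L, H),
        effective_gamma(psi_bias, L, H))
    for L in [0.01, 0.5, 2.0, 5.0, 10.0]
]
```

In this sketch the factor of the full update matches the leak-only variant for small L and the bias-only variant for large L, which is the smooth transition described above.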

Why is there a leak in the first place? Other people have found no evidence for a leak in evidence accumulation (e.g., Brunton et al., 2013). The leak results from the possibility of a switch of the source of the observations, i.e., a switch of the underlying true stimulus. Without any information, i.e., without observations, the possibility of a switch means that you should become more uncertain about the stimulus as time passes. The larger the hazard rate, i.e., the larger the probability of a switch within some time window, the faster you should become uncertain about the current stimulus. For a log posterior odds of L=0 uncertainty is at its maximum (both stimuli have equal posterior probability). This is another reason why discrete hazard ‘rates’ H>0.5 which lead to sign reversals in L do not make much sense. The absence of evidence for one stimulus should not lead to evidence for the other stimulus. Anyway, as the hazard rate goes to 0 the leak will go to 0 such that in experiments where usually no switches in stimulus occur subjects should not exhibit a leak which explains why we often find no evidence for leaks in typical perceptual decision making experiments. This does not mean that there is no leak, though. In particular, the authors report here that hazard rates estimated from behaviour of subjects (subjective) tended to be a bit higher than the ones used to generate the stimuli (objective) when the objective hazard rates were very low, and the other way around for high objective hazard rates. This indicates that people have some prior expectations towards intermediate hazard rates that biased their estimates of hazard rates in the experiment.

The discussed forms of leak implement a property of the model that the authors called a ‘non-absorbing bound’. I find this wording also a bit misleading, because ‘bound’ was usually used to indicate a threshold in drift diffusion models which, when reached, would trigger a response. The bound here triggers nothing. Rather, it represents an asymptote of the average log posterior odds. Thus, it’s not an absolute bound, but it’s often passed due to variance in the momentary sensory evidence (LLR). I can also not follow the authors when they write: “The stabilizing boundary is also in contrast to the asymptote in leaky accumulation, which increases linearly with the strength of evidence”. Based on the dynamics of L discussed above the ‘bound’ here should exhibit exactly the described behaviour of an asymptote in leaky accumulation. The strength of evidence is reflected in the magnitude of LLR which is added to the intrinsic dynamics of the log posterior odds L. The non-absorbing bound, therefore, should be given by bias + average of LLR for the current stimulus. The bound, thus, should rise linearly with the strength of evidence (LLR).
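The argument about the asymptote can be checked numerically. This is a minimal sketch, assuming the symmetric two-state form of the prior update and a constant LLR at every step in place of noisy evidence. The function `asymptote` simply iterates the update until it settles. The names and parameter values are hypothetical.

```python
import math

def psi(L, H):
    """Prior update for a symmetric two-state chain (assumed form)."""
    return math.log((1 - H) * math.exp(L) + H) - math.log((1 - H) + H * math.exp(L))

def asymptote(llr, H, n_iter=200):
    """Fixed point of L = psi(L, H) + llr for a constant LLR (assumes 0 < H < 0.5)."""
    L = 0.0
    for _ in range(n_iter):
        L = psi(L, H) + llr
    return L

H = 0.1
bias = math.log((1 - H) / H)

# For each evidence strength: (LLR, asymptote of L, bias + LLR)
rows = [(llr, asymptote(llr, H), bias + llr) for llr in [0.1, 1.0, 5.0, 10.0]]
```

Under these assumptions the asymptote of L approaches bias + LLR for strong evidence, whereas for weak evidence it stays in the leak-dominated regime near LLR / (2H).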

Fitting of the discrete and continuous models was done by maximising the likelihood of the models (in some fits with many parameters, priors over parameters were used to regularise the optimisation). The likelihood in the discrete models was Gaussian with mean equal to the log posterior odds ($$L_n$$) computed from the actual dot positions $$x_n$$. The variance of the Gaussian likelihood was fitted to the data as a free parameter. In the continuous model the likelihood was numerically approximated by simulating the discretised evolution of the probabilities that the log posterior odds take on particular values. This is very similar to the approach used by Brunton2013. The distribution of the log posterior odds $$L_n$$ was considered here, because the stream of sensory observations $$x(t)$$ was unknown and therefore had to enter as a random variable while in the triangles task $$x(t)=x_n$$ was set to the known x-coordinates of the presented dots.

The authors argued that the fits of behaviour were good, but at least for the dots reversal task Fig. 8 suggests otherwise. For example, Fig. 8G shows that 6 out of 12 subjects (there were supposed to be 13, but I can only see 12 in the plots) made 100% errors in trials with the low hazard rate of 0.1Hz and low coherence where the last switch in stimulus was very recent (maximally 300ms before the end of stimulus presentation). The best fitting model, however, predicted error rates of at most 90% in these conditions. Furthermore, there is a significant difference in choice errors between the low and high hazard rate for large times after the last switch in stimulus (Fig. 8A, more errors for high hazard rate) which was not predicted by the fitted normative model. Despite these differences the fitted normative model seems to capture the overall patterns in the data.

#### Conclusion

The authors present an interesting normative model in discrete and continuous time that extends previous models of evidence accumulation to situations in which switches in the presented stimulus can be expected. In light of this model, a leak in evidence accumulation reflects a tendency to increase uncertainty about the stimulus due to a potentially upcoming switch in the stimulus. The model provides a mathematical relation between the precise type of leak and the expected switch (hazard) rate of the stimulus. In particular, and in contrast to previous models, the leak in the present model depends nonlinearly on the accumulated evidence. As the authors discuss, the presented normative model potentially unifies decision making processes observed in different situations characterised by different stabilities of the underlying stimuli. I had the impression that the authors were very thorough in their analysis. However, some deviations of model and data apparent in Fig. 8 suggest that either the model itself, or the fitting procedure may be improved such that the model better fits people’s behaviour in the dots-reversal task. It was anyway surprising to me that subjects only had to make a single response per trial in that task. This feels like a big waste of potential choice data when I consider that each trial was 5-10s long and contained several stimulus switches (reversals).

## A test of Bayesian observer models of processing in the Eriksen flanker task.

White, C. N., Brown, S., and Ratcliff, R.
J Exp Psychol Hum Percept Perform, 38:489–497, 2012

### Abstract

Two Bayesian observer models were recently proposed to account for data from the Eriksen flanker task, in which flanking items interfere with processing of a central target. One model assumes that interference stems from a perceptual bias to process nearby items as if they are compatible, and the other assumes that the interference is due to spatial uncertainty in the visual system (Yu, Dayan, & Cohen, 2009). Both models were shown to produce one aspect of the empirical data, the below-chance dip in accuracy for fast responses to incongruent trials. However, the models had not been fit to the full set of behavioral data from the flanker task, nor had they been contrasted with other models. The present study demonstrates that neither model can account for the behavioral data as well as a comparison spotlight-diffusion model. Both observer models missed key aspects of the data, challenging the validity of their underlying mechanisms. Analysis of a new hybrid model showed that the shortcomings of the observer models stem from their assumptions about visual processing, not the use of a Bayesian decision process.

### Review

This is a response to Yu2009 in which the authors show that Yu et al.'s main Bayesian models cannot account for the full data of an Eriksen flanker task. In particular, Yu et al.'s models predict a far too high overall error rate with the suggested parameter settings that reproduce the initial drop of accuracy below chance level for very fast responses. The argument put forward by White et al. is that the mechanisms used in Yu et al.'s models to overcome initial, flanker-induced biases are too slow, i.e., the probabilistic evidence accumulation implemented by the models is influenced by the flankers for too long. White et al.'s shrinking spotlight models do not have such a problem, mostly because the speed with which flankers lose influence is fitted to the data. The argument seems compelling, but I would like to understand better why it takes so long in the Bayesian model to overcome flanker influence and whether there are other ways of speeding this up than the one suggested by White et al.

## Neural correlates of perceptual decision making before, during, and after decision commitment in monkey frontal eye field.

Ding, L. and Gold, J. I.
Cereb Cortex, 22:1052–1067, 2012

### Abstract

Perceptual decision making requires a complex set of computations to implement, evaluate, and adjust the conversion of sensory input into a categorical judgment. Little is known about how the specific underlying computations are distributed across and within different brain regions. Using a reaction-time (RT) motion direction-discrimination task, we show that a unique combination of decision-related signals is represented in monkey frontal eye field (FEF). Some responses were modulated by choice, motion strength, and RT, consistent with a temporal accumulation of sensory evidence. These responses converged to a threshold level prior to behavioral responses, reflecting decision commitment. Other responses continued to be modulated by motion strength even after decision commitment, possibly providing a memory trace to help evaluate and adjust the decision process with respect to rewarding outcomes. Both response types were encoded by FEF neurons with both narrow- and broad-spike waveforms, presumably corresponding to inhibitory interneurons and excitatory pyramidal neurons, respectively, and with diverse visual, visuomotor, and motor properties, albeit with different frequencies. Thus, neurons throughout FEF appear to make multiple contributions to decision making that only partially overlap with contributions from other brain regions. These results help to constrain how networks of brain regions interact to generate perceptual decisions.

### Review

This paper puts into perspective the commonly made claim that LIP neurons are responsible for perceptual decision making in monkeys who perform a reaction time motion discrimination task. In particular, the authors report on neurons in frontal eye field (FEF) that also show typical accumulation-to-bound responses. Furthermore, at least as many neurons in FEF exhibited activity that was correlated with motion coherence and choice during and after the saccade indicating a choice and extinguishing the stimulus, i.e., the activity of these neurons appeared to accumulate evidence, but seemed to ignore the supposed bound and maintained a representation of the stimulus after it had gone. In the discussion the authors also point to other studies which found activity that can be interpreted in terms of evidence accumulation. Corresponding neurons have been found in LIP, FEF, superior colliculus (SC) and caudate nucleus of which neurons in LIP and SC may be mostly governed by a bound. From the reported and reviewed results it becomes clear that, although accumulation-to-bound may be an important component of perceptual decision making, it is not sufficient to explain the wide variety of decision-related neuronal activity in the brain. In particular, it is unclear how neurons from the mentioned brain regions interact and what their different roles in perceptual decision making are.

## Effects of cortical microstimulation on confidence in a perceptual decision.

Fetsch, C. R., Kiani, R., Newsome, W. T., and Shadlen, M. N.
Neuron, 83:797–804, 2014

### Abstract

Decisions are often associated with a degree of certainty, or confidence-an estimate of the probability that the chosen option will be correct. Recent neurophysiological results suggest that the central processing of evidence leading to a perceptual decision also establishes a level of confidence. Here we provide a causal test of this hypothesis by electrically stimulating areas of the visual cortex involved in motion perception. Monkeys discriminated the direction of motion in a noisy display and were sometimes allowed to opt out of the direction choice if their confidence was low. Microstimulation did not reduce overall confidence in the decision but instead altered confidence in a manner that mimicked a change in visual motion, plus a small increase in sensory noise. The results suggest that the same sensory neural signals support choice, reaction time, and confidence in a decision and that artificial manipulation of these signals preserves the quantitative relationship between accumulated evidence and confidence.

### Review

The paper provides verification of beliefs asserted in Kiani2009: Confidence is directly linked to accumulated evidence as represented in monkey area LIP during a random dot motion discrimination task. The authors use exactly the same task, but now stimulate patches of MT/MST neurons instead of recording single LIP neurons and resort to analysing behavioural data only. They find that small microstimulation of functionally well-defined neurons, that signal a particular motion direction, affects decisions in the same way as manipulating the motion information in the stimulus directly. This was expected, because it has been shown before that stimulating MT neurons influences decisions in that way. New here is that the effect of stimulation on confidence judgements was evaluated at the same time. The rather humdrum result: confidence judgements are also affected in the same way. The authors argue that this didn’t have to be, because confidence judgements are thought to be a metacognitive process that may be influenced by other high-level cognitive functions such as related to motivation. Then again, isn’t decision making thought to be a high-level cognitive function that is clearly influenced by motivation?

Anyway, there was one small effect particular to stimulation that did not occur in the control experiment where the stimulus itself was manipulated: There was a slight decrease in the overall proportion of sure-bet choices (presumably indicating low confidence) with stimulation suggesting that monkeys were more confident when stimulated. The authors explain this with larger noise (diffusion) in a simple drift-diffusion model. Counterintuitively, the larger accumulation noise increases the probability of moving away from the initial value and out of the low-confidence region. The mechanism makes sense, but I would rather explain it within an equivalent Bayesian model in which MT neurons represent noisy observations that are transformed into noisy pieces of evidence which are accumulated in LIP. Stimulation increases the noise on the observations which in turn increases accumulation noise in the equivalent drift-diffusion model (see Bitzer et al., 2014).

In drift-diffusion models drift, diffusion and threshold are mutually redundant in that one of them needs to be fixed when fitting the model to choices and reaction times. The authors here let all of them vary simultaneously which indicates that the parameters can be discriminated based on confidence judgements even when no reaction time is taken into account. This should be followed up. It is also interesting to think about how the postulated tight link between the ‘decision variable’ and the experienced confidence can be consolidated in a reaction time task where supposedly all decisions are made at the same threshold value. Notice that the confidence of a decision in their framework depends on the state of the diffusion (most likely one of the two boundaries) and the time of the decision: Assuming fixed noise, smaller decision times should translate into larger confidence, because you assume that this is due to a larger drift. Therefore, you should see variability of confidence judgements in a reaction time task that is strongly correlated with reaction times.
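The redundancy between drift, diffusion and threshold can be made concrete with the closed-form predictions of the simplest drift-diffusion model. This is a sketch under strong assumptions: symmetric absorbing bounds, an unbiased starting point, constant nonzero drift and no non-decision time. The function name `ddm_predictions` and the parameter values are hypothetical.

```python
import math

def ddm_predictions(drift, noise, bound):
    """Choice probability and mean decision time of a symmetric drift-diffusion model.

    Assumes a start at 0, absorbing bounds at +/- bound, constant drift != 0,
    diffusion standard deviation `noise`, and no non-decision time.
    """
    k = drift * bound / noise ** 2
    p_upper = 1.0 / (1.0 + math.exp(-2.0 * k))   # probability of hitting the upper bound
    mean_dt = (bound / drift) * math.tanh(k)     # mean time until either bound is hit
    return p_upper, mean_dt

base = ddm_predictions(drift=1.0, noise=1.0, bound=1.0)

# Scaling all three parameters by the same constant leaves both predictions unchanged
scaled = ddm_predictions(drift=3.0, noise=3.0, bound=3.0)
```

Because both predictions depend only on the ratios of the parameters, choices and reaction times alone cannot pin down all three, which is why one of them is usually fixed.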

## Decision-related activity in sensory neurons reflects more than a neuron's causal effect.

Nienborg, H. and Cumming, B. G.
Nature, 459:89–92, 2009

### Abstract

During perceptual decisions, the activity of sensory neurons correlates with a subject’s percept, even when the physical stimulus is identical. The origin of this correlation is unknown. Current theory proposes a causal effect of noise in sensory neurons on perceptual decisions, but the correlation could result from different brain states associated with the perceptual choice (a top-down explanation). These two schemes have very different implications for the role of sensory neurons in forming decisions. Here we use white-noise analysis to measure tuning functions of V2 neurons associated with choice and simultaneously measure how the variation in the stimulus affects the subjects’ (two macaques) perceptual decisions. In causal models, stronger effects of the stimulus upon decisions, mediated by sensory neurons, are associated with stronger choice-related activity. However, we find that over the time course of the trial these measures change in different directions-at odds with causal models. An analysis of the effect of reward size also supports this conclusion. Finally, we find that choice is associated with changes in neuronal gain that are incompatible with causal models. All three results are readily explained if choice is associated with changes in neuronal gain caused by top-down phenomena that closely resemble attention. We conclude that top-down processes contribute to choice-related activity. Thus, even forming simple sensory decisions involves complex interactions between cognitive processes and sensory neurons.

### Review

They investigated the source of the choice probability of early sensory neurons. Choice probability quantifies the difference in firing rate distributions separated by the behavioural response of the subject. The less overlap between the firing rate distributions for one response and its alternative (in two-choice tasks), the greater the choice probability. Importantly, they restricted their analysis to trials in which the stimulus was effectively random. In random dot motion experiments this corresponds to 0% coherent motion, but here they used a disparity discrimination task and looked at disparity selective neurons in macaque area V2. The mean contribution from the stimulus, therefore, should have been 0. Yet, they found that choice probability was above 0.5 indicating that the firing of the neurons still could predict the final response, but why? They consider two possibilities: 1) the particular noise in firing rates of sensory neurons causes, at least partially, the final choice. 2) The firing rate of sensory neurons reflects choice-related effects induced by top-down influences from more decision-related areas.
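Choice probability is commonly computed as the area under the ROC curve of the two firing rate distributions. The following is a minimal sketch of that computation, not necessarily the exact procedure used in the paper. The function name and the firing rates in the example are made up.

```python
def choice_probability(rates_pref, rates_null):
    """Area under the ROC curve for two samples of firing rates.

    Equals P(pref > null) + 0.5 * P(pref == null) over all pairs of trials,
    where `rates_pref` are rates on trials ending in the neuron's preferred
    choice and `rates_null` are rates on trials ending in the other choice.
    """
    wins = 0.0
    for a in rates_pref:
        for b in rates_null:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(rates_pref) * len(rates_null))

# Made-up firing rates (spikes/s) for illustration only
cp = choice_probability([12, 15, 9, 14], [10, 11, 9, 13])
```

A value of 0.5 means that the two distributions overlap completely, and a value of 1 means that they are fully separated.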

Note that the choice probability they use is somewhat corrected for influences from the stimulus by considering the firing rate of a neuron in response to a particular disparity, but without taking choices into account. This correction reduced choice probabilities a bit. Nevertheless, they remained significantly above 0.5. This result indicates that the firing rate distributions of the recorded neurons were only little affected by which disparities were shown in individual frames when these distributions are defined depending on the final choice. I don’t find this surprising, because there was no consistent stimulus to detect from the random disparities and the behavioural choices were effectively random.

Yet, the particular disparities presented in individual trials had an influence on the final choice. They used psychophysical reverse correlation to determine this. The analysis suggests that the very first frames had a very small effect which is followed by a steep rise in influence of frames at the beginning of a trial (until about 200ms) and then a steady decline. This result can mean different things depending on whether you believe that evidence accumulation stops once you have reached a threshold, or whether evidence accumulation continues until you are required to make a response. Shadlen is probably a proponent of the first proposition. Then, the decreasing influence of the stimulus on the choice just reflects the smaller number of trials in which the threshold hasn’t been reached, yet. Based on the second proposition, the result means that the weight of individual pieces of evidence during accumulation reduces as you come closer to the response. Currently, I can’t think of decisive evidence for either proposition, but it has been shown in perturbation experiments that stimulus perturbations close to a decision, late in a trial had smaller effects on final choices than perturbations early in a trial (Huk and Shadlen, 2005).
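The idea behind psychophysical reverse correlation can be sketched as a choice-conditioned average of the stimulus in each frame. This is a simplified illustration, assuming signed per-frame stimulus values and binary choices. It is not necessarily the estimator used in the paper, and the function name and toy data are hypothetical.

```python
def psychophysical_kernel(stimuli, choices):
    """Per-frame mean stimulus on choice-1 trials minus that on choice-0 trials.

    stimuli: list of trials, each a list of signed per-frame stimulus values
    choices: list of 0/1 responses, one per trial (both choices must occur)
    """
    n_frames = len(stimuli[0])
    kernel = []
    for t in range(n_frames):
        s1 = [trial[t] for trial, c in zip(stimuli, choices) if c == 1]
        s0 = [trial[t] for trial, c in zip(stimuli, choices) if c == 0]
        kernel.append(sum(s1) / len(s1) - sum(s0) / len(s0))
    return kernel

# Toy data in which the choice follows the sign of the first frame only
stimuli = [[1, -1], [1, 1], [-1, 1], [-1, -1]]
choices = [1, 1, 0, 0]
kernel = psychophysical_kernel(stimuli, choices)
```

In the toy data the kernel is large for the first frame and zero for the second, reflecting that only the first frame influenced the choice.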

Back to the source of above chance-level choice probabilities. The authors argue, given the decreasing influence of the stimulus on the final choice and assuming that the influence of the stimulus on sensory neurons stays constant, that choice probabilities should also decrease towards the end of a trial. However, choice probabilities stay roughly constant after an initial rise. Consequently, they infer that the firing of the neurons must be influenced from other sources, apart from the stimulus, which are correlated with the choice. They consider two of these sources: i) Lateral, sensory neurons which could reflect the final decision better. ii) Higher, decision related areas which, for example, project a kind of bias onto the sensory neurons. The authors strongly prefer ii), also because they found that the firing of sensory neurons appears to be gain modulated when contrasting firing rates between final choices. In particular, firing rates showed a larger gain (steeper disparity tuning curve of the neuron) when trials were considered which ended with the behavioural choice corresponding to the preferred disparity of the neuron. In other words, the output of a neuron was selectively increased if that neuron preferred the disparity which was finally chosen. Equivalently, the output of a neuron was selectively decreased if that neuron preferred a different disparity than the one which was finally chosen. This gain difference explains at least part of the difference in firing rate distributions which the choice probability measures.

They also show an interesting effect of reward size on the correlation between stimulus and final choice: The stimulus had a larger influence on choice for larger reward. Again, if the choice probabilities were mainly driven by stimulus, bottom-up related effects and the stimulus had a larger influence on final choice in high reward trials, then choice probabilities should have been higher in high reward trials. The opposite was the case: choice probabilities were lower in high reward trials. The authors explain this using the previous bias hypothesis: The measured choice probabilities reflect something like an attentional gain or bias induced by higher-level decision-related areas. As the stimulus becomes more important, the bias loses influence. Hence, the choice probabilities reduce.

In summary, the authors present convincing evidence that sensory neurons as early as visual cortex area V2 already receive top-down, decision-related influences. The choice probabilities reported here were quite similar to those reported in a previous paper (Nienborg and Cumming, 2006), even though here only trials with completely random stimuli were considered. I would have guessed that choice probabilities would be considerably higher for trials with an actually presented stimulus. Why is there only a moderate difference? Perhaps there actually isn’t. My observation is only based on a brief look at the figures in the two papers.

## Probabilistic reasoning by neurons.

Yang, T. and Shadlen, M. N.
Nature, 447:1075–1080, 2007

### Abstract

Our brains allow us to reason about alternatives and to make choices that are likely to pay off. Often there is no one correct answer, but instead one that is favoured simply because it is more likely to lead to reward. A variety of probabilistic classification tasks probe the covert strategies that humans use to decide among alternatives based on evidence that bears only probabilistically on outcome. Here we show that rhesus monkeys can also achieve such reasoning. We have trained two monkeys to choose between a pair of coloured targets after viewing four shapes, shown sequentially, that governed the probability that one of the targets would furnish reward. Monkeys learned to combine probabilistic information from the shape combinations. Moreover, neurons in the parietal cortex reveal the addition and subtraction of probabilistic quantities that underlie decision-making on this task.

### Review

The authors argue that the brain reasons probabilistically, because they find that single neuron responses (firing rates) correlate with a measure of probabilistic evidence derived from the probabilistic task setup. It is certainly true that the monkeys could learn the task (a variant of the weather prediction task) and I also find the evidence presented in the paper generally compelling, but the authors note themselves that similar correlations with firing rate may result from other quantitative measures with similar properties as the one considered here. Might firing rates, for example, correlate similarly with a measure of expected value of a shape combination as derived from a reinforcement learning model?

What did they do in detail? They trained monkeys on a task in which they had to predict which of two targets will be rewarded based on a set of four shapes presented on the screen. Each shape contributed a certain weight to the probability of rewarding a target as defined by the experimenters. The monkeys had to learn these weights. Then they also had to learn (implicitly) how the weights of shapes are combined to produce the probability of reward. After about 130,000 trials the monkeys were good enough to be tested. The trick in the experiment was that the four shapes were not presented simultaneously, but appeared one after the other. The question was whether neurons in the lateral intraparietal (LIP) area of the monkeys’ brains would represent the updated probabilities of reward after addition of each new shape within a trial. That the neurons would do that was hypothesised, because results from previous experiments suggested (see Gold & Shadlen, 2007 for review) that neurons in LIP represent accumulated evidence in a perceptual decision making paradigm.

Now Shadlen seems convinced that these neurons do not directly represent the relevant probabilities, but rather represent the log likelihood ratio (logLR) of one choice option over the other (see, e.g., Gold & Shadlen, 2001 and Shadlen et al., 2008). Hence, these ‘posterior’ probabilities play no role in the paper. Instead all results are obtained for the logLR. Funnily enough, the task is defined solely in terms of the posterior probability of reward for a particular combination of four shapes and the logLR needs to be computed from the posterior probabilities (Yang & Shadlen don’t lay out this detail in the paper or the supplementary information). I’m more open to a direct representation of posterior probabilities and I wondered what the correlation with logLR would look like if the firing rates represented posterior probabilities. This is easy to simulate in Matlab (see Yang2007.m). Such a simulation shows that, as a function of logLR, the firing rate (representing posterior probabilities) should follow a sigmoid function. Compare this prediction to Figures 2c and 3b for epoch 4. Such a sigmoidal relationship derives from the boundedness of the posterior probabilities which is obviously reflected in firing rates of neurons as they cannot drop or rise indefinitely. So there could be simple reasons for the boundedness of firing rates other than that they represent probabilities, but in any case it appears unlikely that they represent unbounded log likelihood ratios.
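
The sigmoid prediction can be sketched in a few lines of Python. This is not the Matlab script mentioned above, only an illustration of the same idea. It assumes equal prior probabilities for the two targets, so that the posterior is LR / (1 + LR), and a firing rate that is a linear function of the posterior. The logarithm base and the rate limits `r_min` and `r_max` are hypothetical parameters.

```python
import math


def posterior_from_loglr(log_lr, base=10.0):
    """Posterior probability of one target given logLR, assuming equal priors."""
    return 1.0 / (1.0 + base ** (-log_lr))


def loglr_from_posterior(p, base=10.0):
    """Inverse mapping: log likelihood ratio implied by a posterior p (equal priors)."""
    return math.log(p / (1.0 - p), base)


def rate_if_encoding_posterior(log_lr, r_min=5.0, r_max=50.0, base=10.0):
    """Hypothetical firing rate that linearly encodes the posterior probability."""
    return r_min + (r_max - r_min) * posterior_from_loglr(log_lr, base)
```

Evaluating `rate_if_encoding_posterior` over a range of logLR values gives a curve that is roughly linear around logLR = 0 and flattens for large absolute logLR, which is the shape to compare against the epoch 4 panels.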

## The Cost of Accumulating Evidence in Perceptual Decision Making.

Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., and Pouget, A.
The Journal of Neuroscience, 32:3612–3628, 2012

### Abstract

Decision making often involves the accumulation of information over time, but acquiring information typically comes at a cost. Little is known about the cost incurred by animals and humans for acquiring additional information from sensory variables due, for instance, to attentional efforts. Through a novel integration of diffusion models and dynamic programming, we were able to estimate the cost of making additional observations per unit of time from two monkeys and six humans in a reaction time (RT) random-dot motion discrimination task. Surprisingly, we find that the cost is neither zero nor constant over time, but for the animals and humans features a brief period in which it is constant but increases thereafter. In addition, we show that our theory accurately matches the observed reaction time distributions for each stimulus condition, the time-dependent choice accuracy both conditional on stimulus strength and independent of it, and choice accuracy and mean reaction times as a function of stimulus strength. The theory also correctly predicts that urgency signals in the brain should be independent of the difficulty, or stimulus strength, at each trial.

### Review

The authors show equivalence between a probabilistic and a diffusion model of perceptual decision making and consequently explain experimentally observed behaviour in the random dot motion task in terms of varying bounds in the diffusion model which correspond to varying costs in the probabilistic model. Here, I discuss their model in detail and outline its limits. My main worry with the presented model is that it may be too powerful to have real explanatory power. Impatient readers may want to skip to the conclusion below.

Perceptual model

The presented model is tailored to the two-alternative, forced choice random dot motion task. The fundamental assumption for the model is that at each point in discrete time, or equivalently, for each successive time period in continuous time the perceptual process of the decision maker produces an independent sample of evidence whose mean, mu*dt, reflects the strength (coherence) and direction (only through sign of evidence) of random dot motion while its variance, sigma2, reflects the passage of time (sigma2 = dt, the time period between observations). This definition of input to the decision model as independent samples of motion strength in either one of two (unspecified) directions restricts the model to two decision alternatives. Consequently, the presented model does not apply to more alternatives, or dependent samples.

The model of noisy, momentary evidence corresponds to a Wiener process with drift which is exactly what standard (drift) diffusion models of perceptual decision making are where drift is equal to mu and diffusion is equal to sigma2. You could wonder why sigma2 is exactly equal to dt and not larger, or smaller, but this is controlled by setting the mean evidence mu to an appropriate level by allowing it to scale: mu = k*c, where k is an arbitrary scaling constant which is fit to data and c is the random dot coherence in the current trial. Therefore, by controlling k you essentially control the signal to noise ratio in the model of the experiment and you would get equivalent results, if you changed sigma2 while fixing mu = c. The difference between the two cases is purely conceptual: In the former case you assume that the neuronal population in MT signals, on average, a scaled motion strength where the scaling may be different for different subjects, but signal variance is the same over subjects while in the latter case you assume that the MT signal, on average, corresponds to motion strength directly, but MT signal variance varies across subjects. Personally, I prefer the latter.
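
The equivalence of the two parameterisations can be checked with a small simulation. The sketch below is my own illustration (the function name, step size and bound are assumptions, not values from the paper): it runs the same noise sequence once with drift k*c and unit noise, and once with drift c and noise scaled by 1/k, where the bound must then also be divided by k.

```python
import math
import random


def first_passage(drift, noise_sd, bound, dt=0.001, t_max=5.0, seed=0):
    """Simulate one diffusion trial; return (choice, decision_time).

    choice is +1 or -1 for the bound that was hit, 0 if none was hit by t_max.
    """
    rng = random.Random(seed)
    x = 0.0
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        # momentary evidence: mean drift*dt, standard deviation noise_sd*sqrt(dt)
        x += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if abs(x) >= bound:
            return (1 if x > 0 else -1), step * dt
    return 0, t_max


k, c, bound = 4.0, 0.25, 1.0
scaled_mean = first_passage(k * c, 1.0, bound, seed=3)           # mu = k*c, sigma2 = dt
scaled_noise = first_passage(c, 1.0 / k, bound / k, seed=3)      # mu = c, noise scaled by 1/k
```

Both calls return the same choice and decision time, because the second trajectory is the first one divided by k. Behavioural data alone therefore cannot distinguish the two interpretations.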

The decision circuit in the authors’ model takes the samples of momentary evidence as described above and computes a posterior belief over the two considered alternatives (motion directions). This posterior belief depends on the posterior probability distribution over mean motion strengths mu which is computed from the samples of momentary evidence taking a prior distribution over motion strengths into account. An important assumption in the computation of the posterior is that the decision maker (or decision circuit) has a perfect model of how the samples of momentary evidence are generated (a Gaussian with mean mu*dt and variance dt). If, for example, the decision maker assumed a slightly different variance, that would also explain differences in mean accuracy and decision times. The assumption of the perfect model, however, allows the authors to assert that the experimentally observed fraction of correct choices at a time t is equal to the internal belief of the decision maker (subject) that the chosen alternative is the correct one. This is important, because only with an estimate of this internal belief can the authors later infer the time-varying waiting costs for the subject (see below).

Anyway, under the given model the authors show that for a Gaussian prior you obtain a Gaussian posterior over motion strength mu (Eq. 4) and for a discrete prior you obtain a corresponding discrete posterior (Eq. 7). Importantly, the parameters of the posteriors can be formulated as functions of the current state x(t) of the sample-generating diffusion process and elapsed time t. Consequently, also the posterior belief over decision alternatives can be formulated as a one-to-one, i.e., invertible function of the diffusion state (and time t). By this connection, the authors have shown that, under an appropriate transformation, decisions based on the posterior belief are equivalent to decisions based on the (accumulated) diffusion state x(t) set in relation to elapsed time t.
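
For the Gaussian case the mapping from diffusion state to belief is short enough to write down. The sketch below assumes a zero-mean Gaussian prior over mu with variance `prior_var` (my notation, the paper’s Eq. 4 may be parameterised differently) and uses the fact that x(t) given mu is Gaussian with mean mu*t and variance t.

```python
import math


def posterior_over_mu(x, t, prior_var):
    """Posterior mean and variance of mu given diffusion state x at time t.

    Assumes mu ~ N(0, prior_var) and x | mu ~ N(mu * t, t).
    """
    precision = 1.0 / prior_var + t  # prior precision plus t from the likelihood
    return x / precision, 1.0 / precision


def belief_mu_positive(x, t, prior_var):
    """Posterior probability that mu > 0, i.e. belief in the 'positive' alternative."""
    mean, var = posterior_over_mu(x, t, prior_var)
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))
```

For fixed t the belief is strictly increasing in x, which is the invertibility mentioned above. For fixed positive x the belief decreases with t, so a constant level of belief corresponds to a level of x that changes with elapsed time.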

In summary, the probabilistic perceptual decision model of the authors simply estimates the motion strength from the samples and then decides whether the estimate is positive or negative. Furthermore, this procedure is equivalent to accumulating the samples and deciding whether the accumulated state is very positive or very negative (as determined by hitting a bound). The described diffusion model has been used before to fit accuracies and mean reaction times of subjects, but apparently it was never quite good in fitting the full reaction time distribution (note that it lacks the extensions of the drift diffusion models suggested by Ratcliff, see, e.g., [1]). So here the authors extend the diffusion model by adding time-varying bounds which can be interpreted in the probabilistic model as a time-varying cost of waiting for more samples.

Time-varying bounds and costs

Intuitively, introducing a time-varying bound in a diffusion model introduces great flexibility in shaping the response accuracy and timing at any given time point. However, I currently do not have a good idea of just how flexible the model becomes. For example, if in discrete time changing the bound at each time step could independently modify the accuracy and reaction time distribution at this time step, the bound alone could explain the data. I don’t believe that this extreme case is true, but I would like to know how close you would come. In any case, it appears to be sensible to restrict how much the bound can vary to prevent overfitting of the data, or indeed to prevent making the other model parameters obsolete. In the present paper, the authors control the shape of the bound by using a function made of cosine basis functions. Although this restricts the bound to be a smooth function of time, it still allows considerable flexibility. The authors use two more approaches to control the flexibility of the bound. One is to constrain the bound to be the same for all coherences, meaning that it cannot be used to explain differences between coherences (experimental conditions). The other is to use Bayesian methods for fitting the data. On the one hand, this controls the bound by choosing particular priors. They do this by only considering parameter values in a restricted range, but I do not know how wide or narrow this range is in practice. On the other hand, the Bayesian approach leads to posterior distributions over parameters which means that subsequent analyses can take the uncertainty over parameters into account (see, e.g., the indicated uncertainty over the inferred bound in Fig. 5A). Although I remain with some last doubts about whether the bound was too flexible, I believe that this is not a big issue here.
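
To get a feeling for what a time-varying bound does, it helps to run the same noise sequence against a constant and a collapsing bound. The sketch below is only an illustration: the single half-cosine collapse stands in for the authors’ cosine basis functions, which I have not reproduced, and all names and parameter values are hypothetical.

```python
import math
import random


def constant_bound(t, height=1.0):
    """Bound that does not change over the trial."""
    return height


def cosine_collapse(t, start=1.0, end=0.2, duration=2.0):
    """Bound that falls smoothly from start to end over duration, then stays at end."""
    phase = min(t / duration, 1.0)
    return end + (start - end) * 0.5 * (1.0 + math.cos(math.pi * phase))


def decide(drift, bound_fn, dt=0.001, t_max=3.0, seed=0):
    """Diffusion with unit noise against a symmetric, time-varying bound.

    Returns (choice, decision_time); choice is 0 if no bound is hit by t_max.
    """
    rng = random.Random(seed)
    x = 0.0
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        x += drift * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t = step * dt
        if abs(x) >= bound_fn(t):
            return (1 if x > 0 else -1), t
    return 0, t_max
```

Because the collapsing bound never exceeds the constant one here, every trial ends at the same time or earlier under the collapse. This cuts the long tail of the decision time distribution at the price of more decisions made on little accumulated evidence. The shape of the bound thus directly trades off late accuracy against late response times, which is where the flexibility comes from.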

It is, however, a different question whether the time-varying bound is a good explanation for the observed behaviour in contrast, e.g., to the extensions of the diffusion model introduced by Ratcliff (mostly trial-by-trial parameter variability). There, one might refer to the second, decision-related part of the presented model which considers the rewards and costs associated with decisions. In the Bayesian decision model presented in the paper the subject decides at each time step whether to select alternative 1, or alternative 2, or wait for more evidence in the next time step. This mechanism was already mentioned in [2]. Choosing an alternative will either lead to a reward (correct answer) or punishment (error), but waiting is also associated with a cost which may change throughout the trial. Deciding for the optimal course of action which maximises reward per unit time then is an average-reward reinforcement learning problem which the authors solve using dynamic programming. For a particular setting of reward, punishment and waiting costs this can be translated into an equivalent time-varying bound. More importantly, the procedure can be reversed such that the time-varying cost can be inferred from a bound that had been fitted to data. Apart from the bound, however, the estimate of the cost also depends on the reward/punishment setting and on an estimate of choice accuracy at each time step. Note that the latter differs considerably from the overall accuracy which is usually used to fit diffusion models and requires more data, especially when the error rate is low.

The Bayesian decision model, therefore, allows one to translate the time-varying bound to a time-varying cost which then provides an explanation of the particular shape of the reaction time distribution (and accuracy) in terms of the intrinsic motivation (negative cost) of the subject to wait for more evidence. Notice that this intrinsic motivation is really just a value describing how much somebody (dis-)likes to wait and it cannot be interpreted in terms of trying to be better in the task anymore, because all these components have been taken care of by other parts of the decision model. So what does it mean when a subject likes to wait for new evidence just for the sake of it (cf. dip in cost at beginning of trial in human data in Fig. 8)? I don’t know.

Collapsing bounds as found from behavioural data in this paper have been associated with an urgency signal in neural data which drives firing rates of all decision neurons towards a bound at the end of a trial irrespective of the input / evidence. This has been interpreted as a response of the subjects to the approaching deadline (end of trial) that they do not want to miss. The explanation in terms of a waiting cost which rises towards the end of a trial suggests that subjects just have a built-in desire to make (potentially arbitrary) choices before a deadline. To me, this is rather unintuitive. If you’re not punished for making a wrong choice (blue lines in Figs. 7 and 8, but note that there was a small time-punishment in the human experiment) shouldn’t it be always beneficial to make a choice before the deadline, because you trade uncertain reward against certain no reward? This would already be able to explain the urgency signal without consideration of a waiting cost. So why do we see one anyway? It may just all depend on the particular setting of reward and punishment for correct choices and errors, respectively. The authors present different inferred waiting costs with varying amounts of punishment and argue that the results are qualitatively equal, but the three different values of punishment they present hardly exhaust the range of values that could be assumed. Also, they did not vary the amount of reward given for correct choices, but it is likely that only the difference between reward and punishment determines the behaviour of the model such that it doesn’t matter whether you change reward or punishment to explore model predictions.

Conclusion

The main contribution of the paper is to show that accuracy and reaction time distribution can be explained by a time-varying bound in a simple diffusion model in which the drift scales linearly with stimulus intensity (coherence in random dot motion). I tried to point out that this result may not be surprising depending on how much flexibility a time-varying bound adds to the model. Additionally, the authors present a connection between diffusion and Bayesian models of perceptual decision making which allows them to reinterpret the time-varying bounds in terms of the subjective cost of waiting for more evidence to arrive. The authors argue that this cost increases towards the end of a trial, but for two reasons I’m not entirely convinced: 1) Conceptually, it is worth considering the origin of a possible waiting cost. It could correspond to the energetic cost of keeping the inference machinery running and the attention on the task, but there is no reason why this should increase towards a deadline. 2) I’m not convinced by the presented results that the inferred increase of cost towards a deadline is qualitatively independent of the reward/punishment setting. A greater range of punishments should have been tested. Note that you cannot infer the rewards for decisions and the time-varying waiting cost at the same time from the behavioural data. So this issue cannot be settled without some new experiments which measure rewards or costs more directly. Finally, I miss an overview of fitted parameter values in the paper. For example, I would be interested in the inferred lapse trial probabilities p1. The authors go to great lengths to estimate the posterior distributions over diffusion model parameters and I wonder why they don’t share the results with us (at least mean and variance for a start).

In conclusion, the authors follow a trend to explain behaviour in terms of Bayesian ideal observer models extended by flexible cost functions and apply this idea to perceptual decision making via a detour through a diffusion model. Although I appreciate the sound work presented in the paper, I’m worried that the time-varying bound/cost is too flexible and acts as a kind of ‘get out of jail free’ card which blocks the view to other, potentially additional mechanisms underlying the observed behaviour.

References

[1] Bogacz, R.; Brown, E.; Moehlis, J.; Holmes, P. & Cohen, J. D. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev, 2006, 113, 700-765

[2] Dayan, P. & Daw, N. D. Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci, 2008, 8, 429-453

## Representation of confidence associated with a decision by neurons in the parietal cortex.

Kiani, R. and Shadlen, M. N.
Science, 324:759–764, 2009

### Abstract

The degree of confidence in a decision provides a graded and probabilistic assessment of expected outcome. Although neural mechanisms of perceptual decisions have been studied extensively in primates, little is known about the mechanisms underlying choice certainty. We have shown that the same neurons that represent formation of a decision encode certainty about the decision. Rhesus monkeys made decisions about the direction of moving random dots, spanning a range of difficulties. They were rewarded for correct decisions. On some trials, after viewing the stimulus, the monkeys could opt out of the direction decision for a small but certain reward. Monkeys exercised this option in a manner that revealed their degree of certainty. Neurons in parietal cortex represented formation of the direction decision and the degree of certainty underlying the decision to opt out.

### Review

The authors used a 2AFC-task with an option to waive the decision in favour of a choice which provides low, but certain reward (the sure option) to investigate the representation of confidence in LIP neurons. Behaviourally the sure option had the expected effect: it was increasingly chosen the harder the decisions were, i.e., the more likely a false response was. Trials in which the sure option was chosen, thus, may be interpreted as those in which the subject was not very confident in the upcoming decision. It is important to note that task difficulty here was manipulated by providing limited amounts of information for a limited amount of time, i.e., this was not a reaction time task.

The firing rates of the recorded LIP neurons indicate that selection of the sure option is associated with an intermediate level of activity compared to that of subsequent choices of the actual decision options. For individual trials the authors found that firing rates closer to the mean firing rate (in a short time period before the sure option became available) more frequently led to selection of the sure option than firing rates further away from the mean, but in absolute terms the activity in this time window could predict choice of the sure option only weakly (probability of 0.4). From these results the authors conclude that the LIP neurons which have previously been found to represent evidence accumulation also encode confidence in a decision. They suggest a simple drift-diffusion model with fixed diffusion parameter to explain the results. In addition to standard diffusion models they define confidence in terms of the log-posterior odds which they compute from the state of the drift-diffusion model. They define the posterior as p(S_i|v), the probability that decision option i is correct given that the drift-diffusion state (the decision variable) is v. They compute it from the corresponding likelihood p(v|S_i), but don’t state how they obtained that likelihood. Anyway, the sure option is chosen in the model when the log-posterior odds are below a certain level. I don’t see why the detour via the log-posterior odds is necessary. You could directly define v as the posterior for decision option i and still be consistent with all the findings in the paper. Of course, then v could not be governed by a linear drift anymore, but why should it in the first place? The authors keenly promote the Bayesian brain, but stop just before the finishing line. Why?

## Robust averaging during perceptual judgment.

de Gardelle, V. and Summerfield, C.
Proc Natl Acad Sci U S A, 108:13341–13346, 2011

### Abstract

An optimal agent will base judgments on the strength and reliability of decision-relevant evidence. However, previous investigations of the computational mechanisms of perceptual judgments have focused on integration of the evidence mean (i.e., strength), and overlooked the contribution of evidence variance (i.e., reliability). Here, using a multielement averaging task, we show that human observers process heterogeneous decision-relevant evidence more slowly and less accurately, even when signal strength, signal-to-noise ratio, category uncertainty, and low-level perceptual variability are controlled for. Moreover, observers tend to exclude or downweight extreme samples of perceptual evidence, as a statistician might exclude an outlying data point. These phenomena are captured by a probabilistic optimal model in which observers integrate the log odds of each choice option. Robust averaging may have evolved to mitigate the influence of untrustworthy evidence in perceptual judgments.

### Review

The authors investigate what influence the variance of evidence has on perceptual decisions. A bit counterintuitively, they implement varying evidence by simultaneously presenting elements with different feature values (e.g. color) to subjects instead of presenting only one element which changes its feature value over time (would be my naive approach). Perhaps they did this to be able to assume constant evidence over time such that the standard drift diffusion model applies. My intuition is that subjects anyway implement a more sequential sampling of the stimulus display by varying attention to individual elements.

The behavioural results show that subjects take both mean presented evidence as well as the variance of evidence into account when making a decision: For larger mean evidence and smaller variance of evidence subjects are faster and make fewer mistakes. The results are attention dependent: mean and variance in a task-irrelevant feature dimension had no effect on responses.

The behavioural results can be explained by a drift diffusion model with a drift rate which takes the variance of the evidence into account. The authors present two such drift rates. 1) SNR drift = mean / standard deviation (as computed from trial-specific feature values). 2) LPR drift = mean log posterior ratio (also computed from trial-specific feature values). The two cannot be differentiated based on the measured mean RTs and error rates in the different conditions. So the authors provide an additional analysis which estimates the influence of the different presented elements, that is, the influence of the different feature values presented by them, on the given responses. This is done via a generalised linear regression by fitting a model which predicts response probabilities from presented feature values for individual trials. The fitted linear weights suggest that extreme (outlying) feature values have little influence on the final responses compared to the influence that (inlying) feature values close to the categorisation boundary have. Only the LPR model (2) replicates this effect.

Why do inlying feature values have greater influence on responses than outlying ones in the LPR model, but not in the other models? The LPR model alone would not predict this, because for more extreme posterior values you get more extreme LPR values which then have a greater influence on the mean LPR value, i.e., the drift rate. Therefore, it is not entirely clear to me yet why they find a greater importance of inlying feature values in the generalised linear regression from feature values to responses. The best explanation I currently have is the influence of the estimated posterior values: Fig. S5 shows that the posterior values are constant for sufficiently outlying feature values and only change for inlying feature values, where the greatest change is at the feature value defining the categorisation boundary. When mapped through the LPR the posterior values lead to LPR values following the same sigmoidal form setting low and high feature values to constants. These constant high and low values may cancel each other out when, on average, they are equally many. Then, only the inlying feature values may have a lasting contribution on the LPR mean; especially those close to the categorisation boundary, because they tend to lead to larger variation in LPR values which may tip the LPR mean (drift rate) towards one of the two responses. This explanation means that the results depend on the estimated posterior values. In particular, that these are set to values of about 0.2, or 0.8, respectively, for a large range of extreme feature values.
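
The explanation above can be made concrete with a toy version of the LPR drift. The sketch below is my own construction and not the authors’ model: it assumes that the estimated posterior is a logistic function of the feature value (categorisation boundary at 0) squeezed into the range 0.2 to 0.8, which are the plateau values I read off Fig. S5. The `slope` parameter and all names are hypothetical.

```python
import math


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def saturating_posterior(x, lo=0.2, hi=0.8, slope=2.0):
    """Assumed posterior for one category given feature value x (boundary at 0)."""
    return lo + (hi - lo) * logistic(slope * x)


def lpr(x):
    """Log posterior ratio of a single element with feature value x."""
    p = saturating_posterior(x)
    return math.log(p / (1.0 - p))


def lpr_drift(feature_values):
    """Mean LPR over all elements of a display, used as drift rate."""
    return sum(lpr(x) for x in feature_values) / len(feature_values)
```

Under these assumptions the LPR of a single element can never exceed log(4), about 1.39, in absolute value. Moving an outlying element even further out hardly changes the drift, whereas moving an element near the boundary changes it a lot. A pair of opposite outliers cancels exactly, leaving the drift to be determined by the inlying elements.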

I am unsure what conclusions can be drawn from the results. Although the basic behavioural results are clear, it is not surprising that the responses of subjects depend on the variance of the presented evidence. You can define the feature values varying around the mean as noise. More variance then just means more noise and it is a basic result that people become slower and more error prone when presented with more noise. Perhaps surprisingly, it is here shown that this also works when noisy features are presented simultaneously on the screen instead of sequentially over time.

The DDM analysis shows that the drift rate of subjects decreases with increasing variance of evidence. This makes sense and means that subjects become more cautious in their judgements when confronted with larger variance (more noise). But I find the LPR model rather strange. It’s like pressing a Bayesian model into a mechanistic corset. The posterior ratio is an ad-hoc construct. Ok, it’s equivalent to the log-likelihood ratio, but why turn it into a posterior ratio then? The vagueness arises already because of how the task is defined: all information is presented at once, but you want to describe accumulation of evidence over time. Consequently, you have to define some approximate, ad-hoc construct (mean LPR) which you can use to define the temporal integration. That the model based on that construct replicates an aspect of the behavioural data may be an artefact of the particular approximation used (apparently it is important that the estimated posterior values are constant for extreme feature values). So, it remains unclear to me whether an LPR-DDM is a good explanation for the involved processes in this case.

Actually, a large part of the paper (cf. title) concerns the finding that extreme feature values appear to have smaller influence on subject responses than feature values close to the categorisation boundary. This is surprising to me. Although it makes intuitive sense in terms of ‘robust averaging’, I wouldn’t predict it for optimal probabilistic integration of evidence, at least not without making further assumptions. Such assumptions are also implicit in the LPR-DDM and I’m a bit skeptical about it anyway. Thus, a good explanation is still needed, in my opinion. Finally, I wonder how reliable the generalised linear regression analysis, which led to these results, is. On the one hand, the authors report using two different generalised linear models and obtaining equivalent results. On the other hand, they estimate 9 parameters from only one binary response variable and I wonder how the optimisation landscape looks in this case.

## A supramodal accumulation-to-bound signal that determines perceptual decisions in humans.

O’Connell, R. G., Dockree, P. M., and Kelly, S. P.
Nat Neurosci, 15:1729–1735, 2012

### Abstract

In theoretical accounts of perceptual decision-making, a decision variable integrates noisy sensory evidence and determines action through a boundary-crossing criterion. Signals bearing these very properties have been characterized in single neurons in monkeys, but have yet to be directly identified in humans. Using a gradual target detection task, we isolated a freely evolving decision variable signal in human subjects that exhibited every aspect of the dynamics observed in its single-neuron counterparts. This signal could be continuously tracked in parallel with fully dissociable sensory encoding and motor preparation signals, and could be systematically perturbed mid-flight during decision formation. Furthermore, we found that the signal was completely domain general: it exhibited the same decision-predictive dynamics regardless of sensory modality and stimulus features and tracked cumulative evidence even in the absence of overt action. These findings provide a uniquely clear view on the neural determinants of simple perceptual decisions in humans.

### Review

The authors report EEG signals which may represent 1) instantaneous evidence and 2) accumulated evidence (decision variable) during perceptual decision making. The result promises a big leap for experiments in perceptual decision making with humans, because it is the first time that we can directly observe the decision process as it accumulates evidence with reasonable temporal resolution without sticking needles in participants’ brains. Furthermore, one of the signals found appears to be independent of sensory and response modality, i.e., it appears to reflect the decision process alone, something that has not been clearly found in species other than humans. But let’s discuss the study in more detail.

The current belief about the perceptual decision making process is formalised in accumulation to bound models: When presented with a stimulus, the decision maker determines at each time point of the presentation the current amount of evidence for all possible alternatives. This estimate of “instantaneous evidence” is noisy, because of either the noise within the stimulus itself, or because of internal processing noise. Therefore, the decision maker does not immediately make a decision between alternatives, but accumulates evidence over time until the accumulated evidence for one of the alternatives reaches a threshold which is internally set by the decision maker itself and indicates a certain level of certainty, or response urgency. The alternative, for which the threshold was crossed, is the decision outcome and the time the threshold was crossed is the decision time (potentially including an additional delay). The authors argue that they have found signals in the EEG of humans which can be associated with the instantaneous and accumulated evidence variables of these kinds of models.
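The accumulation-to-bound scheme described above can be sketched in a few lines. This is only a minimal illustration for two alternatives, not the model used in the paper: the function name `accumulate_to_bound`, its parameters (`drift`, `noise_sd`, `bound`, `dt`, `non_decision_time`) and the choice of Gaussian noise are my own assumptions.

```python
import random


def accumulate_to_bound(drift, noise_sd, bound, dt=0.001, max_time=5.0,
                        non_decision_time=0.0, rng=None):
    """Simulate one trial of a two-alternative accumulation-to-bound process.

    Hypothetical sketch: returns (choice, response_time), where choice is
    +1 or -1, or (0, None) if no bound was reached within max_time.
    """
    rng = rng or random.Random()
    evidence = 0.0  # accumulated evidence (the decision variable)
    n_steps = int(round(max_time / dt))
    for step in range(1, n_steps + 1):
        # Noisy instantaneous evidence for this time step
        momentary = drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        evidence += momentary
        # The decision is made once either threshold is crossed
        if abs(evidence) >= bound:
            choice = 1 if evidence > 0 else -1
            # Response time = decision time plus an optional fixed delay
            return choice, step * dt + non_decision_time
    return 0, None
```

In this sketch `momentary` plays the role of the instantaneous evidence and `evidence` that of the accumulated evidence, i.e., the two quantities the authors try to associate with EEG signals.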

The paradigm used in this study was different from the perceptual decision making paradigm popular in monkeys (random dot stimuli). Here the authors used stimuli which did not move, but rather gradually changed their intensity or contrast: In the experiments with visual stimuli, participants were continuously viewing a flickering disk which from time to time gradually changed its contrast with the background (the contrast gradually went back to base level after 1.6s). So the participants had to decide whether they observe a contrast different from baseline at the current time. Note that this setup is slightly different from usual trial-based perceptual decision making experiments where a formally new trial begins after a participant’s response. The disk also had a pattern, but it’s unclear why the pattern was necessary. On the other hand, using the other stimulus properties seems reasonable: The flickering induced something like continuous evoked potentials in the EEG ensuring that something stimulus-related could be measured at all times, but the gradual change of contrast “successfully eliminated sensory-evoked deflections from the ERP trace” such that the more subtle accumulated evidence signals were not masked by large deflections solely due to stimulus onsets. In the experiments with sounds, equivalent stimulus changes were implemented by either gradually changing the volume of a presented, envelope-modulated tone or its frequency.

The authors report 4 EEG signals related to perceptual decision making. They argue that the occipital steady-state visual-evoked potential (SSVEP) indicated the estimated instantaneous evidence when visual stimuli were used, because its trajectories directly reflected the changes in contrast. For auditory stimuli, the authors found a corresponding steady-state auditory-evoked potential (SSAEP) which was located at more central EEG electrodes and at 40Hz instead of 20Hz (SSVEP). Further, the authors argue that a left-hemisphere beta (LHB, 22-30Hz) and a centro-parietal potential (CPP, direct electrode measurements) could be interpreted as evidence accumulation signals, because the time of their peaks tightly predicted reaction times and their time courses were better predicted by the cumulative SSVEP than by the original SSVEP. LHB and CPP also (roughly) showed the expected dependency on whether the participant correctly identified the target, or missed it (lower signals for misses). Furthermore, they behaved as expected when contrast varied in more complex ways than just a linear decrease (decrease followed by short increase followed by decrease). CPP differed from LHB by also showing the expected changes when the task did not require an overt response at target detection time, whereas LHB showed no relation to the present evidence in this task. This indicates that LHB may have something to do with motor preparation of the response while CPP is a more abstract decision signal. Additionally, the CPP showed the characteristic changes measured with visual stimuli also with auditory stimuli and it depended on attentional focus: In one experimental condition the task of the participants was altered (‘detect a transient size change of a central fixation square’), but the original disk stimulus was still presented including the gradual contrast changes. In this ‘non-attend’ condition the SSVEP decreased with contrast as before, but the CPP showed no response, reinforcing the idea that the CPP is an abstract decision signal. On a final note, the authors speculate that the CPP could be equal to the standard P300 signal, when transient stimuli need to be detected instead of gradual stimulus changes. This connection, if true, would be a nice functional explanation of the P300.

Open Questions

Despite the generally intriguing results presented in the paper a few questions remain. These predominantly regard details.

1) omission of data

In Figs. 2 and 3 the SSVEP is not shown anymore, presumably because of space restrictions. Similarly, the LHB is not presented in Fig. 4. I can believe that the SSVEP behaved as expected in the different conditions of Figs. 2 and 3 such that not much information would have been added by providing the plots, but it would at least be interesting to know whether the accumulated SSVEP still predicted the LHB and CPP better than the original SSVEP in these conditions. Likewise, the authors do not report the equivalent analysis for the SSAEP in the auditory conditions. Regarding the omission of the LHB in Fig. 4, I’m not so certain about the behaviour of the LHB in the auditory conditions. It seems possible that the LHB shows different behaviour with different modalities. There is no mention of this in the text, though.

2) Is there a common threshold level?

The authors argue that the LHB and CPP reached a common threshold level just before response initiation (a prediction of accumulation to bound models, Fig. 1c), but the test they used does not entirely convince me: They compared the variance just before response initiation with the variance of measurements across different time points (they randomly assigned the RT of one trial to another trial and computed variance of measurements at the shuffled time points). For a strongly varying function of time, it is no surprise that the measurements at a consistent time point vary less than the measurements made across many different time points as long as the measurement noise is small enough. Based on this argument, it is strange that they did not find a significant difference for the SSVEP which also varies strongly across time (this fits into their interpretation, though), but this lack of difference could be explained by larger measurement noise associated with the SSVEP.

Furthermore, the authors themselves report that they found a significant difference between the size of CPP peaks around decision time for varying contrast levels (Fig. 2c). In particular, the CPP peak for false alarms (no contrast change, but participant response) was lower than the other peaks. If the CPP really is the decision variable predicted by the models, then these differences should not have occurred. So where do they come from? The authors provide arguments that I cannot follow without further explanations.
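My concern about the shuffle-based variance comparison can be illustrated with a toy simulation. It is a sketch under my own assumptions and not a reanalysis of the data: the signal is a hypothetical response-locked Gaussian bump with no accumulation or bound at all, and the reaction times, bump width and noise levels are made up.

```python
import math
import random
import statistics


def shuffle_test_variances(n_trials=500, noise_sd=0.1, seed=0):
    """Variance of a response-locked signal at own vs. shuffled RTs.

    Hypothetical toy example: the signal is NOT an accumulator, it is just
    a bump that peaks at the response time of each trial.
    """
    rng = random.Random(seed)
    # Made-up reaction times in seconds
    rts = [rng.uniform(0.6, 1.4) for _ in range(n_trials)]

    def signal(t, rt):
        # Gaussian bump (0.2 s wide) centred on the response, plus
        # independent measurement noise
        bump = math.exp(-((t - rt) ** 2) / (2 * 0.2 ** 2))
        return bump + rng.gauss(0.0, noise_sd)

    # Measurements taken at each trial's own response time
    at_response = [signal(rt, rt) for rt in rts]
    # Measurements taken at the response time of a randomly assigned trial
    shuffled = rts[:]
    rng.shuffle(shuffled)
    at_shuffled = [signal(t, rt) for t, rt in zip(shuffled, rts)]
    return statistics.variance(at_response), statistics.variance(at_shuffled)
```

With small `noise_sd` the variance at the true response times is much lower than at the shuffled times, even though nothing in this signal accumulates to a threshold. With large `noise_sd` the two variances become hard to tell apart, which is the kind of explanation I have in mind for the missing effect in the SSVEP.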

3) timing of peaks

It appears that the mean reaction time slightly precedes the peaks of the mean signals. The effect is particularly clear in Fig. 3b (CPP), Fig. 4d (CPP) and Fig. 5a, but is also slightly visible in the averages centred at the time of response in Figs. 1c and 2c. Presuming a delay from internal decision time to actual response, the time of the peak of the decision variable should precede the reaction time, especially when reaction time is measured from button presses (here) compared to saccade initiation (typical monkey experiments). So why does it appear to be the other way round here?

4) variance of SSVEP baseline

The SSVEP in Fig. 4a is in a different range (1.0-1.3) than the SSVEP in Fig. 4d (1.7-2.5) even though the two plots should each contain a time course for the same experimental condition. Where does the difference come from?

5) multiple alternatives

The CPP, as described by the authors, is a single, global signal of a decision variable. If the decision problem is composed of only two decision alternatives, a single decision variable is indeed sufficient for decision making, but if more alternatives are considered, several evidence accumulating variables are needed. What would the CPP then signal? One of the decision variables? The total amount of certainty of the upcoming decision?
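To make the question concrete, here is a minimal sketch of a race between several accumulators, one per alternative. It is purely illustrative: the race architecture, the function name `race_trial` and the two scalar summaries it returns are my own assumptions about what a single global signal could reflect, not something proposed in the paper.

```python
import random


def race_trial(drifts, noise_sd=1.0, bound=1.0, dt=0.001, max_time=5.0,
               rng=None):
    """Race between independent accumulators (needs at least two drifts).

    Hypothetical sketch: returns a dict with the winning alternative and
    two candidate one-dimensional summaries, or None if no accumulator
    reached the bound within max_time.
    """
    rng = rng or random.Random()
    acc = [0.0] * len(drifts)  # one decision variable per alternative
    for step in range(1, int(round(max_time / dt)) + 1):
        for i, drift in enumerate(drifts):
            acc[i] += drift * dt + noise_sd * dt ** 0.5 * rng.gauss(0.0, 1.0)
        leader = max(range(len(acc)), key=acc.__getitem__)
        if acc[leader] >= bound:
            ranked = sorted(acc, reverse=True)
            return {
                "choice": leader,
                "rt": step * dt,
                # Candidate 1: level of the winning decision variable
                "winner_level": ranked[0],
                # Candidate 2: lead of the winner over the runner-up,
                # a crude proxy for certainty
                "lead_over_runner_up": ranked[0] - ranked[1],
            }
    return None
```

A single signal like the CPP could in principle track either of these summaries (or something else entirely), and they make different predictions: the winner’s level is pinned to the bound at decision time, while the lead over the runner-up varies from trial to trial.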

Conclusion

I do like the results in the paper. If they hold up, the CPP may provide a high temporal resolution window into the decision processes of humans. As a result, it may allow us to investigate decision processes for more complex situations than those which animals can master, but maybe it’s only a signal for the simple, perceptual decisions investigated here. Based on the above open questions I also guess that the reported signals were noisier than the plots make us believe and the correspondence of the CPP with theoretical decision variables should be further examined.