## Normative evidence accumulation in unpredictable environments.

Glaze, C. M., Kable, J. W., and Gold, J. I.
Elife, 4, 2015

### Abstract

In our dynamic world, decisions about noisy stimuli can require temporal accumulation of evidence to identify steady signals; differentiation to detect unpredictable changes in those signals; or both. Normative models can account for learning in these environments but have not yet been applied to faster decision processes. We present a novel, normative formulation of adaptive learning models that forms decisions by acting as a leaky accumulator with non-absorbing bounds. These dynamics, derived for both discrete and continuous cases, depend on the expected rate of change of the statistics of the evidence and balance signal identification and change detection. We found that, for two different tasks, human subjects learned these expectations, albeit imperfectly, then used them to make decisions in accordance with the normative model. The results represent a unified, empirically supported account of decision-making in unpredictable environments that provides new insights into the expectation-driven dynamics of the underlying neural signals.

### Review

The authors suggest a model of sequential information processing that is aware of possible switches in the underlying source of information. They further show that the model fits the responses of people in two perceptual decision making tasks and consequently argue that behaviour which was previously considered suboptimal may follow the normative, i.e., optimal, mechanism of the model. This mechanism postulates that typical evidence accumulation mechanisms in perceptual decision making are altered by the expected switch rate of the stimulus. Specifically, evidence accumulation becomes leakier and a non-absorbing bound becomes lower when the expected switch rate increases. The paper is generally well written (although there are some convoluted bits in the results section) and convincing. I was a bit surprised, though, that only choices, but not their timing, were considered in the model-based analysis. In the following I’ll go through some more details of the model and discuss limitations of the presented models and their relation to other models in the field, but first I describe the experiments reported in the paper.

The paper reports two experiments. In the first (triangles task) people saw two triangles on the screen and had to judge whether a single dot was more likely to originate from one triangle or the other. There was one dot and one corresponding response per trial. In each trial the position of the dot was redrawn from a Gaussian distribution centred on one of the two triangles. There were also change-point trials in which the triangle from which the dot was drawn switched (and then remained the same until the next change point). The authors analysed the proportion correct in relation to whether a trial was a change point. Trials were grouped into blocks, each defined by a constant rate of switches (hazard rate) in the true originating triangle. In the second experiment (dots-reversal task), a random dot stimulus repeatedly switched (reversed) direction within a trial. In each trial people had to report in which direction the dots moved before they vanished. The authors analysed the proportion correct in relation to the time between the last switch and the end of stimulus presentation. There were no blocks. Each trial had one of two hazard rates and one of two difficulty levels. The two difficulty levels were determined for each subject individually such that the more difficult one led to correct identification of the motion direction of a 500 ms long stimulus in 65% of cases.

The authors present two normative models, one discrete and one continuous, which they apply across trials in the triangles task and within trials in the dots-reversal task, respectively. The discrete model is a simple hidden Markov model in which the hidden state can take one of two values and there is a common transition probability between these two values, which they call the hazard ‘rate’ (H). Observations were implicitly assumed to be Gaussian. They only enter during fitting as log-likelihood ratios of the form $$\beta*x_n$$, where $$\beta$$ is a scaling factor relating to the internal / sensory uncertainty associated with the generative model of observations and $$x_n$$ is the observed dot position (x-coordinate) in the triangles task. In the Methods, the authors derive the update equation for the log posterior odds ($$L_n$$) of the hidden state values, given in Eqs. (1) and (2).
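
To make the discrete model concrete, here is a minimal sketch on simulated triangles-like data. It assumes that Eq. (1) has the form $$L_n = LLR_n + L_{n-1} + \log\frac{(1-H)/H + e^{-L_{n-1}}}{(1-H)/H + e^{L_{n-1}}}$$ (which is what propagating the posterior through the symmetric two-state transition matrix gives) and that dots are drawn from Gaussians centred at $$\pm\mu$$ with common $$\sigma$$, so that $$\beta = 2\mu/\sigma^2$$. All task parameters and function names are hypothetical, not the authors' code.

```python
import math
import random

def psi(L, H):
    """Deterministic part of the discrete update for 0 < H < 1 (assumed form of Eq. 1)."""
    r = (1.0 - H) / H
    return L + math.log(r + math.exp(-L)) - math.log(r + math.exp(L))

def simulate(n_trials, H, mu, sigma, rng):
    """Two-state Markov source (+1/-1) emitting dot positions x_n ~ N(state * mu, sigma^2)."""
    state = rng.choice((-1, 1))
    states, xs = [], []
    for _ in range(n_trials):
        if rng.random() < H:
            state = -state  # change point
        states.append(state)
        xs.append(rng.gauss(state * mu, sigma))
    return states, xs

def accuracy(states, xs, beta, carry_over):
    """Fraction of trials on which sign(L_n) matches the true source."""
    L, hits = 0.0, 0
    for s, x in zip(states, xs):
        L = carry_over(L) + beta * x  # L_n = (what is kept of L_{n-1}) + LLR_n
        hits += (L > 0) == (s > 0)
    return hits / len(states)

rng = random.Random(0)
H, mu, sigma = 0.1, 1.0, 2.0     # hypothetical task parameters
beta = 2.0 * mu / sigma**2       # exact LLR slope for equal-variance Gaussians at +/-mu
states, xs = simulate(20000, H, mu, sigma, rng)

acc_normative = accuracy(states, xs, beta, lambda L: psi(L, H))
acc_perfect = accuracy(states, xs, beta, lambda L: L)       # perfect integrator, ignores switches
acc_last_dot = accuracy(states, xs, beta, lambda L: 0.0)    # uses only the current dot
print(f"normative {acc_normative:.3f}, perfect {acc_perfect:.3f}, last dot {acc_last_dot:.3f}")
```

Because the normative update is Bayes-optimal for this generative model, it should beat both a leak-free integrator (which cannot recover quickly after change points) and a memoryless observer.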

The continuous model is based on a Markov jump process with two states, which is the continuous-time equivalent of the hidden Markov model above. Using Itô calculus the authors again derive an update equation for the log posterior odds of the two states (Eq. 4), but during fitting they actually approximate Eq. (4) with the discrete Eq. (1), because it is supposedly the most efficient discrete-time approximation of Eq. (4) (no explanation for why this is the case was given). They simply replace the log-likelihood ratio placeholder (LLR) with a coherence-dependent term applicable to the random dot motion stimulus. Notably, in contrast to standard drift-diffusion modelling of random dot motion tasks, the authors used coherence-dependent noise. I’d be interested in the reason for this choice.
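
One way to see why Eq. (1) is at least a consistent discretisation of Eq. (4) is to compare single steps. The sketch below assumes Eq. (4) has the noise-free form $$dL/dt = LLR(t) - 2\lambda\sinh(L)$$ and sets $$H = \lambda\,dt$$; expanding Eq. (1) in H gives $$L - 2H\sinh(L) + O(H^2)$$, so the one-step gap should shrink quadratically in dt. This only checks first-order agreement; it says nothing about the "most efficient" claim. Parameter values are hypothetical.

```python
import math

def discrete_step(L, llr, H):
    """One step of the discrete update (assumed form of Eq. 1): psi(L; H) + LLR."""
    r = (1.0 - H) / H
    return llr + L + math.log(r + math.exp(-L)) - math.log(r + math.exp(L))

def euler_step(L, drift, lam, dt):
    """One Euler step of dL/dt = drift - 2*lam*sinh(L) (assumed form of Eq. 4, no noise)."""
    return L + (drift - 2.0 * lam * math.sinh(L)) * dt

lam, L0, drift = 1.0, 1.0, 0.5   # hypothetical: reversals per second, start value, evidence rate
gaps = {}
for dt in (1e-2, 1e-3):
    d = discrete_step(L0, drift * dt, lam * dt)   # hazard per step H = lam * dt
    c = euler_step(L0, drift, lam, dt)
    gaps[dt] = abs(d - c)
    print(f"dt={dt:g}: |discrete - continuous| = {gaps[dt]:.2e}")
ratio = gaps[1e-2] / gaps[1e-3]
print(f"gap shrinks by a factor of about {ratio:.0f} for a 10x smaller dt")
```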

There is an apparent fundamental difference between the discrete and continuous models which can be seen in Fig. 1B vs. C. In the discrete model, for H>0.5, the log posterior odds may actually switch sign from one observation to the next, whereas this cannot happen in the continuous model. Conceptually, this means that the log posterior odds in the discrete model, when the LLR is 0, i.e., when there is no evidence in either direction, would oscillate between decreasing positive and increasing negative values until converging to 0. This oscillation can be seen in Fig. 2G, red line for |LLR|>0. In the continuous model such an oscillation cannot happen, because the infinitely many, tiny time steps allow the model to converge to 0 before switching sign. Another way to see this is through the discrete hazard ‘rate’ H, which is the probability of a sign reversal (a switch of the hidden state) within one time step of size dt. If you decrease dt in the model but want to maintain a given rate of sign reversals in, e.g., 1 second, H also has to decrease. Consequently, as dt approaches 0, the probability of a sign reversal within one step approaches 0, too, which means that H is a useless parameter in continuous time. This, in turn, is why it is replaced by a real rate parameter ($$\lambda$$) representing the expected number of reversals per second. In conclusion, the fundamental difference between the discrete and continuous models is only an apparent one. They are very similar models, just expressed at different temporal resolutions. In that sense it would perhaps have been better to present results in the paper consistently in terms of a real hazard rate ($$\lambda$$), which could be obtained in the triangles task by dividing H by the average duration of a trial in seconds. Notice that the discrete model represents all hazard rates $$\lambda>1/dt$$ as H=1, i.e., it cannot represent hazard rates that would lead to more than one expected sign reversal per $$dt$$. 
There may be more subtle differences between the models when the exact distributions of sign reversals are considered instead of only the expected rates.
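
The oscillation for H>0.5 is easy to reproduce by iterating only the deterministic part of the discrete update with LLR = 0. The sketch assumes the same form of Eq. (1) as above; H and 1-H give mirror-image updates, so the trace for H=0.7 has the same magnitudes as for H=0.3 but alternating signs. The helper `per_step_hazard` is a hypothetical illustration of the first-order conversion $$H = \lambda\,dt$$ and of why rates above 1/dt have no discrete counterpart.

```python
import math

def psi(L, H):
    """Prior log odds for the next step (discrete model, 0 < H < 1)."""
    r = (1.0 - H) / H
    return L + math.log(r + math.exp(-L)) - math.log(r + math.exp(L))

def relax(L0, H, steps=6):
    """Iterate L_n = psi(L_{n-1}; H) with no evidence (LLR = 0)."""
    L, trace = L0, []
    for _ in range(steps):
        L = psi(L, H)
        trace.append(L)
    return trace

def per_step_hazard(lam, dt):
    """First-order conversion H = lam * dt; no valid H encodes lam > 1 / dt."""
    H = lam * dt
    if H > 1.0:
        raise ValueError(f"lam={lam}/s implies more than one expected switch per {dt}s step")
    return H

calm = relax(2.0, 0.3)     # H < 0.5: shrinks towards 0 and keeps its sign
flippy = relax(2.0, 0.7)   # H > 0.5: flips sign on every step while shrinking
print([round(v, 3) for v in calm])
print([round(v, 3) for v in flippy])
print(per_step_hazard(0.5, 0.1))  # hypothetical: 0.5 switches/s with 100 ms steps
```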

Using first-order approximations of the two models, the authors identify two components in the dynamics of the log posterior odds L: a leak and a bias. [Side remark: there is a small sign mistake in the definition of the leak k of the continuous model in the Methods section.] Both depend on the hazard rate, and the authors show that the leak dominates the dynamics for small L whereas the bias dominates for large L. I find this terminology a bit misleading, because both leak and bias effectively result in a leak of the log posterior odds L by reducing L in every time step (cf. Fig. 1B,C). The change from a multiplicative leak to one based on a bias just means that the effective amount of leak in L increases nonlinearly with L as the bias takes over.
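
The two regimes can be read off the discrete update directly, in my own notation (which may not match the paper's first-order expressions): near L = 0 the carried-over log odds are approximately $$(1-2H)L$$ (a multiplicative leak), and for large L they saturate at $$\log((1-H)/H)$$ (the bias). The kept fraction psi(L)/L falls steadily in between, which is the nonlinear growth of the effective leak described above. H = 0.1 is a hypothetical value.

```python
import math

def psi(L, H):
    """Log odds carried over to the next step (assumed form of Eq. 1, 0 < H < 1)."""
    r = (1.0 - H) / H
    return L + math.log(r + math.exp(-L)) - math.log(r + math.exp(L))

H = 0.1
leak_factor = 1.0 - 2.0 * H           # slope of psi at L = 0
bias = math.log((1.0 - H) / H)        # limit of psi for large L
Ls = (0.1, 1.0, 5.0, 10.0)
ratios = [psi(L, H) / L for L in Ls]  # fraction of L kept from one step to the next
for L, k in zip(Ls, ratios):
    print(f"L={L:5.1f}  psi={psi(L, H):6.3f}  (1-2H)L={leak_factor * L:6.3f}  "
          f"bias={bias:.3f}  kept fraction={k:.3f}")
```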

To test whether this special form of leak underlies decision making, the authors compared the full model to two versions that had only a multiplicative leak or only a bias-based leak. In the former the leak rate stayed constant for increasing L, i.e., $$L’ = \gamma*L$$. In the latter there was perfect accumulation without leak up to the bias and then a bias-based leak, which corresponds to a multiplicative leak whose rate increases with L such that $$L’ = \gamma(L)*L$$ with $$\gamma(L) = bias / L$$ (for L above the bias). The authors report evidence that in both tasks neither alternative model describes choice behaviour as well as the full, normative model. In Fig. 9 they provide a reason by estimating the effective leak rate in the data and in the models as a function of the strength of sensory evidence (coherence in the dots-reversal task). They do this by fitting the model with multiplicative leak separately to trials with low and high coherence (fitting to choices in the data or predicted by the different fitted models). In both the data and the normative model the effective leak rates depended on coherence. This dependence arises because high sensory evidence leads to large values of L, and I have argued above that larger L has a larger effective leak rate due to the bias. It is, therefore, not surprising that the alternative model with multiplicative leak shows no dependence of effective leak on coherence. But it is also not surprising that the alternative model with bias-based leak has a larger dependence of effective leak on coherence than the data, because this model jumps from no leak to very large leak when coherence jumps from low to high. The full, normative model lies in between, because it smoothly transitions between the two alternative models.
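
A side-by-side sketch of the three carry-over rules makes the comparison explicit. The mapping of the alternatives' parameters is my own hypothetical choice: $$\gamma = 1-2H$$ matches the normative slope at L = 0, and the bias matches its large-L limit. The normative kept fraction follows the multiplicative model for small L and the clipped (bias-only) model for large L.

```python
import math

def normative(L, H):
    """Full update (assumed form of Eq. 1): smooth transition from leak to bias."""
    r = (1.0 - H) / H
    return L + math.log(r + math.exp(-L)) - math.log(r + math.exp(L))

def multiplicative_leak(L, gamma):
    """Alternative 1: the same fraction gamma of L is kept on every step."""
    return gamma * L

def bias_only(L, bias):
    """Alternative 2: perfect accumulation up to +/- bias, clipped beyond it."""
    return max(-bias, min(bias, L))

H = 0.1
gamma = 1.0 - 2.0 * H               # hypothetical: matches the normative slope at L = 0
bias = math.log((1.0 - H) / H)      # hypothetical: matches the normative large-L limit
Ls = (0.5, 1.0, 2.0, 4.0, 8.0)
kept = {
    "normative": [normative(L, H) / L for L in Ls],
    "multiplicative": [multiplicative_leak(L, gamma) / L for L in Ls],
    "bias": [bias_only(L, bias) / L for L in Ls],
}
for i, L in enumerate(Ls):
    print(f"L={L:4.1f}  kept fraction: " + ", ".join(f"{k} {v[i]:.2f}" for k, v in kept.items()))
```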

Why is there a leak in the first place? Others have found no evidence for a leak in evidence accumulation (e.g., Brunton et al., 2013). The leak results from the possibility of a switch of the source of the observations, i.e., a switch of the underlying true stimulus. Without any information, i.e., without observations, the possibility of a switch means that you should become more uncertain about the stimulus as time passes. The larger the hazard rate, i.e., the larger the probability of a switch within some time window, the faster you should become uncertain about the current stimulus. For log posterior odds of L=0 uncertainty is at its maximum (both stimuli have equal posterior probability). This is another reason why discrete hazard ‘rates’ H>0.5, which lead to sign reversals in L, do not make much sense: the absence of evidence for one stimulus should not lead to evidence for the other stimulus. In any case, as the hazard rate goes to 0 the leak goes to 0, such that in experiments where switches in the stimulus usually do not occur, subjects should not exhibit a leak. This explains why we often find no evidence for leaks in typical perceptual decision making experiments. It does not mean that there is no leak, though. In particular, the authors report here that hazard rates estimated from the behaviour of subjects (subjective) tended to be a bit higher than the ones used to generate the stimuli (objective) when the objective hazard rates were very low, and the other way around for high objective hazard rates. This indicates that people have some prior expectation towards intermediate hazard rates that biased their estimates of hazard rates in the experiment.
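
The growth of uncertainty without observations has a closed form, assuming Eq. (4) is $$dL/dt = LLR(t) - 2\lambda\sinh(L)$$. With LLR = 0, differentiating $$\tanh(L/2)$$ gives $$\frac{d}{dt}\tanh(L/2) = -2\lambda\tanh(L/2)$$, and since $$\tanh(L/2) = 2p-1$$ for posterior probability p, the belief difference decays exponentially at rate $$2\lambda$$ and vanishes for $$\lambda = 0$$. The sketch checks a simple Euler integration against this solution; the values of $$\lambda$$ and the start value are hypothetical.

```python
import math

def relax_no_evidence(L0, lam, t_end, dt=1e-4):
    """Euler-integrate dL/dt = -2*lam*sinh(L), i.e. assumed Eq. (4) with LLR = 0."""
    L = L0
    for _ in range(round(t_end / dt)):
        L -= 2.0 * lam * math.sinh(L) * dt
    return L

def relax_exact(L0, lam, t):
    """Closed form: tanh(L/2) = 2p - 1 decays as exp(-2*lam*t)."""
    return 2.0 * math.atanh(math.tanh(L0 / 2.0) * math.exp(-2.0 * lam * t))

L0, t = 2.0, 1.0
results = {}
for lam in (0.0, 0.1, 1.0):
    num, exact = relax_no_evidence(L0, lam, t), relax_exact(L0, lam, t)
    results[lam] = (num, exact)
    p = 1.0 / (1.0 + math.exp(-exact))  # posterior of the initially favoured state
    print(f"lambda={lam}: L(t)={num:.4f} (exact {exact:.4f}), p={p:.3f}")
```

Note that L never changes sign here, in line with the argument about the continuous model above.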

The discussed forms of leak implement a property of the model that the authors call a ‘non-absorbing bound’. I find this wording also a bit misleading, because ‘bound’ is usually used to indicate a threshold in drift-diffusion models which, when reached, triggers a response. The bound here triggers nothing. Rather, it represents an asymptote of the average log posterior odds. Thus, it is not an absolute bound; it is often crossed due to variance in the momentary sensory evidence (LLR). I also cannot follow the authors when they write: “The stabilizing boundary is also in contrast to the asymptote in leaky accumulation, which increases linearly with the strength of evidence”. Based on the dynamics of L discussed above, the ‘bound’ here should exhibit exactly the described behaviour of an asymptote in leaky accumulation. The strength of evidence is reflected in the magnitude of the LLR, which is added to the intrinsic dynamics of the log posterior odds L. The non-absorbing bound, therefore, should be given by bias + average LLR for the current stimulus. The bound, thus, should rise linearly with the strength of evidence (LLR).
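
The argument can be checked with the noise-free fixed point of the discrete update, $$L^* = m + \psi(L^*)$$, where every LLR is replaced by its mean m (again assuming the form of Eq. (1) used above). For weak evidence the fixed point behaves like a leaky accumulator, $$L^* \approx m/(2H)$$; for strong evidence it approaches bias + m. In both regimes it rises linearly with m, only with different slopes. H and the values of m are hypothetical; noisy LLRs would make L fluctuate around, and frequently cross, this value.

```python
import math

def psi(L, H):
    """Log odds carried over to the next step (assumed form of Eq. 1, 0 < H < 1)."""
    r = (1.0 - H) / H
    return L + math.log(r + math.exp(-L)) - math.log(r + math.exp(L))

def noise_free_asymptote(m, H, iters=500):
    """Fixed point of L = m + psi(L; H): where L settles if every LLR equals its mean m."""
    L = 0.0
    for _ in range(iters):  # contraction: the slope of psi is at most 1 - 2H
        L = m + psi(L, H)
    return L

H = 0.1
bias = math.log((1.0 - H) / H)
ms = (0.05, 0.1, 0.5, 1.0, 2.0, 4.0)
asym = [noise_free_asymptote(m, H) for m in ms]
for m, a in zip(ms, asym):
    print(f"mean LLR {m:4.2f}: asymptote {a:6.3f}, m/(2H) = {m / (2 * H):6.3f}, "
          f"bias + m = {bias + m:6.3f}")
```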

Fitting of the discrete and continuous models was done by maximising the likelihood of the models (in some fits with many parameters, priors over parameters were used to regularise the optimisation). The likelihood in the discrete models was Gaussian with mean equal to the log posterior odds ($$L_n$$) computed from the actual dot positions $$x_n$$. The variance of the Gaussian likelihood was fitted to the data as a free parameter. In the continuous model the likelihood was numerically approximated by simulating the discretised evolution of the probabilities that the log posterior odds take on particular values. This is very similar to the approach used by Brunton et al. (2013). The distribution of the log posterior odds $$L_n$$ was considered here because the stream of sensory observations $$x(t)$$ was unknown and therefore had to enter as a random variable, whereas in the triangles task $$x(t)=x_n$$ was set to the known x-coordinates of the presented dots.
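
A minimal sketch of both likelihood computations, under assumptions that are mine rather than the paper's. For known inputs (triangles task) it assumes a choice is +1 whenever $$L_n$$ plus Gaussian noise exceeds 0, so $$P(+1) = \Phi(L_n/\sigma)$$. For unknown inputs (dots-reversal task) it propagates a histogram over L on a grid through the discrete update with Gaussian LLRs, renormalising each row so that no mass leaves the grid. Grid size, truncation, and all function names are hypothetical.

```python
import math

def psi(L, H):
    """Deterministic part of the discrete update (assumed form of Eq. 1), 0 < H < 1."""
    r = (1.0 - H) / H
    return L + math.log(r + math.exp(-L)) - math.log(r + math.exp(L))

def choice_loglik_known_inputs(Ls, choices, noise_sd):
    """Triangles-style: P(choice = +1) = P(L_n + N(0, noise_sd^2) > 0) (assumed link)."""
    total = 0.0
    for L, c in zip(Ls, choices):
        p_plus = 0.5 * (1.0 + math.erf(L / (noise_sd * math.sqrt(2.0))))
        total += math.log(p_plus if c > 0 else 1.0 - p_plus)
    return total

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def propagate_density(n_steps, H, llr_mean, llr_sd, half_width=10.0, n_bins=100):
    """Dots-style: evolve P(L_n) on a grid when each LLR_n ~ N(llr_mean, llr_sd^2) is unseen."""
    dx = 2.0 * half_width / n_bins
    grid = [-half_width + (i + 0.5) * dx for i in range(n_bins)]  # symmetric, no bin at 0
    p = [0.0] * n_bins
    p[n_bins // 2 - 1] = p[n_bins // 2] = 0.5                     # start at L = 0
    targets = [psi(L, H) + llr_mean for L in grid]                 # mean of L_n given each bin
    for _ in range(n_steps):
        new = [0.0] * n_bins
        for j, pj in enumerate(p):
            if pj == 0.0:
                continue
            w = [normal_pdf(x, targets[j], llr_sd) for x in grid]
            s = sum(w)
            for i, wi in enumerate(w):
                new[i] += pj * wi / s                              # renormalise: mass stays on grid
        p = new
    return grid, p

grid, p = propagate_density(n_steps=10, H=0.1, llr_mean=0.3, llr_sd=1.0)
p_choose_plus = sum(pi for L, pi in zip(grid, p) if L > 0)
print(f"predicted P(choice = +1) = {p_choose_plus:.3f}")
print(f"log-likelihood of one '+1' choice at L = 0: {choice_loglik_known_inputs([0.0], [1], 1.0):.3f}")
```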

The authors argued that the fits of behaviour were good, but at least for the dots-reversal task Fig. 8 suggests otherwise. For example, Fig. 8G shows that 6 out of 12 subjects (there were supposed to be 13, but I can only see 12 in the plots) made 100% errors in trials with the low hazard rate of 0.1 Hz and low coherence in which the last switch in the stimulus was very recent (at most 300 ms before the end of stimulus presentation). The best-fitting model, however, predicted error rates of at most 90% in these conditions. Furthermore, there is a significant difference in choice errors between the low and high hazard rates for long times after the last switch in the stimulus (Fig. 8A, more errors for the high hazard rate), which was not predicted by the fitted normative model. Despite these differences, the fitted normative model seems to capture the overall patterns in the data.

#### Conclusion

The authors present an interesting normative model in discrete and continuous time that extends previous models of evidence accumulation to situations in which switches in the presented stimulus can be expected. In light of this model, a leak in evidence accumulation reflects a tendency to increase uncertainty about the stimulus due to a potentially upcoming switch in the stimulus. The model provides a mathematical relation between the precise type of leak and the expected switch (hazard) rate of the stimulus. In particular, and in contrast to previous models, the leak in the present model depends nonlinearly on the accumulated evidence. As the authors discuss, the presented normative model potentially unifies decision making processes observed in different situations characterised by different stabilities of the underlying stimuli. I had the impression that the authors were very thorough in their analysis. However, some deviations between model and data apparent in Fig. 8 suggest that either the model itself or the fitting procedure could be improved such that the model better fits people’s behaviour in the dots-reversal task. It was also surprising to me that subjects only had to make a single response per trial in that task. This feels like a big waste of potential choice data when I consider that each trial was 5-10 s long and contained several stimulus switches (reversals).

## A test of Bayesian observer models of processing in the Eriksen flanker task.

White, C. N., Brown, S., and Ratcliff, R.
J Exp Psychol Hum Percept Perform, 38:489–497, 2012

### Abstract

Two Bayesian observer models were recently proposed to account for data from the Eriksen flanker task, in which flanking items interfere with processing of a central target. One model assumes that interference stems from a perceptual bias to process nearby items as if they are compatible, and the other assumes that the interference is due to spatial uncertainty in the visual system (Yu, Dayan, & Cohen, 2009). Both models were shown to produce one aspect of the empirical data, the below-chance dip in accuracy for fast responses to incongruent trials. However, the models had not been fit to the full set of behavioral data from the flanker task, nor had they been contrasted with other models. The present study demonstrates that neither model can account for the behavioral data as well as a comparison spotlight-diffusion model. Both observer models missed key aspects of the data, challenging the validity of their underlying mechanisms. Analysis of a new hybrid model showed that the shortcomings of the observer models stem from their assumptions about visual processing, not the use of a Bayesian decision process.

### Review

This is a response to Yu et al. (2009). The authors show that Yu et al.'s main Bayesian models cannot account for the full data of an Eriksen flanker task. In particular, with the suggested parameter settings that reproduce the initial drop of accuracy below chance level for very fast responses, Yu et al.'s models predict a far too high overall error rate. The argument put forward by White et al. is that the mechanisms used in Yu et al.'s models to overcome initial, flanker-induced biases are too slow, i.e., the probabilistic evidence accumulation implemented by the models is influenced by the flankers for too long. White et al.'s shrinking spotlight models do not have such a problem, mostly because the speed with which flankers lose influence is fitted to the data. The argument seems compelling, but I would like to understand better why it takes so long in the Bayesian model to overcome flanker influence and whether there are other ways of speeding this up than the one suggested by White et al.

## Dynamics of attentional selection under conflict: toward a rational Bayesian account.

Yu, A. J., Dayan, P., and Cohen, J. D.
J Exp Psychol Hum Percept Perform, 35:700–717, 2009

### Abstract

The brain exhibits remarkable facility in exerting attentional control in most circumstances, but it also suffers apparent limitations in others. The authors' goal is to construct a rational account for why attentional control appears suboptimal under conditions of conflict and what this implies about the underlying computational principles. The formal framework used is based on Bayesian probability theory, which provides a convenient language for delineating the rationale and dynamics of attentional selection. The authors illustrate these issues with the Eriksen flanker task, a classical paradigm that explores the effects of competing sensory inputs on response tendencies. The authors show how 2 distinctly formulated models, based on compatibility bias and spatial uncertainty principles, can account for the behavioral data. They also suggest novel experiments that may differentiate these models. In addition, they elaborate a simplified model that approximates optimal computation and may map more directly onto the underlying neural machinery. This approximate model uses conflict monitoring, putatively mediated by the anterior cingulate cortex, as a proxy for compatibility representation. The authors also consider how this conflict information might be disseminated and used to control processing.

### Review

They suggest two simple Bayesian perceptual models based on evidence integration for the (deadlined) Eriksen task. Their focus is on attentional mechanisms that can explain why participants' accuracy is below chance for very fast responses. These mechanisms are based on a prior on compatibility (that flankers are compatible with the relevant centre stimulus) and on spatial uncertainty (flankers influence processing of the centre stimulus at a low, sensory level). The core inference is the same and replicates the basic mechanism you would expect for any perceptual decision making task. They don't fit behaviour, but rather show average trajectories from model simulations with hand-tuned parameters. They further suggest a third model inspired by previous work on conflict monitoring and cognitive control, which supposedly is more likely to be implemented in the brain, because instead of having to consider (and compute with) all possible stimuli in the environment, it uses a conflict monitoring mechanism to switch between the types of stimuli that are considered.
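
A deliberately simplified caricature of the compatibility-bias idea helps with the question raised by White et al. about why it takes so long to overcome flanker influence. This is not Yu et al.'s exact model: it assumes four possible displays, a prior $$\beta$$ on compatible displays, and Gaussian samples evaluated at their expected values, so that accumulated evidence enters only through $$g = t/\sigma^2$$. In this toy the incongruent target is judged below chance until $$3 + 4\sinh^2 g = \beta/(1-\beta)$$, so a dip only exists for $$\beta > 0.75$$ and the recovery time grows with the strength of the compatibility prior. All names and values are hypothetical.

```python
import math

# Hypothetical three-position display (left flanker, target, right flanker), identities +1/-1.
CONFIGS = [(+1, +1, +1), (-1, -1, -1), (-1, +1, -1), (+1, -1, +1)]
INCONGRUENT = (-1, +1, -1)

def p_target_plus(true_display, beta, g):
    """Posterior P(target = +1) after evidence strength g (= elapsed time / noise variance).

    Toy assumptions: each position yields Gaussian samples with mean equal to its identity,
    evaluated at their expected value (no noise), so log-likelihood(c) = g * <true, c>;
    beta is the prior probability of a compatible display, split over the two compatible configs.
    """
    weights = []
    for c in CONFIGS:
        prior = beta / 2 if c[0] == c[1] == c[2] else (1 - beta) / 2
        loglik = g * sum(t * k for t, k in zip(true_display, c))
        weights.append((c, prior * math.exp(loglik)))
    total = sum(w for _, w in weights)
    return sum(w for c, w in weights if c[1] == +1) / total

def crossover(beta, lo=1e-9, hi=20.0):
    """Bisection for the g at which an incongruent display is first judged correctly."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p_target_plus(INCONGRUENT, beta, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for g in (0.25, 0.5, 1.0, 2.0):
    print(f"g={g}: P(correct | incongruent, beta=0.9) = {p_target_plus(INCONGRUENT, 0.9, g):.3f}")
for beta in (0.9, 0.99):
    print(f"beta={beta}: below chance until g = {crossover(beta):.3f}")
```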

## Neural correlates of perceptual decision making before, during, and after decision commitment in monkey frontal eye field.

Ding, L. and Gold, J. I.
Cereb Cortex, 22:1052–1067, 2012

### Abstract

Perceptual decision making requires a complex set of computations to implement, evaluate, and adjust the conversion of sensory input into a categorical judgment. Little is known about how the specific underlying computations are distributed across and within different brain regions. Using a reaction-time (RT) motion direction-discrimination task, we show that a unique combination of decision-related signals is represented in monkey frontal eye field (FEF). Some responses were modulated by choice, motion strength, and RT, consistent with a temporal accumulation of sensory evidence. These responses converged to a threshold level prior to behavioral responses, reflecting decision commitment. Other responses continued to be modulated by motion strength even after decision commitment, possibly providing a memory trace to help evaluate and adjust the decision process with respect to rewarding outcomes. Both response types were encoded by FEF neurons with both narrow- and broad-spike waveforms, presumably corresponding to inhibitory interneurons and excitatory pyramidal neurons, respectively, and with diverse visual, visuomotor, and motor properties, albeit with different frequencies. Thus, neurons throughout FEF appear to make multiple contributions to decision making that only partially overlap with contributions from other brain regions. These results help to constrain how networks of brain regions interact to generate perceptual decisions.

### Review

This paper puts into perspective the commonly communicated statement that LIP neurons are responsible for perceptual decision making in monkeys that perform a reaction time motion discrimination task. Specifically, the authors report on neurons in the frontal eye field (FEF) that also show typical accumulation-to-bound responses. Furthermore, at least as many neurons in FEF exhibited activity that was correlated with motion coherence and choice during and after the saccade that indicated the choice and extinguished the stimulus, i.e., the activity of these neurons appeared to accumulate evidence, but seemed to ignore the supposed bound and maintained a representation of the stimulus after it had gone. In the discussion the authors also point to other studies which found activity that can be interpreted in terms of evidence accumulation. Corresponding neurons have been found in LIP, FEF, superior colliculus (SC) and caudate nucleus, of which neurons in LIP and SC may be mostly governed by a bound. From the reported and reviewed results it becomes clear that, although accumulation-to-bound may be an important component of perceptual decision making, it is not sufficient to explain the wide variety of decision-related neuronal activity in the brain. In particular, it is unclear how neurons from the mentioned brain regions interact and what their different roles in perceptual decision making are.

## Universality in numerical computations with random data.

Deift, P. A., Menon, G., Olver, S., and Trogdon, T.
Proc Natl Acad Sci U S A, 111:14973–14978, 2014

### Abstract

The authors present evidence for universality in numerical computations with random data. Given a (possibly stochastic) numerical algorithm with random input data, the time (or number of iterations) to convergence (within a given tolerance) is a random variable, called the halting time. Two-component universality is observed for the fluctuations of the halting time, i.e., the histogram for the halting times, centered by the sample average and scaled by the sample variance, collapses to a universal curve, independent of the input data distribution, as the dimension increases. Thus, up to two components (the sample average and the sample variance), the statistics for the halting time are universally prescribed. The case studies include six standard numerical algorithms as well as a model of neural computation and decision-making. A link to relevant software is provided for readers who would like to do computations of their own.

### Review

The authors show that normalised halting / stopping times follow common distributions. Stopping times are assumed to be generated by an algorithm A from a random ensemble E, where E does not represent the particular sample from which stopping times are generated, but the theoretical distribution of that sample. The normalisation is standard: subtract the mean and divide by the standard deviation of a sample of stopping times. The resulting distribution is the same across different ensembles E, but differs across algorithms A. The authors call this sameness (two-component) universality without explaining why they call it that. There is also no reference to a concept of universality. Perhaps it’s something common in physics. Perhaps it’s explained in their first reference. Reference numbers are shifted by one, by the way.
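
The normalisation itself is simple; the sketch below shows it on a stand-in "algorithm" of my own choosing: a drifting random walk run until it leaves a two-sided bound, fed with three hypothetical input ensembles that all have mean 0 and variance 1. This is not the paper's Ising-type model or any of its six numerical algorithms, and the sketch only computes the standardised halting times and their deciles; whether the histograms coincide is exactly what the paper examines. Following the review, it divides by the standard deviation rather than the variance mentioned in the abstract.

```python
import random
import statistics

def halting_time(draw, drift=0.1, bound=10.0, max_steps=100_000):
    """Steps until a drifting random walk first leaves (-bound, bound)."""
    s = 0.0
    for n in range(1, max_steps + 1):
        s += drift + draw()
        if abs(s) >= bound:
            return n
    return max_steps

def standardise(times):
    """Centre by the sample mean and scale by the sample standard deviation."""
    mu = statistics.fmean(times)
    sd = statistics.pstdev(times)
    return [(t - mu) / sd for t in times]

rng = random.Random(0)
ensembles = {  # hypothetical input distributions, all with mean 0 and variance 1
    "gaussian": lambda: rng.gauss(0.0, 1.0),
    "rademacher": lambda: rng.choice((-1.0, 1.0)),
    "uniform": lambda: rng.uniform(-3 ** 0.5, 3 ** 0.5),
}
standardised = {}
for name, draw in ensembles.items():
    raw = [halting_time(draw) for _ in range(2000)]
    standardised[name] = standardise(raw)
    deciles = statistics.quantiles(standardised[name], n=10)
    print(name, f"raw mean {statistics.fmean(raw):.1f}", [round(q, 2) for q in deciles])
```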

How is that interesting? I’m not sure. The authors give an example with a model of reaction times. This is a kind of Ising model where decisions are made once a sufficient number of binary states have switched to one of the states. States flip with a certain probability as determined by a given function of the current state of the whole Ising model. When different such functions were considered, corresponding to different ensembles E, normalised reaction times again followed the same distribution. However, the distribution of normalised reaction times differed for different total numbers of binary states in the Ising model. These results suggest that normalised reaction times should follow the same distribution across subjects, but only if subjects differ at most in the randomness on which their decisions are based. If subjects use slightly different algorithms for making decisions, you would expect differences in the distribution of normalised reaction times. I guess it would be cool to infer that subjects use the same (or a different) algorithm purely from their reaction time distributions, but what would be an appropriate test for this and what would be its power?

## Effects of cortical microstimulation on confidence in a perceptual decision.

Fetsch, C. R., Kiani, R., Newsome, W. T., and Shadlen, M. N.
Neuron, 83:797–804, 2014

### Abstract

Decisions are often associated with a degree of certainty, or confidence, an estimate of the probability that the chosen option will be correct. Recent neurophysiological results suggest that the central processing of evidence leading to a perceptual decision also establishes a level of confidence. Here we provide a causal test of this hypothesis by electrically stimulating areas of the visual cortex involved in motion perception. Monkeys discriminated the direction of motion in a noisy display and were sometimes allowed to opt out of the direction choice if their confidence was low. Microstimulation did not reduce overall confidence in the decision but instead altered confidence in a manner that mimicked a change in visual motion, plus a small increase in sensory noise. The results suggest that the same sensory neural signals support choice, reaction time, and confidence in a decision and that artificial manipulation of these signals preserves the quantitative relationship between accumulated evidence and confidence.

### Review

The paper verifies beliefs asserted in Kiani and Shadlen (2009): confidence is directly linked to accumulated evidence as represented in monkey area LIP during a random dot motion discrimination task. The authors use exactly the same task, but now stimulate patches of MT/MST neurons instead of recording single LIP neurons, and resort to analysing behavioural data only. They find that small microstimulation of functionally well-defined neurons that signal a particular motion direction affects decisions in the same way as manipulating the motion information in the stimulus directly. This was expected, because it has been shown before that stimulating MT neurons influences decisions in that way. New here is that the effect of stimulation on confidence judgements was evaluated at the same time. The rather humdrum result: confidence judgements are also affected in the same way. The authors argue that this didn’t have to be the case, because confidence judgements are thought to be a metacognitive process that may be influenced by other high-level cognitive functions such as those related to motivation. Then again, isn’t decision making thought to be a high-level cognitive function that is clearly influenced by motivation?

Anyway, there was one small effect particular to stimulation that did not occur in the control experiment where the stimulus itself was manipulated: there was a slight decrease in the overall proportion of sure-bet choices (presumably indicating low confidence) with stimulation, suggesting that monkeys were more confident when stimulated. The authors explain this with larger noise (diffusion) in a simple drift-diffusion model. Counterintuitively, the larger accumulation noise increases the probability of moving away from the initial value and out of the low-confidence region. The mechanism makes sense, but I would rather explain it within an equivalent Bayesian model in which MT neurons represent noisy observations that are transformed into noisy pieces of evidence which are accumulated in LIP. Stimulation increases the noise on the observations, which in turn increases accumulation noise in the equivalent drift-diffusion model (see Bitzer et al., 2014).
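
The noise effect can be illustrated with a strongly simplified version of the diffusion account: ignore the decision bounds and the time-dependent confidence criterion, and treat the sure-bet region as a fixed hypothetical band around zero. Whenever the mean of the accumulated evidence lies inside that band, more diffusion noise pushes probability mass out of it, so fewer sure-bet choices are predicted. The band, viewing time, drifts, and noise levels are all hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_low_confidence(drift, sigma, t, band):
    """P(|x_t| < band) for an unbounded diffusion with x_t ~ N(drift * t, sigma^2 * t)."""
    mean, sd = drift * t, sigma * math.sqrt(t)
    return normal_cdf((band - mean) / sd) - normal_cdf((-band - mean) / sd)

t, band = 0.5, 0.3        # hypothetical viewing time and half-width of the "sure bet" region
sigmas = (0.8, 1.0, 1.2)  # baseline noise and two hypothetical stimulation levels
rows = {}
for drift in (0.0, 0.2):
    rows[drift] = [p_low_confidence(drift, s, t, band) for s in sigmas]
    print(f"drift {drift}: P(sure-bet region) =", [round(v, 3) for v in rows[drift]])
```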

In drift-diffusion models, drift, diffusion and threshold are mutually redundant in that one of them needs to be fixed when fitting the model to choices and reaction times. The authors here let all of them vary simultaneously, which indicates that the parameters can be discriminated based on confidence judgements even when no reaction times are taken into account. This should be followed up. It is also interesting to think about how the postulated tight link between the ‘decision variable’ and the experienced confidence can be reconciled with a reaction time task where supposedly all decisions are made at the same threshold value. Notice that the confidence of a decision in their framework depends on the state of the diffusion (most likely one of the two boundaries) and the time of the decision: assuming fixed noise, shorter decision times should translate into higher confidence, because you assume that they are due to a larger drift. Therefore, you should see variability of confidence judgements in a reaction time task that is strongly correlated with reaction times.
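
The last point can be made precise with a textbook change-of-measure argument: a path that first hits the upper bound a at time t is $$\exp(va/\sigma^2 - v^2 t/(2\sigma^2))$$ times more likely under drift v than under zero drift. The sketch computes the posterior probability of a positive drift at the moment of hitting the upper bound, assuming drift magnitudes are equally likely and signs are 50/50. With a single known magnitude the confidence at the bound does not depend on t; only when several magnitudes (e.g. coherences) are possible does confidence drop with decision time. The bound and drift values are hypothetical.

```python
import math

def confidence_at_upper_bound(t, bound, drifts, sigma=1.0):
    """P(drift > 0 | path first hits +bound at time t), magnitudes equally likely, sign 50/50.

    Uses the likelihood ratio exp(v*bound/sigma^2 - v^2*t/(2*sigma^2)) of such a path
    under drift v relative to zero drift.
    """
    pos = sum(math.exp((v * bound - 0.5 * v * v * t) / sigma**2) for v in drifts)
    neg = sum(math.exp((-v * bound - 0.5 * v * v * t) / sigma**2) for v in drifts)
    return pos / (pos + neg)

bound = 1.0
times = (0.1, 1.0, 5.0)
single = [confidence_at_upper_bound(t, bound, [0.5]) for t in times]
mixed = [confidence_at_upper_bound(t, bound, [0.1, 0.5, 1.0]) for t in times]
print("one drift magnitude:", [round(c, 3) for c in single])
print("several magnitudes: ", [round(c, 3) for c in mixed])
```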

## Decision-related activity in sensory neurons reflects more than a neuron's causal effect.

Nienborg, H. and Cumming, B. G.
Nature, 459:89–92, 2009

### Abstract

During perceptual decisions, the activity of sensory neurons correlates with a subject’s percept, even when the physical stimulus is identical. The origin of this correlation is unknown. Current theory proposes a causal effect of noise in sensory neurons on perceptual decisions, but the correlation could result from different brain states associated with the perceptual choice (a top-down explanation). These two schemes have very different implications for the role of sensory neurons in forming decisions. Here we use white-noise analysis to measure tuning functions of V2 neurons associated with choice and simultaneously measure how the variation in the stimulus affects the subjects’ (two macaques) perceptual decisions. In causal models, stronger effects of the stimulus upon decisions, mediated by sensory neurons, are associated with stronger choice-related activity. However, we find that over the time course of the trial these measures change in different directions, at odds with causal models. An analysis of the effect of reward size also supports this conclusion. Finally, we find that choice is associated with changes in neuronal gain that are incompatible with causal models. All three results are readily explained if choice is associated with changes in neuronal gain caused by top-down phenomena that closely resemble attention. We conclude that top-down processes contribute to choice-related activity. Thus, even forming simple sensory decisions involves complex interactions between cognitive processes and sensory neurons.

### Review

The authors investigated the source of choice probabilities in early sensory neurons. Choice probability quantifies the difference between the firing rate distributions obtained when trials are split by the behavioural response of the subject. The less the firing rate distributions for one response and its alternative (in two-choice tasks) overlap, the greater the choice probability. Importantly, they restricted their analysis to trials in which the stimulus was effectively random. In random dot motion experiments this corresponds to 0% coherent motion, but here they used a disparity discrimination task and recorded disparity-selective neurons in macaque area V2. The mean contribution from the stimulus should, therefore, have been 0. Yet they found that choice probabilities were above 0.5, indicating that the firing of the neurons could still predict the final response. But why? They consider two possibilities: 1) the particular noise in the firing rates of sensory neurons causes, at least partially, the final choice; 2) the firing rate of sensory neurons reflects choice-related effects induced by top-down influences from more decision-related areas.

Note that the choice probability they use is partially corrected for influences of the stimulus by considering the firing rate of a neuron in response to each particular disparity without taking choices into account. This correction reduced choice probabilities slightly. Nevertheless, they remained significantly above 0.5. This result indicates that, once the firing rate distributions of the recorded neurons are split according to the final choice, they were only slightly affected by which disparities were shown in individual frames. I don’t find this surprising, because there was no consistent stimulus to detect in the random disparities and the behavioural choices were effectively random.

Yet the particular disparities presented in individual trials did influence the final choice, as they showed using psychophysical reverse correlation. The analysis suggests that the very first frames had only a small effect, followed by a steep rise in the influence of frames at the beginning of a trial (until about 200 ms) and then a steady decline. This result can mean different things depending on whether you believe that evidence accumulation stops once a threshold has been reached, or that it continues until a response is required. Shadlen is probably a proponent of the first proposition. Under it, the decreasing influence of the stimulus on the choice simply reflects the shrinking number of trials in which the threshold has not yet been reached. Under the second proposition, the result means that the weight given to individual pieces of evidence during accumulation decreases as the response approaches. Currently, I can’t think of decisive evidence for either proposition, but it has been shown in perturbation experiments that stimulus perturbations late in a trial, close to the decision, had smaller effects on final choices than perturbations early in a trial (Huk and Shadlen, 2005).
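The first proposition is easy to check with a few lines of simulation. The sketch below is a hypothetical toy model, not the authors' analysis: zero-mean frame-by-frame stimulus values plus internal noise are accumulated until an absorbing bound is hit, frames after the crossing are ignored, and the psychophysical kernel is the difference between the mean stimulus values on trials ending in each choice. The bound, noise level and frame count are arbitrary choices of mine.

```python
import numpy as np


def psychophysical_kernel(n_trials=20000, n_frames=50, bound=6.0,
                          internal_sd=1.0, seed=0):
    """Kernel of a toy accumulator that stops at an absorbing bound (hypothetical model)."""
    rng = np.random.default_rng(seed)
    stim = rng.standard_normal((n_trials, n_frames))   # zero-mean stimulus per frame
    evidence = stim + internal_sd * rng.standard_normal((n_trials, n_frames))
    x = np.cumsum(evidence, axis=1)
    hit = np.abs(x) >= bound
    # the choice is fixed at the first crossing; undecided trials use the last frame
    stop = np.where(hit.any(axis=1), hit.argmax(axis=1), n_frames - 1)
    choice = np.where(x[np.arange(n_trials), stop] >= 0, 1, -1)
    # reverse correlation: mean stimulus before choice +1 minus mean before choice -1
    return stim[choice == 1].mean(axis=0) - stim[choice == -1].mean(axis=0)


kernel = psychophysical_kernel()
print(np.round(kernel[::10], 3))
```

Every frame enters the accumulator with the same weight while it is running, yet the kernel decays over the trial because fewer trials are still undecided. This toy model does not reproduce the initial rise seen in the data.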

Back to the source of above-chance choice probabilities. Given the decreasing influence of the stimulus on the final choice, and assuming that the influence of the stimulus on sensory neurons stays constant, the authors argue that choice probabilities should also decrease towards the end of a trial. However, choice probabilities stay roughly constant after an initial rise. Consequently, they infer that the firing of the neurons must be influenced by sources other than the stimulus which are correlated with the choice. They consider two such sources: i) lateral sensory neurons which could reflect the final decision better; ii) higher, decision-related areas which, for example, project a kind of bias onto the sensory neurons. The authors strongly prefer ii), partly because they found that the firing of sensory neurons appears to be gain-modulated when contrasting firing rates between final choices. In particular, firing rates showed a larger gain (steeper disparity tuning curve) in trials which ended with the behavioural choice corresponding to the neuron’s preferred disparity. In other words, the output of a neuron was selectively increased if that neuron preferred the disparity which was finally chosen and, equivalently, selectively decreased if it preferred a different disparity. This gain difference explains at least part of the difference in firing rate distributions measured by the choice probability.
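A minimal sketch of how a choice-dependent gain alone can push choice probability above 0.5 even when choices are independent of the stimulus. Choice probability is computed as the ROC area between spike-count distributions (Mann-Whitney form, ties counted as one half), without the per-disparity correction the authors apply. The tuning curve, gain values and function names are hypothetical, and the gain here is purely multiplicative.

```python
import numpy as np


def choice_probability(pref_counts, null_counts):
    """ROC area: P(pref > null) + 0.5 * P(tie) over all pairs of trials."""
    diff = np.asarray(pref_counts)[:, None] - np.asarray(null_counts)[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()


def simulate_gain_cp(n_trials=2000, gain_delta=0.2, seed=4):
    """Choice-dependent gain on a hypothetical disparity tuning curve, random choices."""
    rng = np.random.default_rng(seed)
    disparity = rng.uniform(-1.0, 1.0, n_trials)       # effectively random stimulus
    chose_pref = rng.random(n_trials) < 0.5            # choice independent of the stimulus
    tuning = 10.0 + 20.0 * np.exp(-(disparity - 0.5) ** 2 / (2 * 0.3 ** 2))  # spikes/s
    gain = np.where(chose_pref, 1.0 + gain_delta, 1.0 - gain_delta)
    counts = rng.poisson(gain * tuning)                # 1 s counting window
    return choice_probability(counts[chose_pref], counts[~chose_pref])


print(simulate_gain_cp(), simulate_gain_cp(gain_delta=0.0))
```

With `gain_delta=0.0` the choice probability stays near 0.5, so in this sketch the whole effect comes from the top-down gain.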

They also show an interesting effect of reward size on the correlation between stimulus and final choice: the stimulus had a larger influence on choice when the reward was larger. Again, if choice probabilities were mainly driven by the stimulus (bottom-up effects) and the stimulus had a larger influence on the final choice in high-reward trials, then choice probabilities should have been higher in high-reward trials. The opposite was the case: choice probabilities were lower in high-reward trials. The authors explain this using the bias hypothesis from above: the measured choice probabilities reflect something like an attentional gain or bias induced by higher-level, decision-related areas. As the stimulus becomes more important, the bias loses influence, and choice probabilities decrease.

In summary, the authors present convincing evidence that even sensory neurons in early visual cortex (V2) receive top-down, decision-related influences. The choice probabilities reported here are quite similar to those reported in a previous paper (Nienborg and Cumming, 2006), even though only trials with completely random stimuli were considered here. I would have guessed that choice probabilities would be considerably higher for trials with an actual stimulus. Why is there only a moderate difference? Perhaps the similarity is only apparent: my observation is based only on a brief look at the figures in the two papers.

## The Influence of Spatiotemporal Structure of Noisy Stimuli in Decision Making.

Insabato, A., Dempere-Marco, L., Pannunzi, M., Deco, G., and Romo, R.
PLoS Comput Biol, 10:e1003492, 2014

### Abstract

Decision making is a process of utmost importance in our daily lives, the study of which has been receiving notable attention for decades. Nevertheless, the neural mechanisms underlying decision making are still not fully understood. Computational modeling has revealed itself as a valuable asset to address some of the fundamental questions. Biophysically plausible models, in particular, are useful in bridging the different levels of description that experimental studies provide, from the neural spiking activity recorded at the cellular level to the performance reported at the behavioral level. In this article, we have reviewed some of the recent progress made in the understanding of the neural mechanisms that underlie decision making. We have performed a critical evaluation of the available results and address, from a computational perspective, aspects of both experimentation and modeling that so far have eluded comprehension. To guide the discussion, we have selected a central theme which revolves around the following question: how does the spatiotemporal structure of sensory stimuli affect the perceptual decision-making process? This question is a timely one as several issues that still remain unresolved stem from this central theme. These include: (i) the role of spatiotemporal input fluctuations in perceptual decision making, (ii) how to extend the current results and models derived from two-alternative choice studies to scenarios with multiple competing evidences, and (iii) to establish whether different types of spatiotemporal input fluctuations affect decision-making outcomes in distinctive ways. And although we have restricted our discussion mostly to visual decisions, our main conclusions are arguably generalizable; hence, their possible extension to other sensory modalities is one of the points in our discussion.

### Review

They review previous findings about perceptual decision making from a computational perspective, mostly related to attractor models of decision making. The focus here, however, is on how the noisy stimulus influences the decision. They mostly restrict themselves to experiments with random dot motion, because these provided the most relevant results for their discussion, which mainly covers three points: 1) the specifics of decision input in decisions with multiple alternatives, 2) the relation of the activity of sensory neurons to decisions (cf. CP, choice probability) and 3) the extent to which sensory neurons reflect fluctuations of the particular stimulus. See also the first paragraph of their Final Remarks for a summary, but note that I have framed the points slightly differently: their 3rd point follows from my 3rd point when it is applied to the specifics of random dot motion stimuli. In particular, they suggest investigating to what extent different definitions of spatial noise in the random dot stimulus affect decisions differently.

With 2) they discuss the interesting finding that even the activity of sensory neurons can, to some extent, predict final decisions when the evidence in the stimulus does not favour any decision alternative. So where does the variance in sensory neurons that eventually leads to a decision come from? Obviously, it could come from the stimulus itself. It has been found, however, that the ratio of variance to mean activity (the Fano factor) is the same when computed over trials with different stimuli as when computed over trials in which exactly the same stimulus, with a particular realisation of noise, was repeated. If the stimulus noise drove the variance, you would expect a reduction of variance when the same stimulus is repeated, but it’s not there. I’m unsure, though, whether this is the correct interpretation of the variance-to-mean ratio; I would have to check the original papers by Britten (Britten, 1993 and Britten, 1996). The seemingly constant variance of sensory neuron activity suggests that the particular noise realisation of a random dot stimulus does not affect decisions. Rather, the intrinsic activity of sensory neurons drives decisions in the absence of clear evidence. The authors argue that this is not a complete description of the situation, because an effect of the particular stimulus on the variance of sensory neuron activity has also been found when considering small time windows instead of the whole trial. Unfortunately, the argument is mostly based on results presented in SfN meeting abstracts from 2012. I wonder why there is no corresponding paper.
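The expected reduction of variance can be illustrated with a toy doubly stochastic model. This is my assumption and not the model of Britten et al.: each noise realisation of the stimulus sets a trial's firing rate, and spikes are Poisson given that rate. By the law of total variance, fresh noise on every trial adds the rate variance on top of the Poisson variance, so the Fano factor across different realisations should exceed the Fano factor across repeats of one frozen realisation (which is 1 for Poisson spiking). Names and parameter values are hypothetical.

```python
import numpy as np


def fano(counts):
    """Variance-to-mean ratio of spike counts."""
    return counts.var(ddof=1) / counts.mean()


def fano_factors(n_trials=5000, base_rate=20.0, stim_sd=5.0, duration=1.0, seed=1):
    """Fano factor for fresh vs. frozen stimulus noise in a toy doubly stochastic model."""
    rng = np.random.default_rng(seed)
    # a different noise realisation on every trial: the rate fluctuates across trials
    rates = np.clip(base_rate + stim_sd * rng.standard_normal(n_trials), 0.0, None)
    counts_diff = rng.poisson(rates * duration)
    # one frozen realisation repeated on every trial: the same rate each time
    frozen_rate = max(base_rate + stim_sd * rng.standard_normal(), 0.0)
    counts_same = rng.poisson(frozen_rate * duration, size=n_trials)
    return fano(counts_diff), fano(counts_same)


print(fano_factors())  # roughly (1 + stim_sd**2 / base_rate, 1.0)
```

If recorded neurons show equal Fano factors in both conditions, then, under this toy model, the stimulus noise would contribute little to the rate variance.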

## Probabilistic reasoning by neurons.

Yang, T. and Shadlen, M. N.
Nature, 447:1075–1080, 2007

### Abstract

Our brains allow us to reason about alternatives and to make choices that are likely to pay off. Often there is no one correct answer, but instead one that is favoured simply because it is more likely to lead to reward. A variety of probabilistic classification tasks probe the covert strategies that humans use to decide among alternatives based on evidence that bears only probabilistically on outcome. Here we show that rhesus monkeys can also achieve such reasoning. We have trained two monkeys to choose between a pair of coloured targets after viewing four shapes, shown sequentially, that governed the probability that one of the targets would furnish reward. Monkeys learned to combine probabilistic information from the shape combinations. Moreover, neurons in the parietal cortex reveal the addition and subtraction of probabilistic quantities that underlie decision-making on this task.

### Review

The authors argue that the brain reasons probabilistically, because they find that single-neuron responses (firing rates) correlate with a measure of probabilistic evidence derived from the probabilistic task setup. It is certainly true that the monkeys could learn the task (a variant of the weather prediction task), and I also find the evidence presented in the paper generally compelling, but the authors note themselves that similar correlations with firing rate may result from other quantitative measures with properties similar to the one considered here. Might firing rates, for example, correlate similarly with the expected value of a shape combination as derived from a reinforcement learning model?

What did they do in detail? They trained monkeys on a task in which they had to predict which of two targets would be rewarded based on a set of four shapes presented on the screen. Each shape contributed a certain weight to the probability of rewarding a target, as defined by the experimenters. The monkeys had to learn these weights. Then they also had to learn (implicitly) how the weights of shapes are combined to produce the probability of reward. After about 130,000 trials the monkeys were good enough to be tested. The trick in the experiment was that the four shapes were not presented simultaneously, but appeared one after the other. The question was whether neurons in the lateral intraparietal area (LIP) of the monkeys’ brains would represent the updated probability of reward after the addition of each new shape within a trial. The authors hypothesised that they would, because results from previous experiments suggested that neurons in LIP represent accumulated evidence in perceptual decision making paradigms (see Gold & Shadlen, 2007 for a review).

Now, Shadlen seems convinced that these neurons do not directly represent the relevant probabilities, but rather the log likelihood ratio (logLR) of one choice option over the other (see, e.g., Gold & Shadlen, 2001 and Shadlen et al., 2008). Hence, these ‘posterior’ probabilities play no role in the paper; instead, all results are obtained for the logLR. Curiously, the task is defined solely in terms of the posterior probability of reward for a particular combination of four shapes, and the logLR needs to be computed from the posterior probabilities (Yang & Shadlen don’t lay out this detail in the paper or the supplementary information). I’m more open to the possibility that posterior probabilities are represented directly, and I wondered what the relationship with logLR would look like if the firing rates represented posterior probabilities. This is easy to simulate in Matlab (see Yang2007.m). Such a simulation shows that, as a function of logLR, the firing rate (representing posterior probabilities) should follow a sigmoid function. Compare this prediction to Figures 2c and 3b for epoch 4. Such a sigmoidal relationship derives from the boundedness of the posterior probabilities, which is obviously mirrored by the firing rates of neurons, as these cannot drop or rise indefinitely. So there could be simple reasons for the boundedness of firing rates other than that they represent probabilities, but in any case it appears unlikely that they represent unbounded log likelihood ratios.
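The Matlab script is not reproduced here; the following is a minimal Python sketch of the same idea, assuming two options with equal prior odds, a natural-log likelihood ratio (the base only rescales the x-axis), and a hypothetical neuron whose rate is linear in the posterior probability (`r_min` and `r_max` are arbitrary). With equal priors the posterior is the logistic function of logLR, so the rate traces a sigmoid that saturates for large |logLR|, whereas a neuron coding logLR directly would be linear in it.

```python
import numpy as np


def posterior_from_loglr(log_lr, log_prior_odds=0.0):
    """P(option A | evidence) from a natural-log likelihood ratio (logistic function)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(log_lr, dtype=float) + log_prior_odds)))


def rate_coding_posterior(log_lr, r_min=10.0, r_max=50.0):
    """Hypothetical neuron whose firing rate is linear in the posterior probability."""
    return r_min + (r_max - r_min) * posterior_from_loglr(log_lr)


log_lr = np.linspace(-4.0, 4.0, 9)
rates = rate_coding_posterior(log_lr)
print(np.round(rates, 1))  # steep around logLR = 0, flat near r_min and r_max
```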

## The Cost of Accumulating Evidence in Perceptual Decision Making.

Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., and Pouget, A.
The Journal of Neuroscience, 32:3612–3628, 2012

### Abstract

Decision making often involves the accumulation of information over time, but acquiring information typically comes at a cost. Little is known about the cost incurred by animals and humans for acquiring additional information from sensory variables due, for instance, to attentional efforts. Through a novel integration of diffusion models and dynamic programming, we were able to estimate the cost of making additional observations per unit of time from two monkeys and six humans in a reaction time (RT) random-dot motion discrimination task. Surprisingly, we find that the cost is neither zero nor constant over time, but for the animals and humans features a brief period in which it is constant but increases thereafter. In addition, we show that our theory accurately matches the observed reaction time distributions for each stimulus condition, the time-dependent choice accuracy both conditional on stimulus strength and independent of it, and choice accuracy and mean reaction times as a function of stimulus strength. The theory also correctly predicts that urgency signals in the brain should be independent of the difficulty, or stimulus strength, at each trial.

### Review

The authors show an equivalence between a probabilistic and a diffusion model of perceptual decision making and consequently explain experimentally observed behaviour in the random dot motion task in terms of time-varying bounds in the diffusion model, which correspond to time-varying costs in the probabilistic model. Here, I discuss their model in detail and outline its limits. My main worry is that the presented model may be too flexible to have real explanatory power. Impatient readers may want to skip to the conclusion below.

#### Perceptual model

The presented model is tailored to the two-alternative forced choice random dot motion task. Its fundamental assumption is that at each point in discrete time (or, equivalently, for each successive time period in continuous time) the perceptual process of the decision maker produces an independent sample of evidence whose mean, mu*dt, reflects the strength (coherence) and direction (through its sign only) of random dot motion, while its variance, sigma2, reflects the passage of time (sigma2 = dt, the time period between observations). Defining the input to the decision model as independent samples of motion strength in one of two (unspecified) directions restricts the model to two decision alternatives. Consequently, the presented model does not apply to more than two alternatives, or to dependent samples.

The model of noisy momentary evidence corresponds to a Wiener process with drift, which is exactly the standard (drift) diffusion model of perceptual decision making with drift mu and diffusion sigma2. You could wonder why sigma2 is exactly equal to dt and not larger or smaller, but this is compensated for by allowing the mean evidence to scale: mu = k*c, where k is an arbitrary scaling constant which is fit to data and c is the random dot coherence in the current trial. Therefore, by controlling k you essentially control the signal-to-noise ratio in the model of the experiment, and you would get equivalent results if you instead changed sigma2 while fixing mu = c (and rescaled the bound accordingly). The difference between the two cases is purely conceptual: in the former case you assume that the neuronal population in MT signals, on average, a scaled motion strength, where the scaling may differ between subjects while signal variance is the same across subjects; in the latter case you assume that the MT signal, on average, corresponds to motion strength directly, but MT signal variance varies across subjects. Personally, I prefer the latter.
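The equivalence of the two parameterisations can be checked numerically. A minimal sketch, assuming an Euler discretisation with shared noise draws (parameter values are arbitrary and `first_passage` is a hypothetical helper): drift k*c with unit noise and bound B produces the same trajectories, up to a factor of k, as drift c with noise scaled by 1/k and bound B/k, so choices and decision times coincide.

```python
import numpy as np


def first_passage(increments, bound):
    """Step index and sign of the first |x| >= bound crossing (-1 and 0 if none)."""
    x = np.cumsum(increments, axis=1)
    hit = np.abs(x) >= bound
    crossed = hit.any(axis=1)
    step = np.where(crossed, hit.argmax(axis=1), -1)
    choice = np.where(crossed, np.sign(x[np.arange(len(x)), step]), 0.0)
    return step, choice


rng = np.random.default_rng(0)
dt, k, c, bound = 0.005, 10.0, 0.1, 1.0
z = rng.standard_normal((1000, 1000))  # shared noise, so the runs match trial by trial

# Version 1: mu = k*c with unit noise (variance dt per step) and bound B
step1, choice1 = first_passage(k * c * dt + np.sqrt(dt) * z, bound)
# Version 2: mu = c with the noise s.d. scaled by 1/k and bound B/k
step2, choice2 = first_passage(c * dt + (np.sqrt(dt) / k) * z, bound / k)
print(np.mean(step1 == step2), np.mean(choice1 == choice2))
```

Since the bound is a free parameter in either case, only the ratio of drift to noise standard deviation is identifiable from choices and decision times.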

The decision circuit in the authors’ model takes the samples of momentary evidence described above and computes a posterior belief over the two considered alternatives (motion directions). This posterior belief depends on the posterior probability distribution over mean motion strengths mu, which is computed from the samples of momentary evidence taking a prior distribution over motion strengths into account. An important assumption in the computation of the posterior is that the decision maker (or decision circuit) has a perfect model of how the samples of momentary evidence are generated (a Gaussian with mean mu*dt and variance dt). If, for example, the decision maker assumed a slightly different variance, that would also explain differences in mean accuracy and decision times. The assumption of a perfect model, however, allows the authors to assert that the experimentally observed fraction of correct choices at a time t is equal to the internal belief of the decision maker (subject) that the chosen alternative is the correct one. This is important, because only with an estimate of this internal belief can the authors later infer the time-varying waiting costs for the subject (see below).

Anyway, under the given model the authors show that a Gaussian prior yields a Gaussian posterior over motion strength mu (Eq. 4) and a discrete prior yields a corresponding discrete posterior (Eq. 7). Importantly, the parameters of the posteriors can be formulated as functions of the current state x(t) of the sample-generating diffusion process and of elapsed time t. Consequently, the posterior belief over decision alternatives can also be formulated as a one-to-one, i.e., invertible, function of the diffusion state (at each time t). Through this connection, the authors show that, under an appropriate transformation, decisions based on the posterior belief are equivalent to decisions based on the accumulated diffusion state x(t) considered in relation to elapsed time t.
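For the Gaussian-prior case this mapping can be written down in a few lines. A minimal sketch, assuming dx = mu*dt + dW and a prior mu ~ N(0, prior_var); the name `prior_var` and both function names are my notation and may differ from the paper's Eq. 4. The posterior over mu has precision t + 1/prior_var and mean x / (t + 1/prior_var), so the belief that mu > 0 is Phi(x / sqrt(t + 1/prior_var)), which is invertible in x at every t.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def belief_positive(x, t, prior_var):
    """P(mu > 0 | x(t) = x) for dx = mu*dt + dW and prior mu ~ N(0, prior_var)."""
    return _STD_NORMAL.cdf(x / sqrt(t + 1.0 / prior_var))


def x_for_belief(belief, t, prior_var):
    """Inverse map: accumulator value at which the belief in 'mu > 0' equals `belief`."""
    return _STD_NORMAL.inv_cdf(belief) * sqrt(t + 1.0 / prior_var)


for t in (0.1, 0.5, 1.0, 2.0):
    print(t, round(x_for_belief(0.9, t, prior_var=4.0), 3))
```

In this sketch a constant criterion on the belief corresponds to a bound on x(t) that grows like sqrt(t + 1/prior_var); conversely, a constant or collapsing bound on x(t) implies a belief threshold that decreases over time.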

In summary, the authors’ probabilistic perceptual decision model simply estimates the motion strength from the samples and then decides whether the estimate is positive or negative. Furthermore, this procedure is equivalent to accumulating the samples and deciding whether the accumulated state is very positive or very negative (as determined by hitting a bound). The described diffusion model has been used before to fit accuracies and mean reaction times of subjects, but apparently it has never fitted the full reaction time distribution particularly well (note that it lacks the extensions of the drift diffusion model suggested by Ratcliff; see, e.g., [1]). So here the authors extend the diffusion model by adding time-varying bounds, which can be interpreted in the probabilistic model as a time-varying cost of waiting for more samples.

#### Time-varying bounds and costs

Intuitively, a time-varying bound gives a diffusion model great flexibility in shaping response accuracy and timing at any given time point. However, I currently don’t have a good idea of just how flexible the model becomes. For example, if in discrete time changing the bound at each time step could independently modify the accuracy and reaction time distribution at that time step, the bound alone could explain the data. I don’t believe that this extreme case holds, but I would like to know how close to it you would come. In any case, it seems sensible to restrict how much the bound can vary, to prevent overfitting the data or, indeed, to prevent making the other model parameters obsolete. In the present paper, the authors control the shape of the bound by using a function made of cosine basis functions. Although this restricts the bound to be a smooth function of time, it still allows considerable flexibility. The authors use two more approaches to control the flexibility of the bound. One is to constrain the bound to be the same for all coherences, meaning that it cannot be used to explain differences between coherences (experimental conditions). The other is to use Bayesian methods for fitting the data. On the one hand, this controls the bound through the choice of priors. They do this by only considering parameter values in a restricted range, but I do not know how wide or narrow this range is in practice. On the other hand, the Bayesian approach leads to posterior distributions over parameters, which means that subsequent analyses can take the uncertainty over parameters into account (see, e.g., the indicated uncertainty over the inferred bound in Fig. 5A). Although I still have some lingering doubts about whether the bound was too flexible, I believe that this is not a big issue here.
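To get a feel for what a smooth time-varying bound does, here is a minimal sketch. The authors' exact basis functions are not reproduced: the form B(t) = w0 + sum_k w_k cos(k*pi*t/t_max) below is my illustrative choice, and `cosine_bound`, `simulate` and all parameter values are hypothetical. With shared noise, a collapsing bound that never exceeds a flat one can only make each trial's decision earlier.

```python
from functools import partial

import numpy as np


def cosine_bound(t, weights, t_max):
    """Hypothetical smooth bound B(t) = w0 + sum_k w_k cos(k*pi*t/t_max), kept positive."""
    t = np.asarray(t, dtype=float)
    b = np.full_like(t, weights[0])
    for k, w in enumerate(weights[1:], start=1):
        b += w * np.cos(k * np.pi * t / t_max)
    return np.maximum(b, 1e-3)


def simulate(drift, bound_fn, n_trials=2000, dt=0.005, t_max=3.0, seed=3):
    """Decision times (NaN if no crossing) and correctness for a time-varying bound."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / dt))
    t = dt * np.arange(1, n_steps + 1)
    x = np.cumsum(drift * dt + np.sqrt(dt) * rng.standard_normal((n_trials, n_steps)), axis=1)
    hit = np.abs(x) >= bound_fn(t)          # bound broadcast across trials
    crossed = hit.any(axis=1)
    idx = hit.argmax(axis=1)
    rt = np.where(crossed, t[idx], np.nan)
    correct = crossed & (x[np.arange(n_trials), idx] > 0)  # drift > 0: upper bound is correct
    return rt, correct


flat = partial(cosine_bound, weights=[1.5], t_max=3.0)
collapsing = partial(cosine_bound, weights=[1.0, 0.5], t_max=3.0)  # 1.5 at t=0, 0.5 at t_max
rt_flat, ok_flat = simulate(1.0, flat)
rt_coll, ok_coll = simulate(1.0, collapsing)
print(np.nanmean(rt_flat), ok_flat.mean(), np.nanmean(rt_coll), ok_coll.mean())
```

Adding more cosine terms lets the bound bend more freely, which is exactly the flexibility discussed above.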

It is, however, a different question whether the time-varying bound is a good explanation for the observed behaviour compared with, e.g., the extensions of the diffusion model introduced by Ratcliff (mostly trial-by-trial parameter variability). Here, one might refer to the second, decision-related part of the presented model, which considers the rewards and costs associated with decisions. In the Bayesian decision model presented in the paper the subject decides at each time step whether to select alternative 1, select alternative 2, or wait for more evidence in the next time step. This mechanism was already mentioned in [2]. Choosing an alternative leads either to a reward (correct answer) or to punishment (error), but waiting is also associated with a cost which may change throughout the trial. Finding the optimal course of action, which maximises reward per unit time, is then an average-reward reinforcement learning problem which the authors solve using dynamic programming. For a particular setting of reward, punishment and waiting costs, the solution can be translated into an equivalent time-varying bound. More importantly, the procedure can be reversed such that the time-varying cost can be inferred from a bound that had been fitted to data. Apart from the bound, however, the estimate of the cost also depends on the reward/punishment setting and on an estimate of choice accuracy at each time step. Note that the latter differs considerably from the overall accuracy which is usually used to fit diffusion models, and it requires more data, especially when the error rate is low.
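The translation from costs to bounds can be sketched with backward induction. This is a deliberately simplified version and not the authors' method: it maximises expected reward within a single trial with a hard deadline (a finite-horizon problem) instead of reward per unit time, uses a two-point prior mu = +/-m (so the belief depends on x only), and assumes a constant waiting cost. `optimal_bounds` and all parameters are hypothetical.

```python
import numpy as np


def optimal_bounds(m=1.0, cost=0.05, reward=1.0, punishment=0.0,
                   t_max=1.0, dt=0.01, x_max=4.0, n_x=801, n_quad=21):
    """Backward induction for commit-or-wait with a forced choice at the deadline."""
    x = np.linspace(-x_max, x_max, n_x)
    g = 1.0 / (1.0 + np.exp(-2.0 * m * x))    # P(mu = +m | x) for equal prior odds
    # expected payoff of committing now to the more probable sign
    commit_value = punishment + (reward - punishment) * np.maximum(g, 1.0 - g)
    z, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / w.sum()                           # weights for E[f(Z)], Z ~ N(0, 1)
    shift = np.sqrt(dt) * z[:, None]          # noise part of dx, shape (n_quad, 1)
    n_steps = int(round(t_max / dt))
    value = commit_value.copy()               # at the deadline the agent must commit
    bounds = np.empty(n_steps)
    positive = x >= 0
    for step in range(n_steps - 1, -1, -1):
        # expected value after one more sample, mixing over the two hypotheses
        up = w @ np.interp(x[None, :] + m * dt + shift, x, value)
        down = w @ np.interp(x[None, :] - m * dt + shift, x, value)
        continue_value = g * up + (1.0 - g) * down - cost * dt
        commit = commit_value >= continue_value
        value = np.where(commit, commit_value, continue_value)
        region = positive & commit
        bounds[step] = x[region].min() if region.any() else np.inf
    return dt * np.arange(n_steps), bounds


t, b = optimal_bounds()
print(np.round(b[::20], 2))  # bound on x(t) at a few time points
```

In this simplified setting the bound shrinks towards the deadline even though the waiting cost is constant, because the value of further evidence falls as the remaining time runs out. Adding the same constant to reward and punishment leaves the bounds unchanged here, since a choice is forced at the deadline; whether the same holds in the authors' reward-rate formulation is not checked by this sketch.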

The Bayesian decision model, therefore, allows one to translate the time-varying bound into a time-varying cost, which then provides an explanation of the particular shape of the reaction time distribution (and accuracy) in terms of the intrinsic motivation (negative cost) of the subject to wait for more evidence. Notice that this intrinsic motivation is really just a value describing how much somebody (dis-)likes waiting, and it can no longer be interpreted in terms of trying to perform better in the task, because all such components have already been taken care of by other parts of the decision model. So what does it mean when a subject likes to wait for new evidence just for the sake of it (cf. the dip in cost at the beginning of the trial in the human data in Fig. 8)? I don’t know.

Collapsing bounds, as found from behavioural data in this paper, have been associated with an urgency signal in neural data which drives the firing rates of all decision neurons towards a bound at the end of a trial irrespective of the input/evidence. This has been interpreted as a response of the subjects to the approaching deadline (end of trial), which they do not want to miss. The explanation in terms of a waiting cost which rises towards the end of a trial suggests that subjects simply have a built-in desire to make (potentially arbitrary) choices before a deadline. To me, this is rather unintuitive. If you’re not punished for making a wrong choice (blue lines in Figs. 7 and 8, but note that there was a small time punishment in the human experiment), shouldn’t it always be beneficial to make a choice before the deadline, because you exchange the certain absence of reward (missing the deadline) for an uncertain reward? This alone could explain the urgency signal without any waiting cost. So why do we see one anyway? It may all depend on the particular setting of reward and punishment for correct choices and errors, respectively. The authors present inferred waiting costs for different amounts of punishment and argue that the results are qualitatively the same, but the three values of punishment they present hardly exhaust the range of values that could be assumed. Also, they did not vary the amount of reward given for correct choices, but it is likely that only the difference between reward and punishment determines the behaviour of the model, such that it doesn’t matter whether you change reward or punishment to explore model predictions.

#### Conclusion

The main contribution of the paper is to show that accuracy and reaction time distributions can be explained by a time-varying bound in a simple diffusion model in which the drift scales linearly with stimulus intensity (coherence in random dot motion). I tried to point out that this result may not be surprising, depending on how much flexibility a time-varying bound adds to the model. Additionally, the authors present a connection between diffusion and Bayesian models of perceptual decision making which allows them to reinterpret the time-varying bounds in terms of the subjective cost of waiting for more evidence to arrive. The authors argue that this cost increases towards the end of a trial, but for two reasons I’m not entirely convinced: 1) Conceptually, it is worth considering the origin of a possible waiting cost. It could correspond to the energetic cost of keeping the inference machinery running and attention on the task, but there is no reason why this should increase towards a deadline. 2) The presented results do not convince me that the inferred increase of cost towards a deadline is qualitatively independent of the reward/punishment setting. A greater range of punishments should have been tested. Note that you cannot infer the rewards for decisions and the time-varying waiting cost at the same time from the behavioural data, so this issue cannot be settled without new experiments which measure rewards or costs more directly. Finally, I miss an overview of fitted parameter values in the paper. For example, I would be interested in the inferred lapse trial probabilities p1. The authors go to great lengths to estimate the posterior distributions over diffusion model parameters, and I wonder why they don’t share the results with us (at least mean and variance for a start).

In conclusion, the authors follow a trend of explaining behaviour in terms of Bayesian ideal observer models extended by flexible cost functions, and they apply this idea to perceptual decision making via a detour through a diffusion model. Although I appreciate the sound work presented in the paper, I’m worried that the time-varying bound/cost is too flexible and acts as a kind of ‘get out of jail free’ card which obscures other, potentially additional mechanisms underlying the observed behaviour.

#### References

[1] Bogacz, R.; Brown, E.; Moehlis, J.; Holmes, P. & Cohen, J. D. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev, 2006, 113, 700-765

[2] Dayan, P. & Daw, N. D. Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci, 2008, 8, 429-453