## Normative evidence accumulation in unpredictable environments.

Glaze, C. M., Kable, J. W., and Gold, J. I.
Elife, 4, 2015

### Abstract

In our dynamic world, decisions about noisy stimuli can require temporal accumulation of evidence to identify steady signals; differentiation to detect unpredictable changes in those signals; or both. Normative models can account for learning in these environments but have not yet been applied to faster decision processes. We present a novel, normative formulation of adaptive learning models that forms decisions by acting as a leaky accumulator with non-absorbing bounds. These dynamics, derived for both discrete and continuous cases, depend on the expected rate of change of the statistics of the evidence and balance signal identification and change detection. We found that, for two different tasks, human subjects learned these expectations, albeit imperfectly, then used them to make decisions in accordance with the normative model. The results represent a unified, empirically supported account of decision-making in unpredictable environments that provides new insights into the expectation-driven dynamics of the underlying neural signals.

### Review

The authors suggest a model of sequential information processing that is aware of possible switches in the underlying source of information. They further show that the model fits responses of people in two perceptual decision making tasks and consequently argue that behaviour, which was previously considered to be suboptimal, may follow the normative, i.e., optimal, mechanism of the model. This mechanism postulates that typical evidence accumulation mechanisms in perceptual decision making are altered by the expected switch rate of the stimulus. Specifically, evidence accumulation becomes more leaky and a non-absorbing bound becomes lower when the expected switch rate increases. The paper is generally well-written (although there are some convoluted bits in the results section) and convincing. I was a bit surprised, though, that only choices, but not their timing, are considered in the analysis with the model. In the following I’ll go through some more details of the model and discuss its limitations and its relation to other models in the field, but first I describe the experiments reported in the paper.

The paper reports two experiments. In the first (triangles task) people saw two triangles on the screen and had to judge whether a single dot was more likely to originate from the one triangle or the other. There was one dot and corresponding response per trial. In each trial the position of the dot was redrawn from a Gaussian distribution centred around one of the two triangles. There were also change point trials in which the triangle from which the dot was drawn switched (and then remained the same until the next change point). The authors analysed the proportion correct in relation to whether a trial was a change point. Trials were grouped into blocks which were defined by a constant rate of switches (hazard rate) in the true originating triangle. In the second experiment (dots-reversal task), a random dot stimulus repeatedly switched (reversed) direction within a trial. In each trial people had to tell in which direction the dots moved before they vanished. The authors analysed the proportion correct in relation to the time between the last switch and the end of stimulus presentation. There were no blocks. Each trial had one of two hazard rates and one of two difficulty levels. The two difficulty levels were determined for each subject individually such that the more difficult one led to correct identification of motion direction of a 500 ms long stimulus in 65% of cases.

The authors present two normative models, one discrete and one continuous, which they apply across and within trial in the triangles and dots-reversal tasks, respectively. The discrete model is a simple hidden Markov model in which the hidden state can take one of two values and there is a common transition probability between these two values which they call hazard ‘rate’ (H). Observations were implicitly assumed Gaussian. They only enter during fitting as log-likelihood ratios in the form $$\beta*x_n$$ where beta is a scaling relating to the internal / sensory uncertainty associated with the generative model of observations and $$x_n$$ is the observed dot position (x-coordinate) in the triangles task. In methods, the authors derive the update equation for the log posterior odds ($$L_n$$) of the hidden state values given in Eqs. (1) and (2).
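To make the update concrete, here is a minimal Python sketch of one step of the log posterior odds in a two-state hidden Markov model with symmetric switch probability H. The function names (`psi`, `update`, `psi_direct`) and the example values are mine, not the authors'. The formula is what you get when you push the previous posterior through the transition probabilities, which, as far as I can tell, is what Eqs. (1) and (2) express.

```python
import math

def psi(L_prev, H):
    """Log prior odds for the current step given the previous log posterior odds.

    Assumes a two-state hidden Markov model whose state switches with
    probability H between observations (0 < H < 1).
    """
    a = (1.0 - H) / H
    return L_prev + math.log(a + math.exp(-L_prev)) - math.log(a + math.exp(L_prev))

def update(L_prev, llr, H):
    """One step: discount the old belief by the switch probability, then add new evidence."""
    return psi(L_prev, H) + llr

def psi_direct(L_prev, H):
    """The same quantity computed via probabilities, as a cross-check."""
    p = 1.0 / (1.0 + math.exp(-L_prev))       # previous posterior of state 1
    p_pred = (1.0 - H) * p + H * (1.0 - p)    # predicted probability after a possible switch
    return math.log(p_pred / (1.0 - p_pred))

if __name__ == "__main__":
    beta, H = 2.0, 0.1                 # illustrative values, not fitted ones
    L = 0.0
    for x in [0.4, 0.7, -0.2, 0.5]:    # hypothetical dot positions x_n
        L = update(L, beta * x, H)     # LLR enters as beta * x_n
        print(round(L, 3))
```

For H=0.5 the previous belief is discarded completely (`psi` returns 0), and for H close to 0 the update reduces to perfect accumulation of the LLRs.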

The continuous model is based on a Markov jump process with two states which is the continuous equivalent of the hidden Markov model above. Using Ito-calculus the authors again derive an update equation for the log posterior odds of the two states (Eq. 4), but during fitting they actually approximate Eq. (4) with the discrete Eq. (1), because it is supposedly the most efficient discrete-time approximation of Eq. (4) (no explanation for why this is the case was given). They just replace the log-likelihood ratio placeholder (LLR) with a coherence-dependent term applicable to the random dot motion stimulus. Notably, in contrast to standard drift-diffusion modelling of random dot motion tasks, the authors used coherence-dependent noise. I’d be interested in the reason for this choice.

There is an apparent fundamental difference between the discrete and continuous models which can be seen in Fig. 1 B vs C. In the discrete model, for H>0.5, the log posterior odds may actually switch sign from one observation to the next whereas this cannot happen in the continuous model. Conceptually, this means that the log posterior odds in the discrete model, when the LLR is 0, i.e., when there is no evidence in either direction, would oscillate between decreasing positive and increasing negative values until converging to 0. This oscillation can be seen in Fig. 2G, red line for |LLR|>0. In the continuous model such an oscillation cannot happen, because the infinitely many, tiny time steps allow the model to converge to 0 before switching the sign. Another way to see this is through the discrete hazard ‘rate’ H which is the probability of a sign reversal within one time step of size dt. When you want to decrease dt in the model, but want to maintain a given rate of sign reversals in, e.g., 1 second, H would also have to decrease. Consequently, when dt approaches 0, the probability of a sign reversal approaches 0, too, which means that H is a useless parameter in continuous time which, in turn, is the reason why it is replaced by a real rate parameter ($$\lambda$$) representing the expected number of reversals per second. In conclusion, the fundamental difference between discrete and continuous models is only an apparent one. They are very similar models, just expressed in different resolutions of time. In that sense it would have perhaps been better to present results in the paper consistently in terms of a real hazard rate ($$\lambda$$) which could be obtained in the triangles task by dividing H by the average duration of a trial in seconds. Notice that the discrete model represents all hazard rates $$\lambda>1/dt$$ as H=1, i.e., it cannot represent hazard rates which would lead to more than 1 expected sign reversal per $$dt$$. 
There may be more subtle differences between the models when the exact distributions of sign reversals are considered instead of only the expected rates.
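The difference in resolution can be illustrated with a few lines of Python. This is only a sketch under my own assumptions: I use the first-order mapping H = λ·dt discussed above (valid for small dt), set the LLR to 0 throughout, and picked the starting value and rates arbitrarily.

```python
import math

def psi(L, H):
    """Discrete-time discounting of log posterior odds L for switch probability H."""
    a = (1.0 - H) / H
    return L + math.log(a + math.exp(-L)) - math.log(a + math.exp(L))

def decay_without_evidence(L0, H, n_steps):
    """Iterate the update with LLR = 0 and return the trajectory of L."""
    traj = [L0]
    for _ in range(n_steps):
        traj.append(psi(traj[-1], H))
    return traj

# Coarse time steps with H > 0.5: the sign of L flips on every step
# while its magnitude shrinks.
coarse = decay_without_evidence(L0=2.0, H=0.8, n_steps=6)

# Fine time steps: a rate of lam reversals per second becomes H = lam * dt,
# far below 0.5, so L decays towards 0 without ever crossing it.
lam, dt = 2.0, 0.001
fine = decay_without_evidence(L0=2.0, H=lam * dt, n_steps=1000)

if __name__ == "__main__":
    print([round(v, 3) for v in coarse])
    print(round(fine[-1], 4))
```

The coarse trajectory alternates in sign, the fine one shrinks monotonically, which is the behaviour of the continuous model.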

Using first order approximations of the two models the authors identify two components in the dynamics of the log posterior odds L: a leak and a bias. [Side remark: there is a small sign mistake in the definition of leak k of the continuous model in the Methods section.] Both depend on hazard rate and the authors show that the leak dominates the dynamics for small L whereas the bias dominates for large L. I find this denomination a bit misleading, because both leak and bias effectively result in a leak of log-posterior odds L by reducing L in every time step (cf. Fig. 1B,C). The change from a multiplicative leak to one based on a bias just means that the effective amount of leak in L increases nonlinearly with L as the bias takes over.

To test whether this special form of leak underlies decision making the authors compared the full model to two versions which only had a multiplicative leak, or one based on bias. In the former the leak stayed constant for increasing L, i.e., $$L' = \gamma*L$$. In the latter there was perfect accumulation without leak up to the bias and then a bias-based leak which corresponds to a multiplicative leak where the leak rate increased with L such that $$L' = \gamma(L)*L$$ with $$\gamma(L) = bias / L$$. The authors report evidence that in both tasks both alternative models do not describe choice behaviour as well as the full, normative model. In Fig. 9 they provide a reason by estimating the effective leak rate in the data and the models as a function of the strength of sensory evidence (coherence in the dots-reversal task). They do this by fitting the model with multiplicative leak separately to trials with low and high coherence (fitting to choices in the data or predicted by the different fitted models). In both data and normative model the effective leak rates depended on coherence. This dependence arises because high sensory evidence leads to large values of L and I have argued above that larger L has a larger effective leak rate due to the bias. It is, therefore, not surprising that the alternative model with multiplicative leak shows no dependence of effective leak on coherence. But it is also not surprising that the alternative model with bias-based leak has a larger dependence of effective leak on coherence than the data, because this model jumps from no leak to very large leak when coherence jumps from low to high. The full, normative model lies in between, because it smoothly transitions between the two alternative models.
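The three discounting rules can be compared directly in terms of the effective leak factor γ(L) = L'/L. The following Python sketch is my own illustration, not the authors' fitting code: I tie the constant leak factor and the bias to the same H (slope 1-2H at L=0 and bound log((1-H)/H), for H<0.5), whereas the authors fitted the alternative models freely.

```python
import math

def normative(L, H):
    """Discounting of the full model (two-state HMM with switch probability H)."""
    a = (1.0 - H) / H
    return L + math.log(a + math.exp(-L)) - math.log(a + math.exp(L))

def multiplicative_leak(L, H):
    """Constant leak factor, matched here to the normative slope at L = 0."""
    return (1.0 - 2.0 * H) * L

def bias_only(L, H):
    """Perfect accumulation below the bound, clipping at the bound (assumes H < 0.5)."""
    bound = math.log((1.0 - H) / H)
    return max(-bound, min(bound, L))

def effective_gamma(model, L, H):
    """Effective leak factor gamma(L) = L' / L for a given model (L != 0)."""
    return model(L, H) / L

if __name__ == "__main__":
    H = 0.1
    for L in [0.1, 1.0, 3.0, 6.0]:
        row = [effective_gamma(m, L, H) for m in (multiplicative_leak, normative, bias_only)]
        print(L, [round(g, 3) for g in row])
```

For small L the normative factor sits at the constant value of the multiplicative leak, for large L it follows the bias-based factor, and the transition between the two is smooth.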

Why is there a leak in the first place? Other people have found no evidence for a leak in evidence accumulation (e.g., Brunton et al., 2013). The leak results from the possibility of a switch of the source of the observations, i.e., a switch of the underlying true stimulus. Without any information, i.e., without observations, the possibility of a switch means that you should become more uncertain about the stimulus as time passes. The larger the hazard rate, i.e., the larger the probability of a switch within some time window, the faster you should become uncertain about the current stimulus. For a log posterior odds of L=0 uncertainty is at its maximum (both stimuli have equal posterior probability). This is another reason why discrete hazard ‘rates’ H>0.5 which lead to sign reversals in L do not make much sense. The absence of evidence for one stimulus should not lead to evidence for the other stimulus. Anyway, as the hazard rate goes to 0 the leak will go to 0 such that in experiments where usually no switches in stimulus occur subjects should not exhibit a leak, which explains why we often find no evidence for leaks in typical perceptual decision making experiments. This does not mean that there is no leak, though. In particular, the authors report here that hazard rates estimated from behaviour of subjects (subjective) tended to be a bit higher than the ones used to generate the stimuli (objective) when the objective hazard rates were very low, and the other way around for high objective hazard rates. This indicates that people have some prior expectations towards intermediate hazard rates that biased their estimates of hazard rates in the experiment.

The discussed forms of leak implement a property of the model that the authors called a ‘non-absorbing bound’. I find this wording also a bit misleading, because ‘bound’ was usually used to indicate a threshold in drift diffusion models which, when reached, would trigger a response. The bound here triggers nothing. Rather, it represents an asymptote of the average log posterior odds. Thus, it’s not an absolute bound, but it’s often passed due to variance in the momentary sensory evidence (LLR). I can also not follow the authors when they write: “The stabilizing boundary is also in contrast to the asymptote in leaky accumulation, which increases linearly with the strength of evidence”. Based on the dynamics of L discussed above the ‘bound’ here should exhibit exactly the described behaviour of an asymptote in leaky accumulation. The strength of evidence is reflected in the magnitude of LLR which is added to the intrinsic dynamics of the log posterior odds L. The non-absorbing bound, therefore, should be given by bias + average of LLR for the current stimulus. The bound, thus, should rise linearly with the strength of evidence (LLR).
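A quick numerical check of this argument, again only as a sketch under my own simplifications: I ignore the variance of the LLR and feed the update a constant LLR, then look at where L settles. The names `asymptote` and `leaky_asymptote` and all values are hypothetical.

```python
import math

def psi(L, H):
    """Discounting of log posterior odds L for switch probability H."""
    a = (1.0 - H) / H
    return L + math.log(a + math.exp(-L)) - math.log(a + math.exp(L))

def asymptote(llr, H, n_iter=200):
    """Fixed point of L = psi(L, H) + llr, found by simple iteration."""
    L = 0.0
    for _ in range(n_iter):
        L = psi(L, H) + llr
    return L

def leaky_asymptote(llr, gamma):
    """Fixed point of a purely multiplicative leak, L = gamma * L + llr."""
    return llr / (1.0 - gamma)

if __name__ == "__main__":
    H = 0.1
    bias = math.log((1.0 - H) / H)
    for llr in [0.1, 1.0, 3.0, 6.0]:
        print(llr,
              round(asymptote(llr, H), 3),                   # normative model
              round(bias + llr, 3),                          # bias + LLR
              round(leaky_asymptote(llr, 1.0 - 2.0 * H), 3)) # leaky accumulator
```

In this simplified setting the fixed point approaches bias + LLR for strong evidence, i.e., it rises with slope 1 in the LLR, while for weak evidence it stays close to the asymptote of the multiplicative leak.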

Fitting of the discrete and continuous models was done by maximising the likelihood of the models (in some fits with many parameters, priors over parameters were used to regularise the optimisation). The likelihood in the discrete models was Gaussian with mean equal to the log posterior odds ($$L_n$$) computed from the actual dot positions $$x_n$$. The variance of the Gaussian likelihood was fitted to the data as a free parameter. In the continuous model the likelihood was numerically approximated by simulating the discretised evolution of the probabilities that the log posterior odds take on particular values. This is very similar to the approach used by Brunton2013. The distribution of the log posterior odds $$L_n$$ was considered here, because the stream of sensory observations $$x(t)$$ was unknown and therefore had to enter as a random variable while in the triangles task $$x(t)=x_n$$ was set to the known x-coordinates of the presented dots.

The authors argued that the fits of behaviour were good, but at least for the dots reversal task Fig. 8 suggests otherwise. For example, Fig. 8G shows that 6 out of 12 subjects (there were supposed to be 13, but I can only see 12 in the plots) made 100% errors in trials with the low hazard rate of 0.1Hz and low coherence where the last switch in stimulus was very recent (maximally 300ms before the end of stimulus presentation). The best fitting model, however, predicted error rates of at most 90% in these conditions. Furthermore, there is a significant difference in choice errors between the low and high hazard rate for large times after the last switch in stimulus (Fig. 8A, more errors for high hazard rate) which was not predicted by the fitted normative model. Despite these differences the fitted normative model seems to capture the overall patterns in the data.

#### Conclusion

The authors present an interesting normative model in discrete and continuous time that extends previous models of evidence accumulation to situations in which switches in the presented stimulus can be expected. In light of this model, a leak in evidence accumulation reflects a tendency to increase uncertainty about the stimulus due to a potentially upcoming switch in the stimulus. The model provides a mathematical relation between the precise type of leak and the expected switch (hazard) rate of the stimulus. In particular, and in contrast to previous models, the leak in the present model depends nonlinearly on the accumulated evidence. As the authors discuss, the presented normative model potentially unifies decision making processes observed in different situations characterised by different stabilities of the underlying stimuli. I had the impression that the authors were very thorough in their analysis. However, some deviations of model and data apparent in Fig. 8 suggest that either the model itself, or the fitting procedure may be improved such that the model better fits people’s behaviour in the dots-reversal task. It was anyway surprising to me that subjects only had to make a single response per trial in that task. This feels like a big waste of potential choice data when I consider that each trial was 5-10s long and contained several stimulus switches (reversals).

## A test of Bayesian observer models of processing in the Eriksen flanker task.

White, C. N., Brown, S., and Ratcliff, R.
J Exp Psychol Hum Percept Perform, 38:489–497, 2012

### Abstract

Two Bayesian observer models were recently proposed to account for data from the Eriksen flanker task, in which flanking items interfere with processing of a central target. One model assumes that interference stems from a perceptual bias to process nearby items as if they are compatible, and the other assumes that the interference is due to spatial uncertainty in the visual system (Yu, Dayan, & Cohen, 2009). Both models were shown to produce one aspect of the empirical data, the below-chance dip in accuracy for fast responses to incongruent trials. However, the models had not been fit to the full set of behavioral data from the flanker task, nor had they been contrasted with other models. The present study demonstrates that neither model can account for the behavioral data as well as a comparison spotlight-diffusion model. Both observer models missed key aspects of the data, challenging the validity of their underlying mechanisms. Analysis of a new hybrid model showed that the shortcomings of the observer models stem from their assumptions about visual processing, not the use of a Bayesian decision process.

### Review

This is a response to Yu2009 in which the authors show that Yu et al.'s main Bayesian models cannot account for the full data of an Eriksen flanker task. In particular, Yu et al.'s models predict a far too high overall error rate with the suggested parameter settings that reproduce the initial drop of accuracy below chance level for very fast responses. The argument put forward by White et al. is that the mechanisms used in Yu et al.'s models to overcome initial, flanker-induced biases are too slow, i.e., the probabilistic evidence accumulation implemented by the models is influenced by the flankers for too long. White et al.'s shrinking spotlight models do not have such a problem, mostly because the speed with which flankers lose influence is fitted to the data. The argument seems compelling, but I would like to understand better why it takes so long in the Bayesian model to overcome flanker influence and whether there are other ways of speeding this up than the one suggested by White et al.

## Dynamics of attentional selection under conflict: toward a rational Bayesian account.

Yu, A. J., Dayan, P., and Cohen, J. D.
J Exp Psychol Hum Percept Perform, 35:700–717, 2009

### Abstract

The brain exhibits remarkable facility in exerting attentional control in most circumstances, but it also suffers apparent limitations in others. The authors' goal is to construct a rational account for why attentional control appears suboptimal under conditions of conflict and what this implies about the underlying computational principles. The formal framework used is based on Bayesian probability theory, which provides a convenient language for delineating the rationale and dynamics of attentional selection. The authors illustrate these issues with the Eriksen flanker task, a classical paradigm that explores the effects of competing sensory inputs on response tendencies. The authors show how 2 distinctly formulated models, based on compatibility bias and spatial uncertainty principles, can account for the behavioral data. They also suggest novel experiments that may differentiate these models. In addition, they elaborate a simplified model that approximates optimal computation and may map more directly onto the underlying neural machinery. This approximate model uses conflict monitoring, putatively mediated by the anterior cingulate cortex, as a proxy for compatibility representation. The authors also consider how this conflict information might be disseminated and used to control processing.

### Review

The authors suggest two simple, Bayesian perceptual models based on evidence integration for the (deadlined) Eriksen task. Their focus is on attentional mechanisms that can explain why participants' responses are below chance for very fast responses. These mechanisms are based on a prior on compatibility (that flankers are compatible with the relevant centre stimulus) and spatial uncertainty (flankers influence processing of the centre stimulus on a low, sensory level). The core inference is the same and replicates the basic mechanism you would expect for any perceptual decision making task. They don't fit behaviour, but rather show average trajectories from model simulations with hand-tuned parameters. They further suggest a third model inspired by previous work on conflict monitoring and cognitive control which supposedly is more likely to be implemented in the brain, because instead of having to consider (and compute with) all possible stimuli in the environment, it uses a conflict monitoring mechanism to switch between types of stimuli that are considered.

## Effects of cortical microstimulation on confidence in a perceptual decision.

Fetsch, C. R., Kiani, R., Newsome, W. T., and Shadlen, M. N.
Neuron, 83:797–804, 2014

### Abstract

Decisions are often associated with a degree of certainty, or confidence-an estimate of the probability that the chosen option will be correct. Recent neurophysiological results suggest that the central processing of evidence leading to a perceptual decision also establishes a level of confidence. Here we provide a causal test of this hypothesis by electrically stimulating areas of the visual cortex involved in motion perception. Monkeys discriminated the direction of motion in a noisy display and were sometimes allowed to opt out of the direction choice if their confidence was low. Microstimulation did not reduce overall confidence in the decision but instead altered confidence in a manner that mimicked a change in visual motion, plus a small increase in sensory noise. The results suggest that the same sensory neural signals support choice, reaction time, and confidence in a decision and that artificial manipulation of these signals preserves the quantitative relationship between accumulated evidence and confidence.

### Review

The paper provides verification of beliefs asserted in Kiani2009: Confidence is directly linked to accumulated evidence as represented in monkey area LIP during a random dot motion discrimination task. The authors use exactly the same task, but now stimulate patches of MT/MST neurons instead of recording single LIP neurons and resort to analysing behavioural data only. They find that small microstimulation of functionally well-defined neurons that signal a particular motion direction affects decisions in the same way as manipulating the motion information in the stimulus directly. This was expected, because it has been shown before that stimulating MT neurons influences decisions in that way. New here is that the effect of stimulation on confidence judgements was evaluated at the same time. The rather humdrum result: confidence judgements are also affected in the same way. The authors argue that this didn’t have to be, because confidence judgements are thought to be a metacognitive process that may be influenced by other high-level cognitive functions such as those related to motivation. Then again, isn’t decision making thought to be a high-level cognitive function that is clearly influenced by motivation?

Anyway, there was one small effect particular to stimulation that did not occur in the control experiment where the stimulus itself was manipulated: There was a slight decrease in the overall proportion of sure-bet choices (presumably indicating low confidence) with stimulation suggesting that monkeys were more confident when stimulated. The authors explain this with larger noise (diffusion) in a simple drift-diffusion model. Counterintuitively, the larger accumulation noise increases the probability of moving away from the initial value and out of the low-confidence region. The mechanism makes sense, but I would rather explain it within an equivalent Bayesian model in which MT neurons represent noisy observations that are transformed into noisy pieces of evidence which are accumulated in LIP. Stimulation increases the noise on the observations which in turn increases accumulation noise in the equivalent drift-diffusion model (see Bitzer et al., 2014).

In drift-diffusion models drift, diffusion and threshold are mutually redundant in that one of them needs to be fixed when fitting the model to choices and reaction times. The authors here let all of them vary simultaneously which indicates that the parameters can be discriminated based on confidence judgements even when no reaction time is taken into account. This should be followed up. It is also interesting to think about how the postulated tight link between the ‘decision variable’ and the experienced confidence can be consolidated in a reaction time task where supposedly all decisions are made at the same threshold value. Notice that the confidence of a decision in their framework depends on the state of the diffusion (most likely one of the two boundaries) and the time of the decision: Assuming fixed noise, smaller decision times should translate into larger confidence, because you assume that this is due to a larger drift. Therefore, you should see variability of confidence judgements in a reaction time task that is strongly correlated with reaction times.
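The redundancy is easy to verify with the standard closed-form expressions for a drift-diffusion model with symmetric bounds at ±a, an unbiased starting point and no non-decision time (these simplifications, and the parameter values, are my assumptions for the sketch; the model in the paper is richer).

```python
import math

def p_upper(v, s, a):
    """Probability of hitting +a before -a when starting at 0 (drift v, diffusion s)."""
    return 1.0 / (1.0 + math.exp(-2.0 * v * a / s**2))

def mean_decision_time(v, s, a):
    """Mean first-passage time to either bound (requires v != 0)."""
    return (a / v) * math.tanh(v * a / s**2)

if __name__ == "__main__":
    v, s, a = 0.8, 1.0, 1.2        # illustrative parameter values
    c = 3.0                        # arbitrary common scaling factor
    print(p_upper(v, s, a), p_upper(c * v, c * s, c * a))
    print(mean_decision_time(v, s, a), mean_decision_time(c * v, c * s, c * a))
```

Scaling drift, diffusion and threshold by the same factor leaves both choice probability and mean decision time unchanged, so choices and reaction times alone cannot pin down all three parameters.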

## Probabilistic reasoning by neurons.

Yang, T. and Shadlen, M. N.
Nature, 447:1075–1080, 2007

### Abstract

Our brains allow us to reason about alternatives and to make choices that are likely to pay off. Often there is no one correct answer, but instead one that is favoured simply because it is more likely to lead to reward. A variety of probabilistic classification tasks probe the covert strategies that humans use to decide among alternatives based on evidence that bears only probabilistically on outcome. Here we show that rhesus monkeys can also achieve such reasoning. We have trained two monkeys to choose between a pair of coloured targets after viewing four shapes, shown sequentially, that governed the probability that one of the targets would furnish reward. Monkeys learned to combine probabilistic information from the shape combinations. Moreover, neurons in the parietal cortex reveal the addition and subtraction of probabilistic quantities that underlie decision-making on this task.

### Review

The authors argue that the brain reasons probabilistically, because they find that single neuron responses (firing rates) correlate with a measure of probabilistic evidence derived from the probabilistic task setup. It is certainly true that the monkeys could learn the task (a variant of the weather prediction task) and I also find the evidence presented in the paper generally compelling, but the authors note themselves that similar correlations with firing rate may result from other quantitative measures with similar properties as the one considered here. May, for example, firing rates correlate similarly with a measure of expected value of a shape combination as derived from a reinforcement learning model?

What did they do in detail? They trained monkeys on a task in which they had to predict which of two targets will be rewarded based on a set of four shapes presented on the screen. Each shape contributed a certain weight to the probability of rewarding a target as defined by the experimenters. The monkeys had to learn these weights. Then they also had to learn (implicitly) how the weights of shapes are combined to produce the probability of reward. After about 130,000 trials the monkeys were good enough to be tested. The trick in the experiment was that the four shapes were not presented simultaneously, but appeared one after the other. The question was whether neurons in the lateral intraparietal (LIP) area of the monkeys’ brains would represent the updated probabilities of reward after addition of each new shape within a trial. That the neurons would do that was hypothesised, because results from previous experiments suggested (see Gold & Shadlen, 2007 for review) that neurons in LIP represent accumulated evidence in a perceptual decision making paradigm.

Now Shadlen seems convinced that these neurons do not directly represent the relevant probabilities, but rather represent the log likelihood ratio (logLR) of one choice option over the other (see, e.g., Gold & Shadlen, 2001 and Shadlen et al., 2008). Hence, these ‘posterior’ probabilities play no role in the paper. Instead all results are obtained for the logLR. Funnily enough, the task is defined solely in terms of the posterior probability of reward for a particular combination of four shapes and the logLR needs to be computed from the posterior probabilities (Yang & Shadlen don’t lay out this detail in the paper or the supplementary information). I’m more open about the representation of posterior probabilities directly and I wondered what the correlation with logLR would look like if the firing rates represented posterior probabilities. This is easy to simulate in Matlab (see Yang2007.m). Such a simulation shows that, as a function of logLR, the firing rate (representing posterior probabilities) should follow a sigmoid function. Compare this prediction to Figures 2c and 3b for epoch 4. Such a sigmoidal relationship derives from the boundedness of the posterior probabilities which is obviously reflected in firing rates of neurons as they cannot drop or rise indefinitely. So there could be simple reasons for the boundedness of firing rates other than that they represent probabilities, but in any case it appears unlikely that they represent unbounded log likelihood ratios.
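The gist of such a simulation fits in a few lines of Python. This is not the Yang2007.m script, just a sketch of the idea under my own assumptions: equal priors for the two targets, a logLR in natural-log units, and a hypothetical neuron whose rate is an affine function of the posterior (the rate limits `r_min` and `r_max` are made up).

```python
import math

def posterior_from_loglr(loglr, base=math.e):
    """Posterior probability of one target, assuming equal priors for both targets.

    Intended for moderate |loglr|; very large values can overflow the power.
    """
    return 1.0 / (1.0 + base ** (-loglr))

def firing_rate(loglr, r_min=5.0, r_max=50.0):
    """Hypothetical neuron whose rate is an affine function of the posterior."""
    return r_min + (r_max - r_min) * posterior_from_loglr(loglr)

if __name__ == "__main__":
    for loglr in [-4, -2, -1, 0, 1, 2, 4]:
        print(loglr, round(firing_rate(loglr), 2))
```

The rate is roughly linear in logLR around 0 and flattens towards `r_min` and `r_max` for large absolute logLR, which is the sigmoid prediction mentioned above.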

## The Cost of Accumulating Evidence in Perceptual Decision Making.

Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., and Pouget, A.
The Journal of Neuroscience, 32:3612–3628, 2012

### Abstract

Decision making often involves the accumulation of information over time, but acquiring information typically comes at a cost. Little is known about the cost incurred by animals and humans for acquiring additional information from sensory variables due, for instance, to attentional efforts. Through a novel integration of diffusion models and dynamic programming, we were able to estimate the cost of making additional observations per unit of time from two monkeys and six humans in a reaction time (RT) random-dot motion discrimination task. Surprisingly, we find that the cost is neither zero nor constant over time, but for the animals and humans features a brief period in which it is constant but increases thereafter. In addition, we show that our theory accurately matches the observed reaction time distributions for each stimulus condition, the time-dependent choice accuracy both conditional on stimulus strength and independent of it, and choice accuracy and mean reaction times as a function of stimulus strength. The theory also correctly predicts that urgency signals in the brain should be independent of the difficulty, or stimulus strength, at each trial.

### Review

The authors show equivalence between a probabilistic and a diffusion model of perceptual decision making and consequently explain experimentally observed behaviour in the random dot motion task in terms of varying bounds in the diffusion model which correspond to varying costs in the probabilistic model. Here, I discuss their model in detail and outline its limits. My main worry with the presented model is that it may be too powerful to have real explanatory power. Impatient readers may want to skip to the conclusion below.

#### Perceptual model

The presented model is tailored to the two-alternative, forced choice random dot motion task. The fundamental assumption for the model is that at each point in discrete time, or equivalently, for each successive time period in continuous time the perceptual process of the decision maker produces an independent sample of evidence whose mean, mu*dt, reflects the strength (coherence) and direction (only through sign of evidence) of random dot motion while its variance, sigma2, reflects the passage of time (sigma2 = dt, the time period between observations). This definition of input to the decision model as independent samples of motion strength in either one of two (unspecified) directions restricts the model to two decision alternatives. Consequently, the presented model does not apply to more alternatives, or dependent samples.

The model of noisy, momentary evidence corresponds to a Wiener process with drift which is exactly what standard (drift) diffusion models of perceptual decision making are where drift is equal to mu and diffusion is equal to sigma2. You could wonder why sigma2 is exactly equal to dt and not larger, or smaller, but this is controlled by setting the mean evidence mu to an appropriate level by allowing it to scale: mu = k*c, where k is an arbitrary scaling constant which is fit to data and c is the random dot coherence in the current trial. Therefore, by controlling k you essentially control the signal to noise ratio in the model of the experiment and you would get equivalent results, if you changed sigma2 while fixing mu = c. The difference between the two cases is purely conceptual: In the former case you assume that the neuronal population in MT signals, on average, a scaled motion strength where the scaling may be different for different subjects, but signal variance is the same over subjects while in the latter case you assume that the MT signal, on average, corresponds to motion strength directly, but MT signal variance varies across subjects. Personally, I prefer the latter.
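The claimed equivalence can be made explicit with a short rescaling argument. This is only a sketch in my own notation: W(t) denotes a standard Wiener process, theta(t) a (possibly time-varying) bound and y a rescaled state; none of these symbols is taken from the paper.

$$
dx = k\,c\,dt + dW, \qquad \text{decide when } |x(t)| \ge \theta(t)
$$

$$
y = \frac{x}{k} \;\Rightarrow\; dy = c\,dt + \frac{1}{k}\,dW, \qquad \text{decide when } |y(t)| \ge \frac{\theta(t)}{k}
$$

The rescaled process has mean evidence c*dt and variance dt/k^2, and it produces the same choices and decision times as the original one. Note that the equivalence only holds if the bound is rescaled by the same factor 1/k.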

The decision circuit in the authors’ model takes the samples of momentary evidence as described above and computes a posterior belief over the two considered alternatives (motion directions). This posterior belief depends on the posterior probability distribution over mean motion strengths mu which is computed from the samples of momentary evidence taking a prior distribution over motion strengths into account. An important assumption in the computation of the posterior is that the decision maker (or decision circuit) has a perfect model of how the samples of momentary evidence are generated (a Gaussian with mean mu*dt and variance dt). If, for example, the decision maker assumed a slightly different variance, that would also explain differences in mean accuracy and decision times. The assumption of the perfect model, however, allows the authors to assert that the experimentally observed fraction of correct choices at a time t is equal to the internal belief of the decision maker (subject) that the chosen alternative is the correct one. This is important, because only with an estimate of this internal belief can the authors later infer the time-varying waiting costs for the subject (see below).

Anyway, under the given model the authors show that for a Gaussian prior you obtain a Gaussian posterior over motion strength mu (Eq. 4) and for a discrete prior you obtain a corresponding discrete posterior (Eq. 7). Importantly, the parameters of the posteriors can be formulated as functions of the current state x(t) of the sample-generating diffusion process and elapsed time t. Consequently, also the posterior belief over decision alternatives can be formulated as a one-to-one, i.e., invertible function of the diffusion state (and time t). By this connection, the authors have shown that, under an appropriate transformation, decisions based on the posterior belief are equivalent to decisions based on the (accumulated) diffusion state x(t) set in relation to elapsed time t.
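The mapping between diffusion state and posterior belief can be sketched in a few lines. This is a minimal illustration, assuming a zero-mean Gaussian prior over mu with variance `prior_var` and the evidence model described above (x(t) given mu is Gaussian with mean mu*t and variance t). The function names and the prior parameterisation are my own and may differ from the paper's Eq. 4.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def posterior_over_mu(x, t, prior_var):
    """Gaussian posterior (mean, variance) over mu given state x at time t.

    Assumes mu ~ N(0, prior_var) and x | mu ~ N(mu * t, t).
    """
    precision = 1.0 / prior_var + t
    return x / precision, 1.0 / precision


def belief_positive(x, t, prior_var):
    """Posterior probability that mu > 0, i.e. that the 'positive' direction is correct."""
    mean, var = posterior_over_mu(x, t, prior_var)
    return STD_NORMAL.cdf(mean / sqrt(var))


def state_for_belief(g, t, prior_var):
    """Inverse mapping: the diffusion state x that yields belief g at time t."""
    return STD_NORMAL.inv_cdf(g) * sqrt(1.0 / prior_var + t)
```

For a fixed state x the belief shrinks towards 0.5 as t grows, which is one way to see why a constant bound on belief corresponds to a time-dependent bound on x(t).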

In summary, the probabilistic perceptual decision model of the authors simply estimates the motion strength from the samples and then decides whether the estimate is positive or negative. Furthermore, this procedure is equivalent to accumulating the samples and deciding whether the accumulated state is very positive or very negative (as determined by hitting a bound). The described diffusion model has been used before to fit accuracies and mean reaction times of subjects, but apparently it was never quite good at fitting the full reaction time distribution (note that it lacks the extensions of the drift diffusion models suggested by Ratcliff, see, e.g., [1]). So here the authors extend the diffusion model by adding time-varying bounds which can be interpreted in the probabilistic model as a time-varying cost of waiting for more samples.

Time-varying bounds and costs

Intuitively, introducing a time-varying bound in a diffusion model introduces great flexibility in shaping the response accuracy and timing at any given time point. However, I currently do not have a good idea of just how flexible the model becomes. For example, if in discrete time changing the bound at each time step could independently modify the accuracy and reaction time distribution at this time step, the bound alone could explain the data. I don’t believe that this extreme case is true, but I would like to know how close you would come. In any case, it appears to be sensible to restrict how much the bound can vary to prevent overfitting of the data, or indeed to prevent making the other model parameters obsolete. In the present paper, the authors control the shape of the bound by using a function made of cosine basis functions. Although this restricts the bound to be a smooth function of time, it still allows considerable flexibility. The authors use two more approaches to control the flexibility of the bound. One is to constrain the bound to be the same for all coherences, meaning that it cannot be used to explain differences between coherences (experimental conditions). The other is to use Bayesian methods for fitting the data. On the one hand, this controls the bound through the choice of particular priors. The authors implement these priors by only considering parameter values in a restricted range, but I do not know how wide or narrow this range is in practice. On the other hand, the Bayesian approach leads to posterior distributions over parameters which means that subsequent analyses can take the uncertainty over parameters into account (see, e.g., the indicated uncertainty over the inferred bound in Fig. 5A). Although I retain some doubts about whether the bound was too flexible, I believe that this is not a big issue here.
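To get a feeling for how a time-varying bound shapes choices and reaction times, one can simulate the diffusion model directly. The following is a minimal sketch, not the authors' fitting procedure: the exponentially collapsing bound is a simple stand-in for their cosine-basis parameterisation, and all names and parameter values (`collapsing_bound`, `b0`, `tau`, `k`) are hypothetical.

```python
import numpy as np


def collapsing_bound(t, b0, tau):
    """Hypothetical bound that decays exponentially from b0 with time constant tau."""
    return b0 * np.exp(-np.asarray(t) / tau)


def simulate_trial(coherence, k, bound, dt, rng):
    """Simulate one trial of a diffusion with drift k*coherence and variance dt per step.

    `bound` holds the bound height at each time step. Returns (choice, rt) with
    choice in {-1, +1}. If no bound is hit before the last step, the choice is
    forced by the sign of the final state (an assumption of this sketch).
    """
    x = 0.0
    for i, b in enumerate(bound):
        x += k * coherence * dt + np.sqrt(dt) * rng.standard_normal()
        if abs(x) >= b:
            return (1 if x > 0 else -1), (i + 1) * dt
    return (1 if x > 0 else -1), len(bound) * dt


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dt = 0.01
    times = np.arange(1, 201) * dt            # 2 s trial
    bound = collapsing_bound(times, b0=1.0, tau=1.0)
    trials = [simulate_trial(0.1, 10.0, bound, dt, rng) for _ in range(500)]
    accuracy = np.mean([choice == 1 for choice, _ in trials])
    mean_rt = np.mean([rt for _, rt in trials])
    print(f"accuracy={accuracy:.2f}, mean RT={mean_rt:.2f} s")
```

Replacing `collapsing_bound` with other shapes and comparing the resulting reaction time histograms is a quick way to explore how much of the distribution the bound alone can reshape.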

It is, however, a different question whether the time-varying bound is a good explanation for the observed behaviour in contrast, e.g., to the extensions of the diffusion model introduced by Ratcliff (mostly trial-by-trial parameter variability). There, one might refer to the second, decision-related part of the presented model which considers the rewards and costs associated with decisions. In the Bayesian decision model presented in the paper the subject decides at each time step whether to select alternative 1, or alternative 2, or wait for more evidence in the next time step. This mechanism was already mentioned in [2]. Choosing an alternative will either lead to a reward (correct answer) or punishment (error), but waiting is also associated with a cost which may change throughout the trial. Deciding on the optimal course of action which maximises reward per unit time is then an average-reward reinforcement learning problem which the authors solve using dynamic programming. For a particular setting of reward, punishment and waiting costs this can be translated into an equivalent time-varying bound. More importantly, the procedure can be reversed such that the time-varying cost can be inferred from a bound that had been fitted to data. Apart from the bound, however, the estimate of the cost also depends on the reward/punishment setting and on an estimate of choice accuracy at each time step. Note that the latter differs considerably from the overall accuracy which is usually used to fit diffusion models and requires more data, especially when the error rate is low.

The Bayesian decision model, therefore, allows one to translate the time-varying bound into a time-varying cost which then provides an explanation of the particular shape of the reaction time distribution (and accuracy) in terms of the intrinsic motivation (negative cost) of the subject to wait for more evidence. Notice that this intrinsic motivation is really just a value describing how much somebody (dis-)likes to wait and it cannot be interpreted in terms of trying to be better at the task anymore, because all these components have been taken care of by other parts of the decision model. So what does it mean when a subject likes to wait for new evidence just for the sake of it (cf. dip in cost at beginning of trial in human data in Fig. 8)? I don’t know.

Collapsing bounds as found from behavioural data in this paper have been associated with an urgency signal in neural data which drives firing rates of all decision neurons towards a bound at the end of a trial irrespective of the input / evidence. This has been interpreted as a response of the subjects to the approaching deadline (end of trial) that they do not want to miss. The explanation in terms of a waiting cost which rises towards the end of a trial suggests that subjects just have a built-in desire to make (potentially arbitrary) choices before a deadline. To me, this is rather unintuitive. If you’re not punished for making a wrong choice (blue lines in Figs. 7 and 8, but note that there was a small time-punishment in the human experiment) shouldn’t it always be beneficial to make a choice before the deadline, because you trade uncertain reward against certain no reward? This would already be able to explain the urgency signal without consideration of a waiting cost. So why do we see one anyway? It may just all depend on the particular setting of reward and punishment for correct choices and errors, respectively. The authors present different inferred waiting costs with varying amounts of punishment and argue that the results are qualitatively equal, but the three different values of punishment they present hardly exhaust the range of values that could be assumed. Also, they did not vary the amount of reward given for correct choices, but it is likely that only the difference between reward and punishment determines the behaviour of the model such that it doesn’t matter whether you change reward or punishment to explore model predictions.

Conclusion

The main contribution of the paper is to show that accuracy and reaction time distribution can be explained by a time-varying bound in a simple diffusion model in which the drift scales linearly with stimulus intensity (coherence in random dot motion). I tried to point out that this result may not be surprising depending on how much flexibility a time-varying bound adds to the model. Additionally, the authors present a connection between diffusion and Bayesian models of perceptual decision making which allows them to reinterpret the time-varying bounds in terms of the subjective cost of waiting for more evidence to arrive. The authors argue that this cost increases towards the end of a trial, but for two reasons I’m not entirely convinced: 1) Conceptually, it is worth considering the origin of a possible waiting cost. It could correspond to the energetic cost of keeping the inference machinery running and the attention on the task, but there is no reason why this should increase towards a deadline. 2) I’m not convinced by the presented results that the inferred increase of cost towards a deadline is qualitatively independent of the reward/punishment setting. A greater range of punishments should have been tested. Note that you cannot infer the rewards for decisions and the time-varying waiting cost at the same time from the behavioural data. So this issue cannot be settled without some new experiments which measure rewards or costs more directly. Finally, I miss an overview of fitted parameter values in the paper. For example, I would be interested in the inferred lapse trial probabilities p1. The authors go to great lengths to estimate the posterior distributions over diffusion model parameters and I wonder why they don’t share the results with us (at least mean and variance for a start).

In conclusion, the authors follow a trend to explain behaviour in terms of Bayesian ideal observer models extended by flexible cost functions and apply this idea to perceptual decision making via a detour through a diffusion model. Although I appreciate the sound work presented in the paper, I’m worried that the time-varying bound/cost is too flexible and acts as a kind of ‘get out of jail free’ card which blocks the view to other, potentially additional mechanisms underlying the observed behaviour.

References

[1] Bogacz, R.; Brown, E.; Moehlis, J.; Holmes, P. & Cohen, J. D. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev, 2006, 113, 700-765

[2] Dayan, P. & Daw, N. D. Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci, 2008, 8, 429-453

## Probabilistic vs. non-probabilistic approaches to the neurobiology of perceptual decision-making.

Drugowitsch, J. and Pouget, A.
Curr Opin Neurobiol, 22:963–969, 2012

### Abstract

Optimal binary perceptual decision making requires accumulation of evidence in the form of a probability distribution that specifies the probability of the choices being correct given the evidence so far. Reward rates can then be maximized by stopping the accumulation when the confidence about either option reaches a threshold. Behavioral and neuronal evidence suggests that humans and animals follow such a probabilistic decision strategy, although its neural implementation has yet to be fully characterized. Here we show that diffusion decision models and attractor network models provide an approximation to the optimal strategy only under certain circumstances. In particular, neither model type is sufficiently flexible to encode the reliability of both the momentary and the accumulated evidence, which is a pre-requisite to accumulate evidence of time-varying reliability. Probabilistic population codes, by contrast, can encode these quantities and, as a consequence, have the potential to implement the optimal strategy accurately.

### Review

It’s essentially an advertisement for probabilistic population codes (PPCs) for modelling perceptual decisions. In particular, they contrast PPCs to diffusion models and attractor models without going into details. The main argument against attractor models is that they don’t encode a decision confidence in the attractor state. The main argument against diffusion models is that they are not fit to represent varying evidence reliability, but it’s not fully clear to me what they mean by that. The closest I get is that “[…] the drift is a representation of the reliability of the momentary evidence” and they argue that for varying drift rate the diffusion model becomes suboptimal. Of course, if the diffusion model assumes a constant drift rate, it is suboptimal when the drift rate changes, but I’m not sure whether this is the point they are making. The authors mention one potential weak point of PPCs: They predict that the decision bound is defined on a linear combination of integrated momentary evidence, but the firing of neurons in area LIP indicates that the bound is on the estimated correctness of single decisions, i.e., there is a bound for each decision alternative, as in a race model. I interpret this as evidence for a decision model where the bound is defined on the posterior probability of the decision alternatives.

The paper is a bit sloppily written (frequent, easily avoidable language errors).

## Representation of confidence associated with a decision by neurons in the parietal cortex.

Kiani, R. and Shadlen, M. N.
Science, 324:759–764, 2009

### Abstract

The degree of confidence in a decision provides a graded and probabilistic assessment of expected outcome. Although neural mechanisms of perceptual decisions have been studied extensively in primates, little is known about the mechanisms underlying choice certainty. We have shown that the same neurons that represent formation of a decision encode certainty about the decision. Rhesus monkeys made decisions about the direction of moving random dots, spanning a range of difficulties. They were rewarded for correct decisions. On some trials, after viewing the stimulus, the monkeys could opt out of the direction decision for a small but certain reward. Monkeys exercised this option in a manner that revealed their degree of certainty. Neurons in parietal cortex represented formation of the direction decision and the degree of certainty underlying the decision to opt out.

### Review

The authors used a 2AFC-task with an option to waive the decision in favour of a choice which provides low, but certain reward (the sure option) to investigate the representation of confidence in LIP neurons. Behaviourally the sure option had the expected effect: it was increasingly chosen the harder the decisions were, i.e., the more likely a false response was. Trials in which the sure option was chosen, thus, may be interpreted as those in which the subject had little confidence in the upcoming decision. It is important to note that task difficulty here was manipulated by providing limited amounts of information for a limited amount of time, i.e., this was not a reaction time task.

The firing rates of the recorded LIP neurons indicate that selection of the sure option is associated with an intermediate level of activity compared to that of subsequent choices of the actual decision options. For individual trials the authors found that firing rates closer to the mean firing rate (in a short time period before the sure option became available) more frequently led to selection of the sure option than firing rates further away from the mean, but in absolute terms the activity in this time window could predict choice of the sure option only weakly (probability of 0.4). From these results the authors conclude that the LIP neurons which have previously been found to represent evidence accumulation also encode confidence in a decision. They suggest a simple drift-diffusion model with fixed diffusion parameter to explain the results. In addition to the standard diffusion model they define confidence in terms of the log-posterior odds which they compute from the state of the drift-diffusion model. They define the posterior as p(S_i|v), the probability that decision option i is correct given that the drift-diffusion state (the decision variable) is v. They compute it from the corresponding likelihood p(v|S_i), but don’t state how they obtained that likelihood. Anyway, the sure option is chosen in the model when the log-posterior odds are below a certain level. I don’t see why the detour via the log-posterior odds is necessary. You could directly define v as the posterior for decision option i and still be consistent with all the findings in the paper. Of course, then v could not be governed by a linear drift anymore, but why should it be in the first place? The authors keenly promote the Bayesian brain, but stop just before the finishing line. Why?

## Robust averaging during perceptual judgment.

de Gardelle, V. and Summerfield, C.
Proc Natl Acad Sci U S A, 108:13341–13346, 2011

### Abstract

An optimal agent will base judgments on the strength and reliability of decision-relevant evidence. However, previous investigations of the computational mechanisms of perceptual judgments have focused on integration of the evidence mean (i.e., strength), and overlooked the contribution of evidence variance (i.e., reliability). Here, using a multielement averaging task, we show that human observers process heterogeneous decision-relevant evidence more slowly and less accurately, even when signal strength, signal-to-noise ratio, category uncertainty, and low-level perceptual variability are controlled for. Moreover, observers tend to exclude or downweight extreme samples of perceptual evidence, as a statistician might exclude an outlying data point. These phenomena are captured by a probabilistic optimal model in which observers integrate the log odds of each choice option. Robust averaging may have evolved to mitigate the influence of untrustworthy evidence in perceptual judgments.

### Review

The authors investigate what influence the variance of evidence has on perceptual decisions. A bit counterintuitively, they implement varying evidence by simultaneously presenting elements with different feature values (e.g. color) to subjects instead of presenting only one element which changes its feature value over time (would be my naive approach). Perhaps they did this to be able to assume constant evidence over time such that the standard drift diffusion model applies. My intuition is that subjects anyway implement a more sequential sampling of the stimulus display by varying attention to individual elements.

The behavioural results show that subjects take both mean presented evidence as well as the variance of evidence into account when making a decision: For larger mean evidence and smaller variance of evidence subjects are faster and make fewer mistakes. The results are attention dependent: mean and variance in a task-irrelevant feature dimension had no effect on responses.

The behavioural results can be explained by a drift diffusion model with a drift rate which takes the variance of the evidence into account. The authors present two such drift rates. 1) SNR drift = mean / standard deviation (as computed from trial-specific feature values). 2) LPR drift = mean log posterior ratio (also computed from trial-specific feature values). The two cannot be differentiated based on the measured mean RTs and error rates in the different conditions. So the authors provide an additional analysis which estimates the influence of the different presented elements, that is, the influence of the different feature values presented by them, on the given responses. This is done via a generalised linear regression by fitting a model which predicts response probabilities from presented feature values for individual trials. The fitted linear weights suggest that extreme (outlying) feature values have little influence on the final responses compared to the influence that (inlying) feature values close to the categorisation boundary have. Only the LPR model (2) replicates this effect.

Why do inlying feature values have greater influence on responses than outlying ones in the LPR model, but not in the other models? The LPR model alone would not predict this, because for more extreme posterior values you get more extreme LPR values which then have a greater influence on the mean LPR value, i.e., the drift rate. Therefore, it is not entirely clear to me yet why they find a greater importance of inlying feature values in the generalised linear regression from feature values to responses. The best explanation I currently have is the influence of the estimated posterior values: Fig. S5 shows that the posterior values are constant for sufficiently outlying feature values and only change for inlying feature values, where the greatest change is at the feature value defining the categorisation boundary. When mapped through the LPR the posterior values lead to LPR values following the same sigmoidal form setting low and high feature values to constants. These constant high and low values may cancel each other out when, on average, they are equally many. Then, only the inlying feature values may have a lasting contribution on the LPR mean; especially those close to the categorisation boundary, because they tend to lead to larger variation in LPR values which may tip the LPR mean (drift rate) towards one of the two responses. This explanation means that the results depend on the estimated posterior values. In particular, that these are set to values of about 0.2, or 0.8, respectively, for a large range of extreme feature values.
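This explanation can be illustrated with a small numerical sketch. I assume a hypothetical logistic posterior that saturates at 0.2 and 0.8 (the approximate plateau values mentioned above) with the categorisation boundary at feature value 0. The slope and all function names are my own illustrative choices, not the authors' estimated posterior.

```python
import numpy as np


def posterior(feature, lo=0.2, hi=0.8, slope=10.0):
    """Hypothetical sigmoidal posterior p(category A | feature), saturating at lo and hi."""
    return lo + (hi - lo) / (1.0 + np.exp(-slope * np.asarray(feature, dtype=float)))


def lpr(feature):
    """Log posterior ratio for a single feature value (or an array of them)."""
    p = posterior(feature)
    return np.log(p / (1.0 - p))


def snr_drift(features):
    """SNR drift: mean feature value divided by its standard deviation."""
    return np.mean(features) / np.std(features)


def lpr_drift(features):
    """LPR drift: mean log posterior ratio over the presented elements."""
    return np.mean(lpr(features))


if __name__ == "__main__":
    # Moving an outlying element further out barely changes its LPR ...
    print(lpr(1.0) - lpr(0.8))
    # ... whereas moving an inlying element across the boundary changes it a lot.
    print(lpr(0.1) - lpr(-0.1))
```

Under these assumptions an outlier contributes a nearly constant LPR however extreme it is, while elements near the boundary dominate changes in the mean LPR. The SNR drift has no such saturation.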

I am unsure what conclusions can be drawn from the results. Although the basic behavioural results are clear, it is not surprising that the responses of subjects depend on the variance of the presented evidence. You can define the feature values varying around the mean as noise. More variance then just means more noise and it is a basic result that people become slower and more error prone when presented with more noise. Perhaps surprisingly, it is here shown that this also works when noisy features are presented simultaneously on the screen instead of sequentially over time.

The DDM analysis shows that the drift rate of subjects decreases with increasing variance of evidence. This makes sense and means that subjects become more cautious in their judgements when confronted with larger variance (more noise). But I find the LPR model rather strange. It’s like pressing a Bayesian model into a mechanistic corset. The posterior ratio is an ad-hoc construct. Ok, it’s equivalent to the log-likelihood ratio, but why turn it into a posterior ratio then? The vagueness arises already because of how the task is defined: all information is presented at once, but you want to describe accumulation of evidence over time. Consequently, you have to define some approximate, ad-hoc construct (mean LPR) which you can use to define the temporal integration. That the model based on that construct replicates an aspect of the behavioural data may be an artefact of the particular approximation used (apparently it is important that the estimated posterior values are constant for extreme feature values). So, it remains unclear to me whether an LPR-DDM is a good explanation for the involved processes in this case.

Actually, a large part of the paper (cf. title) concerns the finding that extreme feature values appear to have smaller influence on subject responses than feature values close to the categorisation boundary. This is surprising to me. Although it makes intuitive sense in terms of ‘robust averaging’, I wouldn’t predict it for optimal probabilistic integration of evidence, at least not without making further assumptions. Such assumptions are also implicit in the LPR-DDM and I’m a bit skeptical about it anyway. Thus, a good explanation is still needed, in my opinion. Finally, I wonder how reliable the generalised linear regression analysis, which led to these results, is. On the one hand, the authors report using two different generalised linear models and obtaining equivalent results. On the other hand, they estimate 9 parameters from only one binary response variable and I wonder how the optimisation landscape looks in this case.

## A healthy fear of the unknown: perspectives on the interpretation of parameter fits from computational models in neuroscience.

Nassar, M. R. and Gold, J. I.
PLoS Comput Biol, 9:e1003015, 2013

### Abstract

Fitting models to behavior is commonly used to infer the latent computational factors responsible for generating behavior. However, the complexity of many behaviors can handicap the interpretation of such models. Here we provide perspectives on problems that can arise when interpreting parameter fits from models that provide incomplete descriptions of behavior. We illustrate these problems by fitting commonly used and neurophysiologically motivated reinforcement-learning models to simulated behavioral data sets from learning tasks. These model fits can pass a host of standard goodness-of-fit tests and other model-selection diagnostics even when the models do not provide a complete description of the behavioral data. We show that such incomplete models can be misleading by yielding biased estimates of the parameters explicitly included in the models. This problem is particularly pernicious when the neglected factors are unknown and therefore not easily identified by model comparisons and similar methods. An obvious conclusion is that a parsimonious description of behavioral data does not necessarily imply an accurate description of the underlying computations. Moreover, general goodness-of-fit measures are not a strong basis to support claims that a particular model can provide a generalized understanding of the computations that govern behavior. To help overcome these challenges, we advocate the design of tasks that provide direct reports of the computational variables of interest. Such direct reports complement model-fitting approaches by providing a more complete, albeit possibly more task-specific, representation of the factors that drive behavior. Computational models then provide a means to connect such task-specific results to a more general algorithmic understanding of the brain.

### Review

Nassar and Gold use tasks from their recent experiments (e.g. Nassar et al., 2012) to point to the difficulties of interpreting model fits of behavioural data. The background is that it has become more popular to explain experimental findings (often behaviour) using computational models. But how reliable are those computational interpretations and how can we ensure that they are valid? I will briefly review what Nassar and Gold did and point out that researchers investigating reward learning using computational models should think about learning rate adaptation in their experiments, because, in the light of the present paper, their results may otherwise not be interpretable. Further, I will argue that Nassar and Gold’s appeal to more interaction between modelling and task design is just how science should work in principle.

Background

The considered tasks belong to the popular class of reward learning tasks in which a subject has to learn which choices are rewarded to maximise reward. These tasks may be modelled by a simple delta-rule mechanism which updates current (learnt) estimates of reward by an amount proportional to a prediction error where the exact amount of update is determined by a learning rate. This learning rate is one of the parameters that you want to fit to data. The second parameter Nassar and Gold consider is the ‘inverse temperature’ which tells how a subject trades off exploitation (choose to get reward) against exploration (choose randomly).
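For readers unfamiliar with these two parameters, here is a minimal sketch of a fixed-learning-rate delta rule combined with a softmax choice rule. It is a generic textbook formulation, not Nassar and Gold's code, and the function names are my own.

```python
import math


def delta_rule_update(value, reward, learning_rate):
    """Move the reward estimate towards the observed reward.

    The step is the prediction error (reward - value) scaled by the learning rate.
    """
    prediction_error = reward - value
    return value + learning_rate * prediction_error


def softmax_choice_probs(values, inverse_temperature):
    """Choice probabilities under a softmax rule.

    inverse_temperature = 0 gives random choices (pure exploration);
    large values give near-deterministic choice of the best option (exploitation).
    """
    best = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(inverse_temperature * (v - best)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    estimate = 0.0
    for reward in [1.0, 1.0, 0.0, 1.0]:
        estimate = delta_rule_update(estimate, reward, learning_rate=0.3)
    print(estimate, softmax_choice_probs([estimate, 0.2], inverse_temperature=5.0))
```

Fitting such a model to choices means finding the `learning_rate` and `inverse_temperature` that make the observed choice sequence most likely.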

Nassar and Gold’s tasks are special, because at so-called change points during an experiment the underlying rewards may abruptly change (in addition to smaller variation of reward between single trials). The experimental subject then has to learn the new reward values. Importantly, Nassar and Gold have found that subjects use an adaptive learning rate, i.e., when subjects encounter small prediction errors they tend to reduce the learning rate while they tend to increase learning rate when experiencing large prediction errors. However, typical delta-rule learning models assume a fixed learning rate.

The issue

The issue discussed in the paper is that it will not be easily possible to detect a problem when fitting a fixed learning rate model to choices which were produced with an adaptive learning rate. As shown in the present paper, this issue results from a redundancy between learning rate adaptiveness (a hyperparameter, or hidden factor) and the inverse temperature with respect to subject choices, i.e., a change in learning rate adaptiveness can equivalently be explained by a change in inverse temperature (with fixed learning rate adaptiveness) when such a change is only measured by the choices a subject makes. Statistically, this means that, if you were to fit learning rate adaptiveness with inverse temperature to subject choices, then you should find that the two parameters are highly correlated given the data. Even better, if you were to look at the posterior distribution of the two parameters given subject choices, you should observe a large variance of them together with a strong covariance between them. As a statistician you would then report this variance and acknowledge that interpretation may be difficult. But learning rate adaptiveness is not typically fitted to choices. Instead only learning rate itself is fitted given a particular adaptiveness. Then, the relation between adaptiveness and inverse temperature is hidden from the analysis and investigators may be fooled into thinking that the combination of fitted learning rate and inverse temperature comprehensively explains the data. Well, it does explain the data, but there are potentially many other explanations of this kind which become apparent when the hidden factor learning rate adaptiveness is taken into account.

What does it mean?

The discussed issue exemplifies a general problem of cognitive psychology: you try to investigate (computational) mechanisms, e.g., decision making, by looking at quite impoverished data, e.g., decisions, which only represent the final product of the mechanisms. So what you do is guess a mechanism (a model) and see whether it fits the data. In the case of Nassar and Gold there was a prevailing guess which fit the data reasonably well. By investigating decision making in a particular, new situation (an environment with change points) they found that they needed to extend that mechanism to account for the new data. However, the extended mechanism now offers many explanations for the old impoverished data, because it is more flexible than the old mechanism. To me, this is all just part of the normal progress in science and nothing to be alarmed about in principle. Yet Nassar and Gold are right to point out that, in the light of the extended mechanism, fits of the old mechanism to old data may be misleading. Interpreting the parameters of the old mechanism may then be similar to concluding that the earth is a disk, because from your window it looks like the ground goes to the horizon in a straight line and then stops.

Conclusion

Essentially, Nassar and Gold try to convince us that when looking at reward learning we should now also take learning rate adaptiveness into account, i.e., that we should interpret subject choices within their extended mechanism. Two questions remain: 1) Do we trust that their extended mechanism is worth pursuing? 2) If yes, what can we do with the old data?

The present paper does not provide evidence that their extended mechanism is a useful model for subject choices (1), because here they assumed that the extended mechanism is true and investigated how you would interpret the new data using the old mechanism. However, their original study and others point to the importance of learning rate adaptiveness [see their refs. 9-11,26-28].

If the extended mechanism is correct, then the present paper shows that the old data is pretty much useless (2) unless learning rate adaptiveness has been, perhaps accidentally, controlled for in previous studies. This is because the old data from previous experiments (probably) does not allow learning rate adaptiveness to be estimated. Of course, if you can safely assume that the learning rate of subjects stayed roughly fixed in your experiment, for example, because prediction errors were very similar throughout the experiment, then the old mechanism with fixed learning rate should still apply and your data remains interpretable in the light of the extended mechanism. Perhaps it would be useful to investigate how robust fitted parameters are to varying learning rate adaptiveness in a typical experiment producing old data (here we only see results for experiments designed to induce changes in learning rate through large jumps in mean reward values).

Overall, the paper has a very general tone. It tries to discuss the difficulties of fitting computational models to behaviour in general. In my opinion, these things should be clear to anyone in science, as they just reflect how science progresses: you make models which need to fit an observed phenomenon, and you need to refine the models when new observations are made. You progress by seeking new observations. In this respect, there is nothing special about fitting computational models to behaviour.