Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., and Pouget, A.

The Journal of Neuroscience, 32:3612–3628, *2012*

DOI, Google Scholar

### Abstract

Decision making often involves the accumulation of information over time, but acquiring information typically comes at a cost. Little is known about the cost incurred by animals and humans for acquiring additional information from sensory variables due, for instance, to attentional efforts. Through a novel integration of diffusion models and dynamic programming, we were able to estimate the cost of making additional observations per unit of time from two monkeys and six humans in a reaction time (RT) random-dot motion discrimination task. Surprisingly, we find that the cost is neither zero nor constant over time, but for the animals and humans features a brief period in which it is constant but increases thereafter. In addition, we show that our theory accurately matches the observed reaction time distributions for each stimulus condition, the time-dependent choice accuracy both conditional on stimulus strength and independent of it, and choice accuracy and mean reaction times as a function of stimulus strength. The theory also correctly predicts that urgency signals in the brain should be independent of the difficulty, or stimulus strength, at each trial.

### Review

The authors show equivalence between a probabilistic and a diffusion model of perceptual decision making and consequently explain experimentally observed behaviour in the random dot motion task in terms of varying bounds in the diffusion model which correspond to varying costs in the probabilistic model. Here, I discuss their model in detail and outline its limits. My main worry with the presented model is that it may be too powerful to have real explanatory power. Impatient readers may want to skip to the conclusion below.

Perceptual model

The presented model is tailored to the two-alternative, forced choice random dot motion task. The fundamental assumption for the model is that at each point in discrete time, or equivalently, for each successive time period in continuous time the perceptual process of the decision maker produces an independent sample of evidence whose mean, mu*dt, reflects the strength (coherence) and direction (only through sign of evidence) of random dot motion while its variance, sigma2, reflects the passage of time (sigma2 = dt, the time period between observations). This definition of input to the decision model as independent samples of motion strength in either one of two (unspecified) directions restricts the model to two decision alternatives. Consequently, the presented model does not apply to more alternatives, or dependent samples.

The model of noisy, momentary evidence corresponds to a Wiener process with drift which is exactly what standard (drift) diffusion models of perceptual decision making are where drift is equal to mu and diffusion is equal to sigma2. You could wonder why sigma2 is exactly equal to dt and not larger, or smaller, but this is controlled by setting the mean evidence mu to an appropriate level by allowing it to scale: mu = k*c, where k is an arbitrary scaling constant which is fit to data and c is the random dot coherence in the current trial. Therefore, by controlling k you essentially control the signal to noise ratio in the model of the experiment and you would get equivalent results, if you changed sigma2 while fixing mu = c. The difference between the two cases is purely conceptual: In the former case you assume that the neuronal population in MT signals, on average, a scaled motion strength where the scaling may be different for different subjects, but signal variance is the same over subjects while in the latter case you assume that the MT signal, on average, corresponds to motion strength directly, but MT signal variance varies across subjects. Personally, I prefer the latter.

The decision circuit in the author’s model takes the samples of momentary evidence as described above and computes a posterior belief over the two considered alternatives (motion directions). This posterior belief depends on the posterior probability distribution over mean motion strengths mu which is computed from the samples of momentary evidence taking a prior distribution over motion strengths into account. An important assumption in the computation of the posterior is that the decision maker (or decision circuit) has a perfect model of how the samples of momentary evidence are generated (a Gaussian with mean mu*dt and variance dt). If, for example, the decision maker would assume a slightly different variance, that would also explain differences in mean accuracy and decision times. The assumption of the perfect model, however, allows the authors to assert that the experimentally observed fraction of correct choices at a time t is equal to the internal belief of the decision maker (subject) that the chosen alternative is the correct one. This is important, because only with an estimate of this internal belief the authors can later infer the time-varying waiting costs for the subject (see below).

Anyway, under the given model the authors show that for a Gaussian prior you obtain a Gaussian posterior over motion strength mu (Eq. 4) and for a discrete prior you obtain a corresponding discrete posterior (Eq. 7). Importantly, the parameters of the posteriors can be formulated as functions of the current state x(t) of the sample-generating diffusion process and elapsed time t. Consequently, also the posterior belief over decision alternatives can be formulated as a one-to-one, i.e., invertible function of the diffusion state (and time t). By this connection, the authors have shown that, under an appropriate transformation, decisions based on the posterior belief are equivalent to decisions based on the (accumulated) diffusion state x(t) set in relation to elapsed time t.

In summary, the probabilistic perceptual decision model of the authors simply estimates the motion strength from the samples and then decides whether the estimate is positive or negative. Furthermore, this procedure is equivalent to accumulating the samples and deciding whether the accumulated state is very positive or very negative (as determined by hitting a bound). The described diffusion model has been used before to fit accuracies and mean reaction times of subjects, but apparently it was never quite good in fitting the full reaction time distribution (note that it lacks the extensions of the drift diffusion models suggested by Ratcliff, see, e.g., [1]). So here the authors extend the diffusion model by adding time-varying bounds which can be interpreted in the probabilistic model as a time-varying cost of waiting for more samples.

Time-varying bounds and costs

Intuitively, introducing a time-varying bound in a diffusion model introduces great flexibility in shaping the response accuracy and timing at any given time point. However, I currently do not have a good idea of just how flexible the model becomes. For example, if in discrete time changing the bound at each time step could independently modify the accuracy and reaction time distribution at this time step, the bound alone could explain the data. I don’t believe that this extreme case is true, but I would like to know how close you would come. In any case, it appears to be sensible to restrict how much the bound can vary to prevent overfitting of the data, or indeed to prevent making the other model parameters obsolete. In the present paper, the authors control the shape of the bound by using a function made of cosine basis functions. Although this restricts the bound to be a smooth function of time, it still allows considerable flexibility. The authors use two more approaches to control the flexibility of the bound. One is to constrain the bound to be the same for all coherences, meaning that it cannot be used to explain differences between coherences (experimental conditions). The other is to use Bayesian methods for fitting the data. On the one hand, this controls the bound by choosing particular priors. They do this by only considering parameter values in a restricted range, but I do not know how wide or narrow this range is in practice. On the other hand, the Bayesian approach leads to posterior distributions over parameters which means that subsequent analyses can take the uncertainty over parameters into account (see, e.g., the indicated uncertainty over the inferred bound in Fig. 5A). Although I remain with some last doubts about whether the bound was too flexible, I believe that this is not a big issue here.

It is, however, a different question whether the time-varying bound is a good explanation for the observed behaviour in contrast, e.g., to the extensions of the diffusion model introduced by Ratcliff (mostly trial-by-trial parameter variability). There, one might refer to the second, decision-related part of the presented model which considers the rewards and costs associated with decisions. In the Bayesian decision model presented in the paper the subject decides at each time step whether to select alternative 1, or alternative 2, or wait for more evidence in the next time step. This mechanism was already mentioned in [2]. Choosing an alternative will either lead to a reward (correct answer) or punishment (error), but waiting is also associated with a cost which may change throughout the trial. Deciding for the optimal course of action which maximises reward per unit time then is an average-reward reinforcement learning problem which the authors solve using dynamic programming. For a particular setting of reward, punishment and waiting costs this can be translated into an equivalent time-varying bound. More importantly, the procedure can be reversed such that the time-varying cost can be inferred from a bound that had been fitted to data. Apart from the bound, however, the estimate of the cost also depends on the reward/punishment setting and on an estimate of choice accuracy at each time step. Note that the latter differs considerably from the overall accuracy which is usually used to fit diffusion models and requires more data, especially when the error rate is low.

The Bayesian decision model, therefore, allows to translate the time-varying bound to a time-varying cost which then provides an explanation of the particular shape of the reaction time distribution (and accuracy) in terms of the intrinsic motivation (negative cost) of the subject to wait for more evidence. Notice that this intrinsic motivation is really just a value describing how much somebody (dis-)likes to wait and it cannot be interpreted in terms of trying to be better in the task anymore, because all these components have been taken care of by other parts of the decision model. So what does it mean when a subject likes to wait for new evidence just for the sake of it (cf. dip in cost at beginning of trial in human data in Fig. 8)? I don’t know.

Collapsing bounds as found from behavioural data in this paper have been associated with an urgency signal in neural data which drives firing rates of all decision neurons towards a bound at the end of a trial irrespective of the input / evidence. This has been interpreted as a response of the subjects to the approaching deadline (end of trial) that they do not want to miss. The explanation in terms of a waiting cost which rises towards the end of a trial suggests that subjects just have a built-in desire to make (potentially arbitrary) choices before a deadline. To me, this is rather unintuitive. If you’re not punished for making a wrong choice (blue lines in Figs. 7 and 8, but note that there was a small time-punishment in the human experiment) shouldn’t it be always beneficial to make a choice before the deadline, because you trade uncertain reward against certain no reward? This would already be able to explain the urgency signal without consideration of a waiting cost. So why do we see one anyway? It may just all depend on the particular setting of reward and punishment for correct choices and errors, respectively. The authors present different inferred waiting costs with varying amounts of punishment and argue that the results are qualitatively equal, but the three different values of punishment they present hardly exhaust the range of values that could be assumed. Also, they did not vary the amount of reward given for correct choices, but it is likely that only the difference between reward and punishment determines the behaviour of the model such that it doesn’t matter whether you change reward or punishment to explore model predictions.

Conclusion

The main contribution of the paper is to show that accuracy and reaction time distribution can be explained by a time-varying bound in a simple diffusion model in which the drift scales linearly with stimulus intensity (coherence in random dot motion). I tried to point out that this result may not be surprising depending on how much flexibility a time-varying bound adds to the model. Additionally, the authors present a connection between diffusion and Bayesian models of perceptual decision making which allows them to reinterpret the time-varying bounds in terms of the subjective cost of waiting for more evidence to arrive. The authors argue that this cost increases towards the end of a trial, but for two reasons I’m not entirely convinced: 1) Conceptually, it is worth considering the origin of a possible waiting cost. It could correspond to the energetic cost of keeping the inference machinery running and the attention on the task, but there is no reason why this should increase towards a deadline. 2) I’m not convinced by the presented results that the inferred increase of cost towards a deadline is qualitatively independent of the reward/punishment setting. A greater range of punishments should have been tested. Note that you cannot infer the rewards for decisions and the time-varying waiting cost at the same time from the behavioural data. So this issue cannot be settled without some new experiments which measure rewards or costs more directly. Finally, I miss an overview of fitted parameter values in the paper. For example, I would be interested in the inferred lapse trial probabilities p1. The authors go through great lengths to estimate the posterior distributions over diffusion model parameters and I wonder why they don’t share the results with us (at least mean and variance for a start).

In conclusion, the authors follow a trend to explain behaviour in terms of Bayesian ideal observer models extended by flexible cost functions and apply this idea to perceptual decision making via a detour through a diffusion model. Although I appreciate the sound work presented in the paper, I’m worried that the time-varying bound/cost is too flexible and acts as a kind of ‘get out of jail free’ card which blocks the view to other, potentially additional mechanisms underlying the observed behaviour.

References

[1] Bogacz, R.; Brown, E.; Moehlis, J.; Holmes, P. & Cohen, J. D. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev, 2006, 113, 700-765

[2] Dayan, P. & Daw, N. D. Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci, 2008, 8, 429-453