Causal role of dorsolateral prefrontal cortex in human perceptual decision making.

Philiastides, M. G., Auksztulewicz, R., Heekeren, H. R., and Blankenburg, F.
Curr Biol, 21:980–983, 2011
DOI, Google Scholar


The way that we interpret and interact with the world entails making decisions on the basis of available sensory evidence. Recent primate neurophysiology [1-6], human neuroimaging [7-13], and modeling experiments [14-19] have demonstrated that perceptual decisions are based on an integrative process in which sensory evidence accumulates over time until an internal decision bound is reached. Here we used repetitive transcranial magnetic stimulation (rTMS) to provide causal support for the role of the dorsolateral prefrontal cortex (DLPFC) in this integrative process. Specifically, we used a speeded perceptual categorization task designed to induce a time-dependent accumulation of sensory evidence through rapidly updating dynamic stimuli and found that disruption of the left DLPFC with low-frequency rTMS reduced accuracy and increased response times relative to a sham condition. Importantly, using the drift-diffusion model, we show that these behavioral effects correspond to a decrease in drift rate, a parameter describing the rate and thereby the efficiency of the sensory evidence integration in the decision process. These results provide causal evidence linking the DLPFC to the mechanism of evidence accumulation during perceptual decision making.


They apply repetitive TMS to the dorsolateral prefrontal cortex (DLPFC), assuming that this inhibits the decision making ability of subjects, because the DLPFC has been shown to be involved in perceptual decision making. Indeed, they find a significant effect of TMS vs. sham on the responses of subjects (after TMS, responses are less accurate and slower). They also argue that the effect is specific to TMS, because it wanes over time, but I wonder why they did not compute the corresponding interaction (they just report that the effect of TMS vs. sham is significant early on, but not later).

Furthermore, they hypothesised that TMS disrupted the accumulation of noisy evidence over time by decreasing the rate of evidence increase. This is based on the previous finding that the DLPFC shows higher BOLD activation for less noisy stimuli, which suggests that, when the DLPFC is disrupted, the evidence coming from less noisy stimuli can no longer be processed optimally.

They investigated the evidence accumulation hypothesis by fitting a drift-diffusion model (DDM) to the response data. The DDM has more parameters than are necessary to explain the variation of response data across the different experimental conditions. Hence, they use the Bayesian information criterion (BIC) to select the parameters which should be fitted for each experimental condition separately, i.e., to be able to say which parameters are affected by the experimental manipulations. The remaining parameters are still fitted, but to all data across experimental conditions. The problem is that the BIC is a very crude approximation which only takes the number of freely varying parameters into account. For example, an assumption underlying the BIC is that the Hessian of the likelihood evaluated at the fitted parameter values has full rank (Bishop, 2006, p. 217), but for highly correlated parameters this may not be the case. The DMAT toolbox used for fitting actually approximates the Hessian matrix, checks whether a local minimum has been found (instead of a valley) and computes confidence intervals from the approximated Hessian, but the authors report no results for this apart from error bars on the plot of drift rate and nondecision time.

Anyway, the BIC analysis conveniently indicates that drift rate and nondecision time best explain the variation in response data across conditions. However, it has to be kept in mind that these results were obtained by (presumably) assuming that the diffusion coefficient is fixed across conditions, which is the standard when fitting a DDM [private correspondence with Rafal Bogacz, 09/2012], because drift rate, diffusion and threshold are redundant (a change in one of them can be reverted by a suitable change in the others). The interpretation of the BIC analysis should probably be that drift rate and nondecision time are the smallest set of parameters which still allow a good fit of the data given that the diffusion is fixed.

You need to be careful when interpreting the fitted parameter values in the different conditions. In particular, fitting a DDM to data assumes that the evidence accumulation still works like a DDM, just with different parameters. However, it is not clear what TMS does to the affected processes in the brain. Hence, we can only say from the fitting results that TMS has an effect which is equivalent to a reduction of the drift rate (no clear effect on nondecision time) in a normally functioning DDM.
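
A quick simulation illustrates that a reduced drift rate alone indeed reproduces the observed behavioural pattern of lower accuracy together with longer response times. This is just a sketch of a basic DDM; all parameter values below are made up for illustration, not taken from the paper:

```python
import numpy as np

def simulate_ddm(drift, n_trials=300, threshold=1.0, noise=1.0,
                 dt=0.002, nondecision=0.3, seed=0):
    """Simulate a basic drift-diffusion model; returns (accuracy, mean RT)."""
    rng = np.random.default_rng(seed)
    correct = np.empty(n_trials, dtype=bool)
    rt = np.empty(n_trials)
    for i in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < threshold:  # accumulate noisy evidence until a bound
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        correct[i] = x >= threshold      # upper bound = correct choice
        rt[i] = t + nondecision          # add nondecision time
    return correct.mean(), rt.mean()

acc_intact, rt_intact = simulate_ddm(drift=2.0)  # intact integration
acc_tms, rt_tms = simulate_ddm(drift=1.0)        # "TMS-reduced" drift rate
# lower drift rate -> less accurate and slower, as in the rTMS condition
```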

Similarly, the interpretation of the results for nondecision time is not straightforward. The main finding there is that nondecision time decreases for high-evidence stimuli, which the authors interpret as a reduced duration of the low-level sensory processing that provides input to evidence accumulation. However, it should be kept in mind that the total amount of time necessary to make a decision is also reduced for high-evidence stimuli. Also, part of the processes collected under ‘nondecision time’ may actually run in parallel to evidence accumulation, e.g., movement preparation. If you look at the percentage of RT that is explained by nondecision time, the picture is reversed: for high-evidence stimuli nondecision time explains about 82% of RTs while for low-evidence stimuli it explains only about 75%, which is consistent with the basic idea that evidence accumulation takes longer for noisier stimuli. In general, these percentages are surprisingly high. Does evidence accumulation really only account for about 25% of total RTs? But it’s good that we now have a number to compare against.

So what do these findings mean for the DLPFC? We cannot draw any definite conclusions. The hypothesis that TMS over DLPFC affects drift rate is somewhat built into the analysis, because the authors use a DDM to fit the responses. Of course, other parameters could have been affected more strongly, such that the finding of the BIC analysis that drift rate explains the changes best can indeed be taken as evidence for the drift rate hypothesis. However, it is not possible to exclude other explanations which lie outside the parameter space of the DDM. What, for example, if the DLPFC has a somewhat attentional effect on evidence accumulation, in the sense that it not only accumulates evidence, but also modulates how big the individual pieces of evidence are by modulating lower-level sensory processing? Then interrupting the DLPFC may still have an effect similar to the one observed here, but the interpretation of the role of the DLPFC would be slightly different. Actually, the authors argue against a role of the DLPFC (at least the part of the DLPFC they targeted) in attentional processing, but I’m not entirely convinced. Their main argument is based on the assumption that a top-down attentional effect of the DLPFC on low-level sensory processing would increase the nondecision time, but this is not necessarily true: A) there is the previously mentioned issue of parallel processing, and the general problem of fitting a standard model to a disturbed process makes me doubt the reliability of the fitted nondecision times, and B) I can easily conceive of a system in which attentional modulation would not delay low-level sensory processing.

Why don’t we use Bayesian statistics to analyse experimental data?

This paper decoder post is a little different as it doesn’t relate to a particular paper. Rather it’s my answer to the question in the title of this post which was triggered by a colleague of mine. The colleague has a psychology background and just got to know about Bayesian statistics when the following question crossed his mind:


do Bayesian stuff, right? Trying to learn about it now, can’t quite get my head
around it yet, but it sounds like how I should be analysing data. In
psychophysics we usually collect a lot of data from a small number of subjects,
but then collapse all this data into a small number of points per subject for
the purposes of stats. This loses quite a lot of fine detail: for instance,
four steep psychometric functions with widely different means average together
to create a shallow function, which is not a good representation of the data.
Usually, the way psychoacousticians in particular get around this problem is
not to bother with the stats. This, of course, is not optimal either! As far as
I can tell the Bayesian approach to stats allows you to retain the variance
(and thus the detail) from each stage of analysis, which sounds perfect for my
old phd data and for the data i’m collecting now. it also sounds like the thing
to do for neuroimaging data: we collect a HUGE amount of data per subject in
the scanner, but then create these extremely coarse averages, leading people to
become very happy when they see something at the single-subject level. But of
course all effects should REALLY be at the single-subject level, we assume they
aren’t visible due to noise. So I’m wondering why everyone doesn’t employ this
Bayesian approach, even in fMRI etc..

In short, my answer is twofold: 1) Bayesian statistics can be computationally very hard and, more critically on a conceptual level, 2) choosing a prior influences the results of your statistical inference, which makes experimenters uneasy.

The following is my full answer. It contains a basic introduction to Bayesian statistics targeted to people who just realised that this exists. I bet that a simple search for “Bayesian frequentist” brings up a lot more valuable information.


You’re right: the best way to analyse any data is to maintain the full distribution of your variables of interest throughout all analysis steps. You nicely described the reasons for this. The only problem is that this can be really hard depending on your statistical model, i.e., your data. So you’ll need to make approximations. One way of doing this is to summarise the distribution by its mean and variance. The Gaussian distribution is so cool because these two values are actually sufficient to represent the whole distribution. For other probability distributions mean and variance are not sufficient representations, so when you summarise the distribution with them you make an approximation. Therefore, you could say that the standard analysis methods you mention are valid approximations in the sense that they summarise the desired distribution with its mean. Then the question becomes: can you make better approximations for the model you consider? This is where the expertise of the statistician comes into play, because what you can do really depends on the particular situation with your data. It is most of the time impossible to come up with the right distribution analytically, but many things can nowadays be solved numerically on a computer.
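
To illustrate why mean and variance are only a sufficient summary for the Gaussian, here is a small example (with arbitrary numbers) of two samples that share mean and variance but are otherwise completely different:

```python
import numpy as np

rng = np.random.default_rng(1)
gauss = rng.normal(0.0, 1.0, 100_000)        # unimodal, mean 0, variance 1
twopoint = rng.choice([-1.0, 1.0], 100_000)  # bimodal, mean 0, variance 1

# the two summaries are (almost) identical ...
print(gauss.mean(), gauss.var())        # ~0.0, ~1.0
print(twopoint.mean(), twopoint.var())  # ~0.0, ~1.0

# ... but the distributions are not: the mass near zero differs completely
print(np.mean(np.abs(gauss) < 0.5))     # ~0.38
print(np.mean(np.abs(twopoint) < 0.5))  # 0.0
```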

Now a little clarification of what I mean by the Bayesian approach. Here’s a hypothetical example: your variable of interest, x, is whether person A is a genius. You can’t really tell directly whether a person is a genius; you have to collect indirect evidence, y, from their behaviour (this might be the questions they ask, the answers they give, or indeed a battery of psychological tests). So x can take values 0 (no genius) and 1 (genius). Your inference will be based on a statistical model of behaviour given genius or no genius (in words: if A is a genius, then with probability p(y|x=1) they will exhibit behaviour y):

p(y|x=1) and p(y|x=0).

In a frequentist (classical) approach you would make a maximum likelihood estimate of x, which boils down to a simple procedure where you sum the log-probabilities of your evidence and compare which sum is larger:

sum over i log(p(y_i|x=1)) > sum over i log(p(y_i|x=0)) ???

If this statement is true, you’ll believe that A is a genius. Now, the problem is that, if you only have a few pieces of evidence, you can easily make false judgements with this procedure. Bayesians therefore take one additional source of information into account: the prior probability of someone being a genius, p(x=1), which is quite low. We can then compute something called the maximum a posteriori estimate, in which the evidence is additionally weighted by the prior probability, leading to the following decision procedure:

sum over i log(p(y_i|x=1)) + log(p(x=1)) > sum over i log(p(y_i|x=0)) + log(p(x=0)) ???

Because p(x=1) is much smaller than p(x=0) this means that you now have to collect much more evidence where the probability of behaviour given that A is a genius, p(y_i|x=1), is larger than the probability of behaviour given that A is no genius, p(y_i|x=0), before you believe that A is a genius. In the full Bayesian approach you would actually not make a judgement, but estimate the posterior probability of A being a genius:

p(x=1|y) = p(y|x=1)p(x=1) / p(y).

This is the distribution which I said is hard to estimate above. The thing that makes it hard is p(y). In this case, where x can only take two values it is actually very easy to compute:

p(y) = p(y|x=1)p(x=1) + p(y|x=0)p(x=0)

but for each additional value x can take you’ll have to add a term to this equation and when x is a continuous variable this sum will become an integral and integration is hard.
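
For the discrete genius example, though, the whole computation fits in a few lines. The likelihoods (0.8 and 0.4 per solved puzzle) and the prior (0.001) below are made-up numbers for illustration:

```python
import numpy as np

# made-up numbers: a genius solves a hard puzzle with probability 0.8,
# a non-genius with probability 0.4; one person in a thousand is a genius
p1, p0, prior = 0.8, 0.4, 0.001

def log_lik(y, p):
    """Log-likelihood of binary evidence y under solving probability p."""
    y = np.asarray(y)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def ml_genius(y):   # maximum likelihood: ignores the prior
    return log_lik(y, p1) > log_lik(y, p0)

def map_genius(y):  # maximum a posteriori: prior enters once as log p(x)
    return log_lik(y, p1) + np.log(prior) > log_lik(y, p0) + np.log(1 - prior)

def posterior_genius(y):
    """Full posterior p(x=1|y); the denominator is the marginal p(y)."""
    num = np.exp(log_lik(y, p1)) * prior
    return num / (num + np.exp(log_lik(y, p0)) * (1 - prior))

y = [1, 1, 1]          # three solved puzzles
print(ml_genius(y))    # True: ML already declares A a genius
print(map_genius(y))   # False: the low prior demands more evidence
print(posterior_genius(y))   # ~0.008
```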

One more, but very important thing: the technical problems aside, the biggest criticism of the Bayesian approach is the use of the prior. In my example it kept us from making a premature judgement, but only because we had a suitable estimate of the prior probability of someone being a genius. The question is: where does the prior come from? Well, it’s prior information that enters your inference. If you don’t have prior information about your variable of interest, you’ll use an uninformative prior which assigns equal probability to each value of x. Then the maximum likelihood and maximum a posteriori estimators above become equal, but what does it mean for the posterior distribution p(x|y)? It changes its interpretation. The posterior becomes an entity representing a belief about the corresponding statement (A is a genius) given the information provided by the prior. If the prior measures the true frequency of the corresponding event in the real world, the posterior is a statement about the state of the world. But if the prior has no such interpretation, the posterior is just the mentioned belief under the assumed prior. These arguments are very subtle. Think about my example. The prior could be paraphrased as the prior probability that person A is a genius. This prior cannot represent a frequency in the world, because person A exists only once in the world. So whatever we choose as prior is merely a prior belief. While frequentists often argue that the posterior does not faithfully represent the world because of a potentially unsuitable prior, in my example the Bayesian approach allowed us to incorporate information into the inference that is inaccessible to the frequentist approach. We did this by transferring the frequency of geniuses in the whole population to our a priori belief that person A is a genius.

Note that there really is no “correct” prior in my example and any prior corresponds to a particular prior assumption. Furthermore, the frequentist maximum likelihood estimator is equivalent to a maximum a posteriori estimator with a particular (uninformative) prior. Therefore, it has been argued that the Bayesian approach just makes explicit the prior assumptions that are also implicit in the more common (frequentist) statistical analyses. Unfortunately, it seems to be a bitter pill for experimenters to swallow to admit that the statistical analysis (and thus outcome) of their experiment depends on prior assumptions (although they appear happy to do this in other contexts, for example, when making Gaussian assumptions for an ANOVA). Also, remember that the prior will ultimately be overwritten by sufficient evidence (even for a very low prior probability of A being a genius we will at some point believe that A is a genius, if A behaves accordingly). Given these considerations, the prior shouldn’t be a hindrance to using a Bayesian analysis of experimental data, but the technical issues remain.
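
To see the prior being overwritten in the genius example, we can compute the posterior as a function of the number of consistently solved puzzles, using made-up likelihoods of 0.8 (genius) vs. 0.4 (non-genius) per solved puzzle and a prior of 0.001:

```python
import numpy as np

p1, p0, prior = 0.8, 0.4, 0.001   # made-up likelihoods and prior

def posterior_after(n):
    """Posterior p(genius) after n consecutively solved puzzles (log-odds form)."""
    log_odds = n * (np.log(p1) - np.log(p0)) + np.log(prior / (1 - prior))
    return 1.0 / (1.0 + np.exp(-log_odds))

for n in (3, 10, 30):
    print(n, posterior_after(n))
# the prior dominates at first, but is overwhelmed by consistent evidence
```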

Tuning properties of the auditory frequency-shift detectors.

Demany, L., Pressnitzer, D., and Semal, C.
J Acoust Soc Am, 126:1342–1348, 2009
DOI, Google Scholar


Demany and Ramos [(2005). J. Acoust. Soc. Am. 117, 833-841] found that it is possible to hear an upward or downward pitch change between two successive pure tones differing in frequency even when the first tone is informationally masked by other tones, preventing a conscious perception of its pitch. This provides evidence for the existence of automatic frequency-shift detectors (FSDs) in the auditory system. The present study was intended to estimate the magnitude of the frequency shifts optimally detected by the FSDs. Listeners were presented with sound sequences consisting of (1) a 300-ms or 100-ms random “chord” of synchronous pure tones, separated by constant intervals of either 650 cents or 1000 cents; (2) an interstimulus interval (ISI) varying from 100 to 900 ms; (3) a single pure tone at a variable frequency distance (Delta) from a randomly selected component of the chord. The task was to indicate if the final pure tone was higher or lower than the nearest component of the chord. Irrespective of the chord’s properties and of the ISI, performance was best when Delta was equal to about 120 cents (1/10 octave). Therefore, this interval seems to be the frequency shift optimally detected by the FSDs.


If you present 5 tones simultaneously, people cannot tell whether a subsequently presented tone was one of the 5 tones, or lay in the middle between any 2 of the 5 tones. On the other hand, people can judge whether a subsequently presented tone lay above or below any one of the 5 tones. This paper investigates the dependency of this effect on how much the subsequent tone lay above or below one of the 5 (here actually 6) tones (frequency shift), on how much the 6 tones were separated (Iv) and on the interstimulus interval (ISI) between the first set of tones and the subsequent tone. The authors replicated the mentioned previous findings and presented data suggesting that there is an optimal frequency shift at which subjects performed best in the task. They argue that this is at roughly 120 cents.

I have several remarks about the analysis. First of all, the number of subjects in the two experiments is very low (7 and 4, each including the first author). While in experiment 1 the curves of d-prime over subjects look relatively consistent, this is not the case for larger ISIs in experiment 2. The main flaw of the analysis is that the suggested optimal frequency shift of 120 cents is based on fitting an exponential function to 4, 5, or 6 data points, to which they also add an artificial baseline data point at d-prime = 0 for a frequency shift of 0. The data point as such makes sense, as the judgement of a subject whether the shift was up or down must be random when the true shift was actually 0. Still, it feels wrong to include an artificial data point in the analysis. In the end, especially for large ISIs, the optimal frequency shifts estimated this way vary so much across individual subjects that it seems pointless to conclude anything about the mean over (4) subjects.
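
For concreteness, here is a sketch of such a peak-extraction analysis with made-up d-prime values and an assumed functional form d′(Δ) = aΔexp(−Δ/τ), whose peak lies at Δ = τ; neither the numbers nor the exact form are taken from the paper. Note that for a form which passes through the origin by construction, the artificial (0, 0) point does not change the fit at all – its influence depends entirely on the assumed functional form:

```python
import numpy as np

# hypothetical d-prime values at a handful of frequency shifts (in cents);
# the numbers are invented for illustration, not taken from the paper
shifts = np.array([25.0, 50.0, 100.0, 200.0, 400.0])
dprime = np.array([0.6, 1.0, 1.3, 1.2, 0.8])

def fit_peak(x, y):
    """Fit d' = a * x * exp(-x/tau) by grid search over tau; peak is at x = tau."""
    best_sse, best_tau = np.inf, None
    for tau in np.linspace(10.0, 500.0, 2000):
        basis = x * np.exp(-x / tau)
        a = basis @ y / (basis @ basis)      # closed-form least squares for a
        sse = np.sum((y - a * basis) ** 2)
        if sse < best_sse:
            best_sse, best_tau = sse, tau
    return best_tau

peak = fit_peak(shifts, dprime)
# adding the artificial (0, 0) baseline point changes nothing here, because
# the assumed functional form passes through the origin anyway
peak_anchored = fit_peak(np.r_[0.0, shifts], np.r_[0.0, dprime])
```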

Sam actually tried to replicate the original finding on which this paper is based. He commented that it was hard to replicate it in a large group of subjects and he found differences between musicians and non-musicians (which shouldn’t be true for something that belongs to really basic hearing abilities). He also noted that subjects were generally quite bad in this task and that he found it to be impossible to make the task easier, when one wants to maintain that the 6 initial tones cannot be perceived individually.

The authors of the paper seem to repeatedly use subjects who perform particularly well in these tasks in their experiments.

It was noted in the group meeting that this research could be linked better to, e.g., the mismatch-negativity literature, which is also concerned with the detection of deviations. In response, Sam pointed to the publication containing the original findings.

Category-specific versus category-general semantic impairment induced by transcranial magnetic stimulation.

Pobric, G., Jefferies, E., and Ralph, M. A. L.
Curr Biol, 20:964–968, 2010
DOI, Google Scholar


Semantic cognition permits us to bring meaning to our verbal and nonverbal experiences and to generate context- and time-appropriate behavior. It is core to language and nonverbal skilled behaviors and, when impaired after brain damage, it generates significant disability. A fundamental neuroscience question is, therefore, how does the brain code and generate semantic cognition? Historical and some contemporary theories emphasize that conceptualization stems from the joint action of modality-specific association cortices (the “distributed” theory) reflecting our accumulated verbal, motor, and sensory experiences. Parallel studies of semantic dementia, rTMS in normal participants, and neuroimaging indicate that the anterior temporal lobe (ATL) plays a crucial and necessary role in conceptualization by merging experience into an amodal semantic representation. Some contemporary computational models suggest that concepts reflect a hub-and-spoke combination of information–modality-specific association areas support sensory, verbal, and motor sources (the spokes) while anterior temporal lobes act as an amodal hub. We demonstrate novel and striking evidence in favor of this hypothesis by applying rTMS to normal participants: ATL stimulation generates a category-general impairment whereas IPL stimulation induces a category-specific deficit for man-made objects, reflecting the coding of praxis in this neural region.


This is a short TMS experiment investigating the role of the left anterior temporal lobe (ATL) in semantic processing of stimuli. Semantics here is practically defined as the association to a high-level category defining an object. The task was simply to name the object shown on a picture. Involvement of ATL in this task is indicated by patients with semantic dementia who forget the meaning of categories/objects, i.e., they cannot associate a perceived object with its category/class (example: they see a sheep and don’t know what it is – do they still know what a sheep is, if you tell them that it is a sheep?).

The experiment is supposed to differentiate between three hypotheses: 1) object meaning results from a distributed representation of a stimulus across all modalities, 2) object meaning is only generated in the ATL, other areas providing only sensory input, and 3) part of the object meaning is generated already in single modality-specific areas and the ATL acts as an amodal integration hub. These hypotheses are only verbally described, and indeed it seems difficult to differentiate between 2) and 3).

The experiment shows that 10 min of repetitive TMS can increase response times of subjects in the picture naming task, but not in a number reading task, if TMS was applied to the left ATL. In a post-hoc analysis the authors then divided the shown pictures into living vs. nonliving and low- vs. high-manipulable objects and again looked for interactions with TMS stimulation. They found that stimulation of the left IPL, an area associated with manipulable objects, had an effect on nonliving, high-manipulable objects while having no effect on the others. Stimulation of the ATL, however, had a (smaller) effect on all categories. Furthermore, stimulation in the occipital lobe had no effect with respect to task or stimulus at all. The authors conclude that this is evidence for hypothesis 3) above.

A major concern with the study is that the main result has been obtained with a post-hoc analysis and the authors did not specify more precisely which pictures they used in this analysis, e.g., we don’t know which objects were among them. Furthermore, the results do not really allow any conclusions about the connectivity of the different regions. Hypotheses 2) and 3) cannot be discerned with the given results. Even hypothesis 1) could still be true, if one assumes that the ATL is a region mainly for producing verbal output of a category – something necessary for the task, but not necessarily involved in the association with a category. However, Katharina mentioned that the ATL was also implicated in experiments with other output modalities (e.g. drawing). So what remains, if one believes the post-hoc analysis, is that TMS on the ATL disrupts picture naming in general while TMS on the IPL disrupts picture naming selectively for nonliving, high-manipulable objects. We cannot rule out any of the hypotheses above completely.

Recurrent excitation in neocortical circuits.

Douglas, R. J., Koch, C., Mahowald, M., Martin, K. A., and Suarez, H. H.
Science, 269:981–985, 1995
DOI, Google Scholar


The majority of synapses in the mammalian cortex originate from cortical neurons. Indeed, the largest input to cortical cells comes from neighboring excitatory cells. However, most models of cortical development and processing do not reflect the anatomy and physiology of feedback excitation and are restricted to serial feedforward excitation. This report describes how populations of neurons in cat visual cortex can use excitatory feedback, characterized as an effective “network conductance”, to amplify their feedforward input signals and demonstrates how neuronal discharge can be kept proportional to stimulus strength despite strong, recurrent connections that threaten to cause runaway excitation. These principles are incorporated into models of cortical direction and orientation selectivity that emphasize the basic design principles of cortical architectures.


The paper suggests that the functional role of recurrent excitatory connections is to amplify (increase gain between inputs and outputs) and denoise inputs to a (sensory) cortical area. This would allow these input signals to be relatively small and would, therefore, help to save energy (they don’t make this argument explicitly).

The work is motivated by an estimate of the number of recurrent connections directly made between spiny stellate cells of layer IV in the cat visual cortex. The authors conclude that these connections alone can already “provide a significant source of recurrent excitation”.

First, they consider an electronic circuit analogy describing the feed-forward input and recurrent currents acting on a neuron in the network. They look at the influence of the recurrent conductance (which can be seen as the connectivity strength between all recurrently connected neurons) on the stability of the network and suggest that inhibitory neurons keep the network stable when the recurrent conductance is so high that it alone would lead to divergence of network activity. They also implemented a model recurrent network consisting of excitatory and inhibitory spiking neurons and showed that it can implement the direction selectivity of V1 simple cells. Interestingly, direction selectivity is based on asymmetric timing of excitatory and inhibitory input from the LGN (“in the preferred direction excitation precedes inhibition”), which they support with two references.
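
The amplification argument can be caricatured with a one-unit linear rate model; the parameters are illustrative, not the paper’s biophysical circuit. At steady state r = w·r + input, so recurrence multiplies the feed-forward drive by a gain of 1/(1−w), and for w ≥ 1 the loop runs away unless inhibition reduces the effective gain:

```python
import numpy as np

def steady_state(inp, w_rec, steps=500, dt=0.1):
    """Leaky rate unit with excitatory feedback: dr/dt = -r + w_rec*r + input."""
    r = 0.0
    for _ in range(steps):
        r += dt * (-r + w_rec * r + inp)
    return r

inp = 0.2
no_recurrence = steady_state(inp, w_rec=0.0)   # gain 1: r = 0.2
with_feedback = steady_state(inp, w_rec=0.8)   # gain 1/(1-0.8) = 5: r = 1.0
# for w_rec >= 1 the loop diverges ("runaway excitation"); in the paper's
# account, inhibition keeps the effective loop gain below this point
```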

I find it hard to believe that cortical recurrent networks apparently don’t do any computations of their own except for improving the incoming signal. It means that all computations are actually done in the feed-forward connections between areas, the excitation-inhibition asynchrony being an example here. But then, if you assume a hierarchy of similar processing units, where does, e.g., the necessary excitation-inhibition asynchrony come from? Well, potentially there are readout neurons outside of the recurrently connected network which do exactly that. Then again, the whole processing in the brain would be feed-forward, and the only intrinsically dynamic units would just amplify the feed-forward signals. Reservoir computing could be seen as an extension of this in which the dynamics of the recurrent neurons is allowed to be more sophisticated, but becomes uninterpretable in turn. Still, the presented model is consistent, as far as I can tell, with the idea that the activity in response to a stimulus represents the posterior while activity at rest represents the prior over the variables represented by the network under consideration.

Note that the authors do not have any direct experimental evidence for their model in terms of simultaneous recordings of neurons in the same network. They only compare two summary statistics based on individual cells, for the second of which I don’t understand the experiment.

Recurrent neuronal circuits in the neocortex.

Douglas, R. J. and Martin, K. A. C.
Curr Biol, 17:R496–R500, 2007
DOI, Google Scholar


In this Primer, we shall describe one interesting property of neocortical circuits – recurrent connectivity – and suggest what its computational significance might be.


First, they use data on the distribution of synapses in cat visual cortex to argue that the predominant drive of activity in a cortical area comes from recurrent connections within this area. They then suggest that the reason for this is the ability to enhance and denoise incoming signals through suitable recurrent connections. They show corresponding functional behaviour in a model based on linear threshold neurons (LTNs). They do not use sigmoid activation functions, because neurons apparently only rarely operate at their maximum firing rate, such that sigmoid activation functions are not necessary. To maintain stability they instead use a global inhibitory unit. I guess you could equivalently use a suitable sigmoid function. Finally, they suggest that top-down connections may bias the activity in the recurrent network such that one of a few alternative inputs may be selected based on, e.g., attention.
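
A toy version of such an LTN network with a single global inhibitory term (all parameters made up, not taken from the paper) already shows the claimed behaviour: weak inputs are suppressed, the strongest input is amplified, and a small top-down bias can tilt the competition between two equal inputs:

```python
import numpy as np

def run_ltn(inp, bias=None, w_exc=0.5, w_inh=0.4, steps=400, dt=0.1):
    """Rate dynamics: dr/dt = -r + [input + bias + w_exc*r - w_inh*sum(r)]_+ ."""
    drive = np.asarray(inp, dtype=float)
    if bias is not None:
        drive = drive + np.asarray(bias, dtype=float)
    r = np.zeros_like(drive)
    for _ in range(steps):
        r += dt * (-r + np.maximum(0.0, drive + w_exc * r - w_inh * r.sum()))
    return r

denoised = run_ltn([1.0, 0.2, 0.1])
# the weak inputs are suppressed to zero, the strongest is amplified above 1

selected = run_ltn([1.0, 1.0, 0.1], bias=[0.05, 0.0, 0.0])
# a small top-down bias tilts the competition between the two equal inputs
```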

So here the functional role of the recurrent neural network is merely to increase the signal-to-noise ratio. It’s a bit strange to me that actually no computation is done. Does that mean that all the computation from sensory signals to hidden states is done by the projections from lower-level to higher-level areas? This seems to be consistent with the reservoir computing idea, where the reservoir can also be seen as enhancing the representation of the input (by stretching its effects in time). The difference is just that the dynamics and function in reservoirs are more involved.

The ideas presented here are almost the same as already proposed by the first author in 1995 (see Douglas1995).

Spatiotemporal representations in the olfactory system.

Schaefer, A. T. and Margrie, T. W.
Trends Neurosci, 30:92–100, 2007
DOI, Google Scholar


A complete understanding of the mechanisms underlying any kind of sensory, motor or cognitive task requires analysis from the systems to the cellular level. In olfaction, new behavioural evidence in rodents has provided temporal limits on neural processing times that correspond to less than 150ms–the timescale of a single sniff. Recent in vivo data from the olfactory bulb indicate that, within each sniff, odour representation is not only spatially organized, but also temporally structured by odour-specific patterns of onset latencies. Thus, we propose that the spatial representation of odour is not a static one, but rather evolves across a sniff, whereby for difficult discriminations of similar odours, it is necessary for the olfactory system to “wait” for later-activated components. Based on such evidence, we have devised a working model to assess further the relevance of such spatiotemporal processes in odour representation.


They review evidence for temporal coding of odours in the olfactory bulb (and in olfactory receptor neurons). The main finding is that, with increasing intensity of an odour, the corresponding neurons fire more action potentials in a given time window. However, this is achieved by an earlier onset of firing while inter-spike intervals stay roughly equal. The authors argue that this is a fast temporal code that can be used to discriminate odours. In particular, they suggest that it can explain why very different odours can be discriminated faster. The assumption is that these differ mainly in high-intensity, i.e., fast, subodours, while similar odours differ mainly in low-intensity, i.e., slow, subodours. But could it not be that similar odours differ only slightly in high-intensity subodours? My intuition says that the decision boundary is determined more by considerations of uncertainty than by a temporal code of high and low intensities.
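A toy illustration of such a latency code (all numbers are invented; the mapping from intensity to onset latency is an assumption for illustration only):

```python
import numpy as np

def spike_train(intensity, t_max=150.0, isi=25.0, tau=40.0):
    """Toy latency code (assumed parameters, in ms): stronger odour
    components start firing earlier, while the inter-spike interval
    stays fixed, as the review describes."""
    onset = tau / intensity            # earlier onset for higher intensity
    return np.arange(onset, t_max, isi)

strong = spike_train(intensity=2.0)    # fast, high-intensity subodour
weak = spike_train(intensity=0.5)      # slow, low-intensity subodour

# a fast readout could use only the first-spike latency within one sniff
print(strong[0], weak[0])              # 20.0 80.0
```

The higher-intensity component is available ~60 ms earlier within the sniff, which is the basis of the claimed speed advantage for discriminating very different odours; note that the stronger component also produces more spikes overall, the point the commentary below takes issue with.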

The authors ignore that high-intensity odours also evoke more action potentials and rely in their arguments entirely on the temporal aspect of earlier firing. If only the temporal code were important, the extra spikes would be a huge waste of energy by the brain. Stefan suggested that they might be related to subsequent checks and to accumulating evidence.

Expectation and surprise determine neural population responses in the ventral visual stream.

Egner, T., Monti, J. M., and Summerfield, C.
J Neurosci, 30:16601–16608, 2010
DOI, Google Scholar


Visual cortex is traditionally viewed as a hierarchy of neural feature detectors, with neural population responses being driven by bottom-up stimulus features. Conversely, “predictive coding” models propose that each stage of the visual hierarchy harbors two computationally distinct classes of processing unit: representational units that encode the conditional probability of a stimulus and provide predictions to the next lower level; and error units that encode the mismatch between predictions and bottom-up evidence, and forward prediction error to the next higher level. Predictive coding therefore suggests that neural population responses in category-selective visual regions, like the fusiform face area (FFA), reflect a summation of activity related to prediction (“face expectation”) and prediction error (“face surprise”), rather than a homogenous feature detection response. We tested the rival hypotheses of the feature detection and predictive coding models by collecting functional magnetic resonance imaging data from the FFA while independently varying both stimulus features (faces vs houses) and subjects’ perceptual expectations regarding those features (low vs medium vs high face expectation). The effects of stimulus and expectation factors interacted, whereby FFA activity elicited by face and house stimuli was indistinguishable under high face expectation and maximally differentiated under low face expectation. Using computational modeling, we show that these data can be explained by predictive coding but not by feature detection models, even when the latter are augmented with attentional mechanisms. Thus, population responses in the ventral visual stream appear to be determined by feature expectation and surprise rather than by stimulus features per se.


In general the design of the study is interesting: it is an fMRI study investigating the effects of a stimulus that is presented immediately before the actually analysed stimulus, i.e., temporal dependencies between sequentially presented stimuli, of which predictability is a special case (priming studies would also fall into this category; I don’t know how well they have been studied with fMRI).

While the original predictive coding and feature detection models are convincing, the feature detection + attention models are confusing. All models seem to lack a baseline. The attention models are somehow defined on the “differential FFA response”, which is not further explained. The f·b_1 part of the attention models can actually be reduced to b_1.
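For concreteness, here is one way to caricature the predictive coding model (the weights and the exact form of the surprise term are my assumptions, not the fitted model from the paper):

```python
# Toy predictive-coding account of FFA activity: the response is the sum
# of a face-expectation (prediction) term and a face-surprise (prediction
# error) term, where surprise arises when a face appears despite low
# expectation. Weights are illustrative only.
def ffa_response(stimulus_is_face, face_expectation, w_pred=1.0, w_surp=1.0):
    prediction = w_pred * face_expectation
    surprise = w_surp * (1.0 - face_expectation) if stimulus_is_face else 0.0
    return prediction + surprise

for e in (0.25, 0.5, 0.75):            # low / medium / high face expectation
    face, house = ffa_response(True, e), ffa_response(False, e)
    print(e, face, house, face - house)
```

With these choices the face–house difference is largest under low face expectation and shrinks as expectation grows, qualitatively reproducing the reported interaction, which a pure feature detection response (a fixed face–house difference regardless of expectation) cannot produce.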

Katharina noted that you should apply a small sample correction if you want to do the ROI analysis properly, which was not done here.

They do not differentiate between prediction error and surprise in the paper. Surprise is the precision-weighted prediction error.

Cortical Preparatory Activity: Representation of Movement or First Cog in a Dynamical Machine?

Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Ryu, S. I., and Shenoy, K. V.
Neuron, 68:387 – 400, 2010
DOI, Google Scholar


Summary The motor cortices are active during both movement and movement preparation. A common assumption is that preparatory activity constitutes a subthreshold form of movement activity: a neuron active during rightward movements becomes modestly active during preparation of a rightward movement. We asked whether this pattern of activity is, in fact, observed. We found that it was not: at the level of a single neuron, preparatory tuning was weakly correlated with movement-period tuning. Yet, somewhat paradoxically, preparatory tuning could be captured by a preferred direction in an abstract “space” that described the population-level pattern of movement activity. In fact, this relationship accounted for preparatory responses better than did traditional tuning models. These results are expected if preparatory activity provides the initial state of a dynamical system whose evolution produces movement activity. Our results thus suggest that preparatory activity may not represent specific factors, and may instead play a more mechanistic role.


What are the variables that best explain the preparatory tuning of neurons in dorsal premotor and primary motor cortex of monkeys doing a reaching task? This is the core question of the paper, motivated by the authors’ observation that preparatory and perimovement (i.e., within-movement) activity of a single neuron may differ considerably even qualitatively (which conflicts with the view that preparatory activity is a subthreshold version of perimovement activity). This observation is experimentally underlined in the paper by showing that the average preparatory activity and average perimovement activity of a single neuron are largely uncorrelated across experimental conditions.

To quantify how well a set of variables explains the preparatory activity of a neuron, the authors use a linear regression approach in which the values of these variables for a given experimental condition are used to predict the firing rate of the neuron in that condition. The authors compute the generalisation error of the learnt linear model with cross-validation and compare the performance of several sets of variables based on this error. The variables performing best are the principal component scores of the perimovement population activity of all recorded neurons. The difference from alternative sets of variables is significant, and in particular the wide range of considered variables makes the result convincing (e.g., target position, initial velocity, endpoints and maximum speed, but also principal component scores of EMG activity and of kinematic variables, i.e., position, speed and acceleration of the hand). That perimovement activity is the best regressor for preparatory activity is quite odd, or as Burak aptly put it: “They are predicting the past.”

The authors suggest a dynamical systems view as an explanation for their results and hypothesise that preparatory activity sets the initial state of the dynamical system constituted by the population of neurons. In this view, the preparatory activity of a single neuron is not sufficient to predict its evolution of activity (note that the correlation between preparatory and perimovement activity assesses only one particular way of predicting perimovement from preparatory activity: scaling), but the evolution of activity of all neurons can be used to determine the preparatory activity of a single neuron under the assumption that the evolution is governed by approximately linear dynamics. If the dynamics are linear, then any future state is a linear transformation of the initial state, and given enough data points from the future, the initial state can be determined by an appropriate linear inversion. The additional PCA, also a linear transformation, doesn’t change that, but it makes the regression easier and, importantly for the noisy data, also regularises it.
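The logic of this inversion can be sketched with synthetic data: simulate linear population dynamics from preparatory initial states, take PC scores of the resulting perimovement activity, and regress one neuron's preparatory rate on them with leave-one-out cross-validation (all sizes, noise levels and dynamics here are arbitrary, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_conditions, n_steps = 20, 50, 20

# stable linear population dynamics shared across conditions (toy model)
A = rng.standard_normal((n_neurons, n_neurons)) / np.sqrt(n_neurons)
A *= 0.95 / np.max(np.abs(np.linalg.eigvals(A)))

prep = rng.standard_normal((n_conditions, n_neurons))   # preparatory states

# perimovement activity: each condition's initial state evolved forward
peri = np.zeros((n_conditions, n_steps * n_neurons))
for c in range(n_conditions):
    x, traj = prep[c].copy(), []
    for _ in range(n_steps):
        x = A @ x + 0.001 * rng.standard_normal(n_neurons)
        traj.append(x.copy())
    peri[c] = np.concatenate(traj)

# PCA (via SVD) of perimovement population activity across conditions
U, S, _ = np.linalg.svd(peri - peri.mean(0), full_matrices=False)
scores = U[:, :n_neurons] * S[:n_neurons]    # PC scores per condition

# leave-one-out regression: predict one neuron's preparatory rate
neuron, errs = 0, []
for c in range(n_conditions):
    train = np.delete(np.arange(n_conditions), c)
    X = np.column_stack([np.ones(len(train)), scores[train]])
    w, *_ = np.linalg.lstsq(X, prep[train, neuron], rcond=None)
    pred = np.concatenate([[1.0], scores[c]]) @ w
    errs.append((pred - prep[c, neuron]) ** 2)

print(np.mean(errs), prep[:, neuron].var())
```

Because the perimovement trajectories are linear images of the initial states, the regression recovers the preparatory rate almost perfectly; with genuinely nonlinear dynamics this linear inversion would no longer be guaranteed to work, which is exactly the caveat raised below.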

These findings and suggestions are all quite interesting and certainly fit our preconceptions about neuronal activity, but are the presented results really surprising? Do people still believe that you can make sense of the activity of isolated neurons in cortex, or isn’t it already accepted that population dynamics are necessary to characterise neuronal responses? For example, Pillow et al. (Pillow2008) used coupled spiking models to successfully predict spike trains directly from stimuli in retinal ganglion cells. On the other hand, Churchland et al. indirectly claim in this paper that the population dynamics are (approximately) linear, which is certainly disputable; but what would nonlinear dynamics mean for their analysis?

Encoding of Motor Skill in the Corticomuscular System of Musicians.

Gentner, R., Gorges, S., Weise, D., aufm Kampe, K., Buttmann, M., and Classen, J.
Current Biology, 20:1869–1874, 2010
DOI, Google Scholar


Summary How motor skills are stored in the nervous system represents a fundamental question in neuroscience. Although musical motor skills are associated with a variety of adaptations [1-3], it remains unclear how these changes are linked to the known superior motor performance of expert musicians. Here we establish a direct and specific relationship between the functional organization of the corticomuscular system and skilled musical performance. Principal component analysis was used to identify joint correlation patterns in finger movements evoked by transcranial magnetic stimulation over the primary motor cortex while subjects were at rest. Linear combinations of a selected subset of these patterns were used to reconstruct active instrumental playing or grasping movements. Reconstruction quality of instrumental playing was superior in skilled musicians compared to musically untrained subjects, displayed taxonomic specificity for the trained movement repertoire, and correlated with the cumulated long-term training exposure, but not with the recent past training history. In violinists, the reconstruction quality of grasping movements correlated negatively with the long-term training history of violin playing. Our results indicate that experience-dependent motor skills are specifically encoded in the functional organization of the primary motor cortex and its efferent system and are consistent with a model of skill coding by a modular neuronal architecture [4].


The authors use PCA on TMS-induced postures to show that motor cortex represents building blocks of movements which adapt to everyday requirements. To be precise, the authors recorded finger movements induced by TMS over primary motor cortex and extracted, for each stimulation, the posture with the largest deviation from rest. From the resulting set of postures they computed the first 4 principal components (PCs) and examined how well a linear combination of the PCs could reconstruct postures recorded during the subjects’ normal behaviour. This is made more interesting by comparing groups of subjects with different motor experience: highly trained violinists and pianists and a group of non-musicians, crossing who is used for determining the PCs with what is being reconstructed (violin playing, piano playing, or grasping, where the grasping can be that of violinists or of non-musicians). The basis of comparison is a correlation (R) between the series of joint-angle vectors as defined in Shadmehr1994, which can be interpreted as something like the average correlation between data points of the two sequences measured across joint angles (cf. the normalised inner product matrix in GPLVM). Don’t ask me why they take exactly this measure, but probably it doesn’t matter. The main finding is that the PCs from violinists are significantly better at reconstructing violin playing than either the piano PCs or the non-musician PCs. This table is missing in the text (but the data is there, showing mean R and its standard deviation):

R        violinists    pianists      non-musicians
violin   0.69±0.09     0.63±0.11     0.64±0.09
piano    0.70±0.06     0.74±0.06     0.70±0.07
grasp    0.76±0.09     0.76±0.09     0.76±0.10

What is not discussed in the paper is that the pianists’ PCs are worse at reconstructing violin playing than the PCs of non-musicians. An interesting finding is that the years of intensive training of violinists correlate significantly with the reconstruction quality of violin playing from violinist PCs, while they are anticorrelated with the reconstruction quality of grasping, indicating that the postures activated in primary motor cortex become more adapted to frequently executed tasks. However, it has to be noted that this correlation analysis is based on only 9 data points.
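The analysis pipeline can be sketched on synthetic joint-angle data (the exact R measure from Shadmehr1994 is approximated here by a normalised inner product over the whole sequence, which is a guess, and all data are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)
n_joints, n_tms, n_frames = 10, 60, 100

# TMS-evoked postures in joint-angle space (synthetic stand-in data)
tms_postures = rng.standard_normal((n_tms, n_joints))

# first 4 principal components of the evoked postures
centred = tms_postures - tms_postures.mean(0)
_, _, Vt = np.linalg.svd(centred, full_matrices=False)
pcs = Vt[:4]                                   # 4 x n_joints, orthonormal rows

# a recorded joint-angle trajectory (e.g. playing or grasping)
movement = rng.standard_normal((n_frames, n_joints))

# least-squares reconstruction of every frame from the 4 PCs
coef, *_ = np.linalg.lstsq(pcs.T, movement.T, rcond=None)
recon = (pcs.T @ coef).T

# correlation-like measure between original and reconstructed sequences
R = np.sum(movement * recon) / (np.linalg.norm(movement) * np.linalg.norm(recon))
print(R)
```

Since reconstruction from 4 PCs is a projection onto a 4-dimensional subspace of the 10-dimensional joint space, R here simply measures how much of the movement lies in the subspace spanned by the evoked postures; PCs derived from a training-matched repertoire would span a better-fitting subspace and yield a higher R.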

At the beginning of the paper they show an analysis of the recorded behaviour which is simply supposed to ensure that violin playing, piano playing and grasping movements are sufficiently different, which we may believe, although piano playing and grasping apparently are somewhat similar.