Visual cortex is traditionally viewed as a hierarchy of neural feature detectors, with neural population responses being driven by bottom-up stimulus features. Conversely, “predictive coding” models propose that each stage of the visual hierarchy harbors two computationally distinct classes of processing unit: representational units that encode the conditional probability of a stimulus and provide predictions to the next lower level; and error units that encode the mismatch between predictions and bottom-up evidence, and forward prediction error to the next higher level. Predictive coding therefore suggests that neural population responses in category-selective visual regions, like the fusiform face area (FFA), reflect a summation of activity related to prediction (“face expectation”) and prediction error (“face surprise”), rather than a homogenous feature detection response. We tested the rival hypotheses of the feature detection and predictive coding models by collecting functional magnetic resonance imaging data from the FFA while independently varying both stimulus features (faces vs houses) and subjects’ perceptual expectations regarding those features (low vs medium vs high face expectation). The effects of stimulus and expectation factors interacted, whereby FFA activity elicited by face and house stimuli was indistinguishable under high face expectation and maximally differentiated under low face expectation. Using computational modeling, we show that these data can be explained by predictive coding but not by feature detection models, even when the latter are augmented with attentional mechanisms. Thus, population responses in the ventral visual stream appear to be determined by feature expectation and surprise rather than by stimulus features per se.
In general the design of the study is interesting as it is a fMRI study investigating the effects of a stimulus that is presented immediately before the actually analysed stimulus, i.e. temporal dependencies between sequentially presented stimuli of which predictability is a subset (actually priming studies would also fall into this category, don’t know how well they are studied with fMRI).
While the original predictive coding and feature detection models are convincing, the feature detection + attention models are confusing. All models seem to lack a baseline. The attention models are somehow defined on the “differential FFA response” and this is not further explained. The f b_1 part of the attention models can actually be reduced to b_1.
Katharina noted that, in contrast to here where they didn’t do it, you should do a small sample correction, if you want to do the ROI analysis properly.
They do not differentiate between prediction error and surprise in the paper. Surprise is the precision-weighted prediction error.