Gershman, S. J. and Wilson, R. C.
in: Advances in Neural Information Processing Systems 23, 2010
Optimal control entails combining probabilities and utilities. However, for most practical problems, probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a neural population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields, GABAergic effects on saccadic movements, and risk aversion in decisions under uncertainty.
Within the area of decision theory they consider the problem of evaluating the expected utility of an action given a posterior. They propose a variational framework analogously to the one used for the EM algorithm in which the utility replaces the likelihood and the posterior replaces the prior. Their main contribution is to include a cost penalising the complexity of the approximation of the posterior and use the so defined lower bound on the expected utility to simultaneously optimise the density used to approximate the posterior. As the utility does not only contain the cost of the approximation, but also the actual utility of an action in the considered states, this model predicts that the approximating density should also reflect what is behaviourally relevant instead of only trying to represent the natural posterior distribution optimally.
In the results section they then show that under this model the approximated posterior can (and will) indeed put more probability mass on states with larger utility, something which has apparently been found in grasshoppers. Additionally they show that increasing the cost of spikes results in smaller firing rates which, as they argue, leads to response latencies as seen in experiments. Finally, they show that under the assumption that high utility or very costly states are rare, the model will automatically account for the nonlinear weighting of probabilities in risky choice observed in humans. The model therefore explains this irrational behaviour by noting that “under neural resource constraints, the approximate density will be biased towards high reward regions of the state space.”
I don’t know enough to judge the correspondence of the model behaviour/predictions with the experimental results, or whether the model contradicts some other results. However, the paper is quite inspiring in the sense that it presents an intuitive idea which potentially has big implications for how populations of neurons code probability distributions, namely that the neuronal codes are influenced as much by expected rewards as by the natural distribution. Of course, the paper leaves many questions open. The authors only show results for when the approximate distributions are optimised alone, but what happens when actions and distribution are optimised simultaneously? What are the timescales of the distribution optimisation? Is it really instantanious (on the same timescale as action selection) as the authors indicate, or is it rather a slower process? Their proposal also has the potential to explain how the dimensionality of the state space can be reduced by only considering states which are behaviourally relevant. However, it remains unclear to what extent this specialisation should be implemented. In other words, is the posterior dependent, e.g., on the precise goal in the task, or is it rather only dependent on the selected task? Especially the cost of spiking example suggests a connection between the proposed mechanism and attention. Can attention be explained by this low-level description of biased representations of the posterior distribution?
The paper is quite inspiring and you kind of wonder why nobody has made these ideas explicit before, or maybe somebody has? Actually Maneesh Sahani had a NIPS paper in 2004 which they cite, but not comment on, and which looks very similar to what they do here (from the abstract).