Sparse neural codes have been widely observed in cortical sensory and motor areas. A striking example of sparse temporal coding is in the song-related premotor area high vocal center (HVC) of songbirds: The motor neurons innervating avian vocal muscles are driven by premotor nucleus robustus archistriatalis (RA), which is in turn driven by nucleus HVC. Recent experiments reveal that RA-projecting HVC neurons fire just one burst per song motif. However, the function of this remarkable temporal sparseness has remained unclear. Because birdsong is a clear example of a learned complex motor behavior, we explore in a neural network model with the help of numerical and analytical techniques the possible role of sparse premotor neural codes in song-related motor learning. In numerical simulations with nonlinear neurons, as HVC activity is made progressively less sparse, the minimum learning time increases significantly. Heuristically, this slowdown arises from increasing interference in the weight updates for different synapses. If activity in HVC is sparse, synaptic interference is reduced, and is minimized if each synapse from HVC to RA is used only once in the motif, which is the situation observed experimentally. Our numerical results are corroborated by a theoretical analysis of learning in linear networks, for which we derive a relationship between sparse activity, synaptic interference, and learning time. If songbirds acquire their songs under significant pressure to learn quickly, this study predicts that HVC activity, currently measured only in adults, should also be sparse during the sensorimotor phase in the juvenile bird. We discuss the relevance of these results, linking sparse codes and learning speed, to other multilayered sensory and motor systems.
They model the generation of birdsong as a simple feed-forward network and show that a sparse temporal code in HVC neurons (feeding into RA neurons) speeds up learning with backpropagation. They argue that this speedup is the main explanation for why real HVC neurons exhibit a sparse temporal code.
HVC neurons are modelled as binary, i.e., bursting or non-bursting, while RA neurons have continuous activities; a fixed linear combination of RA activities then determines the output of the network. They define a desired, low-pass-filtered output that should be learnt. However, while their Fig. 2 suggests that they model the sequential aspect of the data, the actual network has no temporal component, so the order of the data points is irrelevant for learning. Fixing, i.e., not learning, the connections from RA to the output may be biologically well motivated, but other choices for the network seem quite arbitrary, e.g., why does each RA neuron project from the beginning to only one of the two outputs? They did vary quite a few parameters, though, and found that their main result (learning is faster with sparse HVC firing) holds for all of them. Interesting to note: they had to initialise the HVC-RA weights and RA thresholds such that initial RA activity is low and nonuniform in order to obtain the desired type of RA activity after learning.
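The interference argument can be illustrated in a toy version of their linear analysis. Below is a minimal sketch (my own simplification, not the paper's actual network): a single linear output driven by binary "HVC" activity, trained by gradient descent to match a smooth target over one motif. A maximally sparse code (each neuron bursts at exactly one time step) makes the input correlation matrix perfectly conditioned, so the weight updates for different synapses do not interfere and learning converges almost immediately; a dense code makes the updates interfere and learning takes far longer. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = N = 20  # time steps in the motif = number of model "HVC" neurons


def epochs_to_learn(H, target, tol=1e-3, max_epochs=20000):
    """Gradient descent on output weights w; return epochs until MSE < tol.

    H is the N x T binary activity matrix (rows = neurons, columns = time).
    Output y(t) = w . h(t); loss = mean squared error over the motif.
    The learning rate is set at the stability limit for each code, so the
    remaining learning time reflects the conditioning of the code, i.e.,
    how much the updates for different synapses interfere.
    """
    w = np.zeros(N)
    G = H @ H.T / T                        # input correlation matrix
    lr = 1.0 / np.linalg.eigvalsh(G)[-1]   # 1 / largest eigenvalue
    for epoch in range(1, max_epochs + 1):
        y = w @ H                          # network output at each time step
        err = y - target
        if np.mean(err ** 2) < tol:
            return epoch
        w -= lr * (H @ err) / T            # gradient step on the MSE
    return max_epochs


target = np.sin(np.linspace(0.0, 2.0 * np.pi, T))  # smooth desired output

H_sparse = np.eye(N)                                 # one burst per motif
H_dense = (rng.random((N, T)) < 0.5).astype(float)   # many bursts per motif

e_sparse = epochs_to_learn(H_sparse, target)
e_dense = epochs_to_learn(H_dense, target)
print(e_sparse, e_dense)  # sparse code needs far fewer epochs
```

With the one-burst-per-motif code, each synapse is used at exactly one time step, so its update is driven by the error at that time step alone; with the dense code every update drags many time steps along, which is exactly the interference the paper blames for slow learning.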
I didn’t like the paper that much, because they showed the benefit of sparse coding only for the biologically implausible backpropagation rule. Would the result also hold under a Hebbian learning paradigm? On the other hand, the idea that learning is easier when each neuron is responsible for only one restricted part of the stimulus is so outrageously intuitive that you wonder why this needed to be shown in the first place (Stefan noted, though, that he doesn’t know of work comparing temporal sparseness to spatial sparseness). Finally, one cannot argue that this is the main reason why HVC neurons fire in a temporally sparse manner: there might be other, unknown reasons, and the faster learning might be only a side effect.