Friston, K. J.
NeuroImage, 16:513–530, 2002
Abstract
This paper presents a method for estimating the conditional or posterior distribution of the parameters of deterministic dynamical systems. The procedure conforms to an EM implementation of a Gauss-Newton search for the maximum of the conditional or posterior density. The inclusion of priors in the estimation procedure ensures robust and rapid convergence and the resulting conditional densities enable Bayesian inference about the model parameters. The method is demonstrated using an input-state-output model of the hemodynamic coupling between experimentally designed causes or factors in fMRI studies and the ensuing BOLD response. This example represents a generalization of current fMRI analysis models that accommodates nonlinearities and in which the parameters have an explicit physical interpretation. Second, the approach extends classical inference, based on the likelihood of the data given a null hypothesis about the parameters, to more plausible inferences about the parameters of the model given the data. This inference provides for confidence intervals based on the conditional density.
Review
I presented the algorithm that underlies various forms of dynamic causal modeling and that we use to estimate RNN parameters. At its core is an iterative computation of the posterior over the parameters of a dynamical model, based on a first-order Taylor expansion of a meta-function that maps parameter values to observations; the dynamical system is hidden inside this function, so the probabilistic model does not have to deal with it explicitly. This is possible because the dynamics is assumed to be deterministic and noise enters only at the level of the observations. It can be shown that the resulting update equations for the posterior mode are equivalent to a Gauss-Newton optimisation of the log-joint probability of observations and parameters, i.e., MAP estimation of the parameters. Consequently, the rate of convergence near the posterior mode may be up to quadratic, but the scheme is not guaranteed to increase the likelihood at every step, or indeed to converge at all. It should work well close to an optimum (when the observations are well fitted), or when the dynamics is close to linear in the parameters. Because the dynamical system is integrated numerically to obtain the predicted observations, and the Jacobian of the observations with respect to the parameters is also computed numerically, the algorithm can be very slow.
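To make the update concrete, here is a minimal sketch of the posterior-mode (Gauss-Newton / MAP) step just described, not the paper's actual implementation: a forward function integrates the dynamical system and returns predicted observations, the Jacobian is obtained by finite differences, and each step combines the prediction error with the Gaussian prior. The function names (`map_gauss_newton`, `numerical_jacobian`, `g`) and the fixed observation covariance `Ce` are assumptions for illustration; in the paper the noise hyperparameters are estimated in an outer EM loop, which is omitted here.

```python
import numpy as np

def numerical_jacobian(g, theta, eps=1e-6):
    """Finite-difference Jacobian of the observation map g at theta."""
    y0 = g(theta)
    J = np.zeros((y0.size, theta.size))
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        J[:, i] = (g(theta + d) - y0) / eps
    return J

def map_gauss_newton(g, y, theta_prior, Cp, Ce, n_iter=32, tol=1e-6):
    """Gauss-Newton search for the posterior mode of the parameters.

    g           : parameters -> predicted observations (integrates the ODE internally)
    y           : observed data (flattened)
    theta_prior : prior mean;  Cp : prior covariance
    Ce          : observation noise covariance (assumed known here)
    Returns the posterior mode and the local (Laplace) posterior covariance.
    """
    theta = theta_prior.copy()
    Cp_inv = np.linalg.inv(Cp)
    Ce_inv = np.linalg.inv(Ce)
    C_post = Cp.copy()
    for _ in range(n_iter):
        J = numerical_jacobian(g, theta)
        r = y - g(theta)                          # prediction error
        # curvature (approximate negative Hessian) of the log-joint
        H = J.T @ Ce_inv @ J + Cp_inv
        C_post = np.linalg.inv(H)
        # gradient of the log-joint with respect to theta
        grad = J.T @ Ce_inv @ r + Cp_inv @ (theta_prior - theta)
        step = C_post @ grad
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta, C_post
```

The returned covariance is the inverse curvature at the final iterate, which is what makes the subsequent Bayesian inference (conditional confidence intervals) possible once the mode has been found.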
This algorithm is described in Friston2002 embedded in an application to fMRI. I did not present the specifics of that application and, in particular, ignored the influence of the inputs u defined there. The derivation of the parameter posterior described above is itself embedded in an EM algorithm that estimates hyperparameters of the observation covariance. I will discuss this in a future session.