Derivation of SGPR equations#
James Hensman, March 2016. Corrections by Alex Matthews, December 2016
This notebook contains a derivation of the form of the equations for the marginal likelihood bound and predictions for the sparse Gaussian process regression model in GPflow, gpflow.models.SGPR.
The primary reference for this work is Titsias 2009 [1], though other works (Hensman et al. 2013 [2], Matthews et al. 2016 [3]) are useful for clarifying the prediction density.
Marginal likelihood bound#
The bound on the marginal likelihood (Titsias 2009) is:

\begin{equation}
\log p(\mathbf y) \geq \log \mathcal N(\mathbf y\,|\,\mathbf 0, \mathbf Q_{ff} + \sigma^2 \mathbf I) - \tfrac{1}{2}\sigma^{-2}\textrm{tr}(\mathbf K_{ff} - \mathbf Q_{ff}) \triangleq \mathcal L
\end{equation}
The kernel matrices $\mathbf K_{ff}$, $\mathbf K_{uu}$, and $\mathbf K_{uf}$ contain the kernel evaluated at the data points $\mathbf X$, at the inducing inputs $\mathbf Z$, and between the two, respectively, and we write $\mathbf Q_{ff} = \mathbf K_{fu}\mathbf K_{uu}^{-1}\mathbf K_{uf}$.
To obtain an efficient and stable evaluation of the bound $\mathcal L$, we first apply the Woodbury identity to the effective covariance matrix:

\begin{equation}
[\mathbf Q_{ff} + \sigma^2 \mathbf I]^{-1} = \sigma^{-2}\mathbf I - \sigma^{-4}\mathbf K_{fu}[\mathbf K_{uu} + \mathbf K_{uf}\mathbf K_{fu}\sigma^{-2}]^{-1}\mathbf K_{uf}
\end{equation}
Now, to obtain a better conditioned matrix for inversion, we rotate by $\mathbf L$, where $\mathbf L\mathbf L^\top = \mathbf K_{uu}$:

\begin{equation}
[\mathbf Q_{ff} + \sigma^2 \mathbf I]^{-1} = \sigma^{-2}\mathbf I - \sigma^{-4}\mathbf K_{fu}\mathbf L^{-\top}[\mathbf I + \mathbf L^{-1}\mathbf K_{uf}\mathbf K_{fu}\mathbf L^{-\top}\sigma^{-2}]^{-1}\mathbf L^{-1}\mathbf K_{uf}
\end{equation}
This matrix is better conditioned because, for many kernels, it has eigenvalues bounded above and below. For more details, see section 3.4.3 of Gaussian Processes for Machine Learning.
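As an aside, the conditioning claim is easy to check numerically. The following sketch is not part of the original derivation: the squared-exponential kernel, the inputs, the jitter, and the noise level are made up for the example. It compares the condition number of $\mathbf K_{uu} + \mathbf K_{uf}\mathbf K_{fu}\sigma^{-2}$ with that of the rotated matrix $\mathbf I + \mathbf L^{-1}\mathbf K_{uf}\mathbf K_{fu}\mathbf L^{-\top}\sigma^{-2}$:

```python
import numpy as np

def rbf(X1, X2, variance=1.0, lengthscale=0.5):
    # Squared-exponential kernel; used here only to build example matrices.
    sq_dist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(200, 1))      # data inputs
Z = np.linspace(0.0, 5.0, 20)[:, None]        # inducing inputs
sigma2 = 0.1                                  # noise variance

Kuu = rbf(Z, Z) + 1e-6 * np.eye(len(Z))       # small jitter so the Cholesky succeeds
Kuf = rbf(Z, X)
L = np.linalg.cholesky(Kuu)

unrotated = Kuu + Kuf @ Kuf.T / sigma2
A = np.linalg.solve(L, Kuf)                   # L^{-1} K_uf
rotated = np.eye(len(Z)) + A @ A.T / sigma2   # I + L^{-1} K_uf K_fu L^{-T} / sigma^2

print("cond(unrotated):", np.linalg.cond(unrotated))
print("cond(rotated):  ", np.linalg.cond(rotated))
```

On a typical run of this example the rotated matrix is better conditioned by several orders of magnitude.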
For notational convenience, we’ll define $\mathbf A = \mathbf L^{-1}\mathbf K_{uf}\sigma^{-1}$ and $\mathbf B = \mathbf A\mathbf A^\top + \mathbf I$, so that:

\begin{equation}
[\mathbf Q_{ff} + \sigma^2 \mathbf I]^{-1} = \sigma^{-2}\mathbf I - \sigma^{-2}\mathbf A^\top\mathbf B^{-1}\mathbf A
\end{equation}
We also apply the matrix determinant lemma to the same:
\begin{equation}
|\mathbf Q_{ff} + \sigma^2 \mathbf I| = |\mathbf K_{uu} + \mathbf K_{uf}\mathbf K_{fu}\sigma^{-2}|\,|\mathbf K_{uu}^{-1}|\,|\sigma^{2}\mathbf I|
\end{equation}
Substituting $\mathbf K_{uu} = \mathbf L\mathbf L^\top$:

\begin{equation}
|\mathbf Q_{ff} + \sigma^2 \mathbf I| = |\mathbf L\mathbf L^\top + \mathbf K_{uf}\mathbf K_{fu}\sigma^{-2}|\,|\mathbf L^{-\top}|\,|\mathbf L^{-1}|\,|\sigma^{2}\mathbf I|
\end{equation}
\begin{equation}
|\mathbf Q_{ff} + \sigma^2 \mathbf I| = |\mathbf I + \mathbf L^{-1}\mathbf K_{uf}\mathbf K_{fu}\mathbf L^{-\top}\sigma^{-2}|\,|\sigma^{2}\mathbf I| = |\mathbf B|\,|\sigma^{2}\mathbf I|
\end{equation}
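Both identities are straightforward to sanity-check numerically. The sketch below is illustrative only: the matrix sizes, the noise variance, and the random construction of $\mathbf K_{uu}$ and $\mathbf K_{uf}$ are arbitrary choices for the check, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, sigma2 = 50, 7, 0.3                        # arbitrary sizes and noise variance

# Random positive-definite K_uu and a random K_uf with matching shapes.
G = rng.normal(size=(M, M))
Kuu = G @ G.T + M * np.eye(M)
Kuf = rng.normal(size=(M, N))

Qff = Kuf.T @ np.linalg.solve(Kuu, Kuf)          # Q_ff = K_fu K_uu^{-1} K_uf
L = np.linalg.cholesky(Kuu)                      # L L^T = K_uu
A = np.linalg.solve(L, Kuf) / np.sqrt(sigma2)    # A = L^{-1} K_uf / sigma
B = A @ A.T + np.eye(M)                          # B = A A^T + I

# Rotated Woodbury form: [Q_ff + sigma^2 I]^{-1} = sigma^{-2} (I - A^T B^{-1} A)
lhs_inv = np.linalg.inv(Qff + sigma2 * np.eye(N))
rhs_inv = (np.eye(N) - A.T @ np.linalg.solve(B, A)) / sigma2
print(np.allclose(lhs_inv, rhs_inv))             # expect True

# Determinant form: |Q_ff + sigma^2 I| = |B| |sigma^2 I|
lhs_logdet = np.linalg.slogdet(Qff + sigma2 * np.eye(N))[1]
rhs_logdet = np.linalg.slogdet(B)[1] + N * np.log(sigma2)
print(np.allclose(lhs_logdet, rhs_logdet))       # expect True
```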
With these two definitions, we’re ready to expand the bound:

\begin{equation}
\begin{split}
\mathcal L &= \log \mathcal N(\mathbf y\,|\,\mathbf 0, \mathbf Q_{ff} + \sigma^2\mathbf I) - \tfrac{1}{2}\sigma^{-2}\textrm{tr}(\mathbf K_{ff} - \mathbf Q_{ff})\\
&= -\tfrac{N}{2}\log 2\pi - \tfrac{1}{2}\log|\mathbf Q_{ff} + \sigma^2\mathbf I| - \tfrac{1}{2}\mathbf y^\top[\mathbf Q_{ff} + \sigma^2\mathbf I]^{-1}\mathbf y - \tfrac{1}{2}\sigma^{-2}\textrm{tr}(\mathbf K_{ff} - \mathbf Q_{ff})\\
&= -\tfrac{N}{2}\log 2\pi - \tfrac{1}{2}\log|\mathbf B| - \tfrac{N}{2}\log\sigma^2 - \tfrac{1}{2}\sigma^{-2}\mathbf y^\top\mathbf y + \tfrac{1}{2}\sigma^{-2}\mathbf y^\top\mathbf A^\top\mathbf B^{-1}\mathbf A\mathbf y - \tfrac{1}{2}\sigma^{-2}\textrm{tr}(\mathbf K_{ff}) + \tfrac{1}{2}\textrm{tr}(\mathbf A\mathbf A^\top)
\end{split}
\end{equation}
where $N$ is the number of data points, and we have used the two matrix identities above together with $\sigma^{-2}\textrm{tr}(\mathbf Q_{ff}) = \textrm{tr}(\mathbf A\mathbf A^\top)$.
Finally, we define $\mathbf c \triangleq \mathbf L_{\mathbf B}^{-1}\mathbf A\mathbf y\,\sigma^{-1}$, where $\mathbf L_{\mathbf B}\mathbf L_{\mathbf B}^\top = \mathbf B$, so that $\sigma^{-2}\mathbf y^\top\mathbf A^\top\mathbf B^{-1}\mathbf A\mathbf y = \mathbf c^\top\mathbf c$ and the bound becomes:

\begin{equation}
\mathcal L = -\tfrac{N}{2}\log 2\pi - \tfrac{1}{2}\log|\mathbf B| - \tfrac{N}{2}\log\sigma^2 - \tfrac{1}{2}\sigma^{-2}\mathbf y^\top\mathbf y + \tfrac{1}{2}\mathbf c^\top\mathbf c - \tfrac{1}{2}\sigma^{-2}\textrm{tr}(\mathbf K_{ff}) + \tfrac{1}{2}\textrm{tr}(\mathbf A\mathbf A^\top)
\end{equation}
The SGPR code implements this equation with small changes for multiple concurrent outputs (columns of the data matrix Y), and also a prior mean function.
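To make the final expression concrete, here is a minimal NumPy sketch of $\mathcal L$ for a single output column, written directly from the quantities defined above ($\mathbf L$, $\mathbf A$, $\mathbf B$, $\mathbf L_{\mathbf B}$, $\mathbf c$). It is a plain reading of the derivation rather than the GPflow implementation: the kernel matrices are assumed precomputed, and there is no mean function, jitter, or multi-output handling.

```python
import numpy as np

def sgpr_bound(Kdiag, Kuu, Kuf, y, sigma2):
    """Evaluate the Titsias (2009) bound L for one output column.

    Kdiag: (N,) diagonal of K_ff; Kuu: (M, M); Kuf: (M, N); y: (N,); sigma2: noise variance.
    """
    N = y.shape[0]
    sigma = np.sqrt(sigma2)

    L = np.linalg.cholesky(Kuu)                    # L L^T = K_uu
    A = np.linalg.solve(L, Kuf) / sigma            # A = L^{-1} K_uf / sigma
    B = A @ A.T + np.eye(Kuu.shape[0])             # B = A A^T + I
    LB = np.linalg.cholesky(B)                     # L_B L_B^T = B
    c = np.linalg.solve(LB, A @ y) / sigma         # c = L_B^{-1} A y / sigma

    bound = -0.5 * N * np.log(2 * np.pi)
    bound -= np.sum(np.log(np.diag(LB)))           # -0.5 * log|B|
    bound -= 0.5 * N * np.log(sigma2)
    bound -= 0.5 * y @ y / sigma2                  # -0.5 * sigma^{-2} y^T y
    bound += 0.5 * c @ c                           # +0.5 * c^T c
    bound -= 0.5 * np.sum(Kdiag) / sigma2          # -0.5 * sigma^{-2} tr(K_ff)
    bound += 0.5 * np.sum(A * A)                   # +0.5 * tr(A A^T)
    return bound
```

Each line of the sum maps onto one term of the final equation above.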
Prediction#
At prediction time, we need to compute the mean and variance of the variational approximation at some new points $\mathbf X^\star$.
Following Hensman et al. (2013), we know that all the information in the posterior approximation is contained in the Gaussian distribution $q(\mathbf u)$ over the function values $\mathbf u$ at the inducing points $\mathbf Z$. The optimal such distribution (Titsias 2009) is:

\begin{equation}
q(\mathbf u) = \mathcal N(\mathbf u\,|\,\mathbf m_u, \mathbf S_u)
\end{equation}
with:

\begin{equation}
\mathbf m_u = \sigma^{-2}\mathbf K_{uu}\boldsymbol\Sigma\,\mathbf K_{uf}\mathbf y, \qquad
\mathbf S_u = \mathbf K_{uu}\boldsymbol\Sigma\,\mathbf K_{uu}, \qquad
\boldsymbol\Sigma \triangleq [\mathbf K_{uu} + \mathbf K_{uf}\mathbf K_{fu}\sigma^{-2}]^{-1}
\end{equation}
To make a prediction, we need to integrate:

\begin{equation}
p(\mathbf f^\star\,|\,\mathbf y) = \int p(\mathbf f^\star\,|\,\mathbf u)\, q(\mathbf u)\,\textrm{d}\mathbf u
\end{equation}
with:

\begin{equation}
p(\mathbf f^\star\,|\,\mathbf u) = \mathcal N\!\left(\mathbf f^\star\,|\,\mathbf K_{\star u}\mathbf K_{uu}^{-1}\mathbf u,\; \mathbf K_{\star\star} - \mathbf K_{\star u}\mathbf K_{uu}^{-1}\mathbf K_{u\star}\right)
\end{equation}
The integral results in:

\begin{equation}
p(\mathbf f^\star\,|\,\mathbf y) = \mathcal N\!\left(\mathbf f^\star\,|\,\mathbf K_{\star u}\mathbf K_{uu}^{-1}\mathbf m_u,\; \mathbf K_{\star\star} - \mathbf K_{\star u}\mathbf K_{uu}^{-1}\mathbf K_{u\star} + \mathbf K_{\star u}\mathbf K_{uu}^{-1}\mathbf S_u\mathbf K_{uu}^{-1}\mathbf K_{u\star}\right)
\end{equation}
Note from our above definitions we have:

\begin{equation}
\boldsymbol\Sigma = [\mathbf K_{uu} + \mathbf K_{uf}\mathbf K_{fu}\sigma^{-2}]^{-1} = \mathbf L^{-\top}[\mathbf I + \mathbf L^{-1}\mathbf K_{uf}\mathbf K_{fu}\mathbf L^{-\top}\sigma^{-2}]^{-1}\mathbf L^{-1} = \mathbf L^{-\top}\mathbf B^{-1}\mathbf L^{-1}
\end{equation}
and further:

\begin{equation}
\mathbf K_{uu}^{-1}\mathbf m_u = \sigma^{-2}\boldsymbol\Sigma\,\mathbf K_{uf}\mathbf y = \mathbf L^{-\top}\mathbf L_{\mathbf B}^{-\top}\mathbf c,
\qquad
\mathbf K_{uu}^{-1}\mathbf S_u\mathbf K_{uu}^{-1} = \boldsymbol\Sigma = \mathbf L^{-\top}\mathbf L_{\mathbf B}^{-\top}\mathbf L_{\mathbf B}^{-1}\mathbf L^{-1}
\end{equation}
Substituting these into the prediction:

\begin{equation}
p(\mathbf f^\star\,|\,\mathbf y) = \mathcal N\!\left(\mathbf f^\star\,|\,\mathbf K_{\star u}\mathbf L^{-\top}\mathbf L_{\mathbf B}^{-\top}\mathbf c,\; \mathbf K_{\star\star} - \mathbf K_{\star u}\mathbf L^{-\top}\mathbf L^{-1}\mathbf K_{u\star} + \mathbf K_{\star u}\mathbf L^{-\top}\mathbf L_{\mathbf B}^{-\top}\mathbf L_{\mathbf B}^{-1}\mathbf L^{-1}\mathbf K_{u\star}\right)
\end{equation}
The code in SGPR implements this equation, with an additional switch depending on whether the full covariance matrix is required.
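A matching NumPy sketch of the predictive density, again written from the derivation rather than taken from the GPflow code: the kernel matrices are assumed precomputed, only one output column and the full-covariance case are handled, and $\mathbf K_{u\star}$ denotes the covariance between the inducing points and the test points.

```python
import numpy as np

def sgpr_predict(Kuu, Kuf, Kus, Kss, y, sigma2):
    """Predictive mean and covariance of f* at the test points, for one output column.

    Kuu: (M, M); Kuf: (M, N); Kus: (M, S); Kss: (S, S); y: (N,); sigma2: noise variance.
    """
    sigma = np.sqrt(sigma2)

    L = np.linalg.cholesky(Kuu)                    # L L^T = K_uu
    A = np.linalg.solve(L, Kuf) / sigma            # A = L^{-1} K_uf / sigma
    B = A @ A.T + np.eye(Kuu.shape[0])             # B = A A^T + I
    LB = np.linalg.cholesky(B)                     # L_B L_B^T = B
    c = np.linalg.solve(LB, A @ y) / sigma         # c = L_B^{-1} A y / sigma

    tmp1 = np.linalg.solve(L, Kus)                 # L^{-1} K_us
    tmp2 = np.linalg.solve(LB, tmp1)               # L_B^{-1} L^{-1} K_us

    mean = tmp2.T @ c                              # K_su L^{-T} L_B^{-T} c
    cov = Kss - tmp1.T @ tmp1 + tmp2.T @ tmp2      # latent covariance (no observation noise)
    return mean, cov
```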
References#
[1] Titsias, M: Variational Learning of Inducing Variables in Sparse Gaussian Processes, PMLR 5:567-574, 2009
[2] Hensman et al: Gaussian Processes for Big Data, UAI, 2013
[3] Matthews et al: On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes, AISTATS, 2016