gpflow.kernels#
Kernels form a core component of GPflow models and allow prior information to
be encoded about a latent function of interest.
For an introduction to kernels, see Kernels in our Getting Started guide. The effect of choosing different kernels, and how to combine multiple kernels, is shown in the “Using kernels in GPflow” notebook.
Broadcasting over leading dimensions: kernel.K(X1, X2) returns the kernel evaluated on every pair in X1 and X2. E.g. if X1 has shape [S1, N1, D] and X2 has shape [S2, N2, D], kernel.K(X1, X2) will return a tensor of shape [S1, N1, S2, N2]. Similarly, kernel.K(X1, X1) returns a tensor of shape [S1, N1, S1, N1]. In contrast, the return shape of kernel.K(X1) is [S1, N1, N1]. (Without leading dimensions, the behaviour of kernel.K(X, None) is identical to kernel.K(X, X).)
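The shape rules above can be illustrated with a small NumPy sketch. It does not call GPflow; `se_kernel` is a hypothetical stand-in that implements a squared exponential kernel and follows the same shape convention, assuming one shared lengthscale.

```python
import numpy as np

def se_kernel(X1, X2=None, variance=1.0, lengthscale=1.0):
    """Hypothetical squared exponential kernel following the shape rules above."""
    if X2 is None:
        # Leading dimensions are shared: [S1, N1, D] -> [S1, N1, N1].
        diff = X1[..., :, None, :] - X1[..., None, :, :]
    else:
        # Every pair across both inputs: [S1, N1, D] and [S2, N2, D] -> [S1, N1, S2, N2].
        diff = X1[:, :, None, None, :] - X2[None, None, :, :, :]
    r2 = np.sum((diff / lengthscale) ** 2, axis=-1)
    return variance * np.exp(-0.5 * r2)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(2, 3, 4))  # [S1, N1, D]
X2 = rng.normal(size=(5, 6, 4))  # [S2, N2, D]

shape_cross = se_kernel(X1, X2).shape  # every pair in X1 and X2
shape_same = se_kernel(X1, X1).shape   # X1 passed twice
shape_single = se_kernel(X1).shape     # X2 omitted
```

Passing `X1` twice is therefore not the same as omitting `X2` once leading dimensions are present.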
Classes#
gpflow.kernels.AnisotropicStationary#
- class gpflow.kernels.AnisotropicStationary(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
Stationary
Base class for anisotropic stationary kernels, i.e. kernels that only depend on
d = x - x’
Derived classes should implement K_d(self, d): Returns the kernel evaluated on d, which is the pairwise difference matrix, scaled by the lengthscale parameter ℓ (i.e. [(X - X2ᵀ) / ℓ]). The last axis corresponds to the input dimension.
- Parameters:
- scaled_difference_matrix(X, X2=None)[source]#
Returns [(X - X2ᵀ) / ℓ]. If X has shape […, N, D] and X2 has shape […, M, D], the output will have shape […, N, M, D].
- Parameters:
- Return type:
Tensor
- Returns:
return has shape [batch…, N, N, D] if X2 is None.
return has shape [batch…, N, batch2…, N2, D] if X2 is not None.
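As a rough illustration of the scaled difference matrix, the NumPy sketch below handles the case without leading batch dimensions. The function name `scaled_difference` and the per-dimension lengthscales are assumptions made for the example, not GPflow code.

```python
import numpy as np

def scaled_difference(X, X2, lengthscales):
    # [N, 1, D] - [1, M, D] -> [N, M, D]; each input dimension is divided
    # by its own lengthscale.
    return (X[:, None, :] - X2[None, :, :]) / lengthscales

X = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])  # [N=3, D=2]
X2 = np.array([[1.0, 1.0], [3.0, 5.0]])             # [M=2, D=2]
d = scaled_difference(X, X2, lengthscales=np.array([1.0, 2.0]))
```

The last axis of `d` indexes the input dimension, which is what a derived class would receive in `K_d`.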
gpflow.kernels.ArcCosine#
- class gpflow.kernels.ArcCosine(order=0, variance=1.0, weight_variances=1.0, bias_variance=1.0, *, active_dims=None, name=None)[source]#
Bases:
Kernel
The Arc-cosine family of kernels which mimics the computation in neural networks. The order parameter specifies the assumed activation function. The Multi Layer Perceptron (MLP) kernel is closely related to the ArcCosine kernel of order 0.
The key reference is Cho and Saul [CS09].
- Parameters:
order (int)
variance (Union[ndarray[Any, Any], Tensor, Variable, Parameter])
weight_variances (Union[ndarray[Any, Any], Tensor, Variable, Parameter])
bias_variance (Union[ndarray[Any, Any], Tensor, Variable, Parameter])
active_dims (Union[slice, Sequence[int], None])
name (Optional[str])
- property ard: bool#
Whether ARD behaviour is active.
gpflow.kernels.Bias#
gpflow.kernels.ChangePoints#
- class gpflow.kernels.ChangePoints(kernels, locations, steepness=1.0, name=None)[source]#
Bases:
Combination
The ChangePoints kernel defines a fixed number of change-points along a 1d input space where different kernels govern different parts of the space.
The kernel is constructed by multiplying and adding the base kernels with sigmoid functions (σ). A single change-point kernel is defined as:
K₁(x, x') * (1 - σ(x)) * (1 - σ(x')) + K₂(x, x') * σ(x) * σ(x')
where K₁ is deactivated around the change-point and K₂ is activated. The single change-point version can be found in Lloyd [Llo14]. Each sigmoid is a logistic function defined as:
σ(x) = 1 / (1 + exp{-s(x - x₀)})
parameterized by location “x₀” and steepness “s”.
The key reference is Lloyd [Llo14].
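The single change-point formula can be sketched in plain Python as follows. The base kernels `k_const` and `k_se` are hypothetical choices made for illustration; only the sigmoid and the combination rule come from the equations above.

```python
import math

def sigmoid(x, location, steepness):
    # σ(x) = 1 / (1 + exp{-s(x - x₀)})
    return 1.0 / (1.0 + math.exp(-steepness * (x - location)))

def change_point_kernel(k1, k2, x, y, location=0.0, steepness=1.0):
    sx = sigmoid(x, location, steepness)
    sy = sigmoid(y, location, steepness)
    # K₁ dominates well before the change-point, K₂ well after it.
    return k1(x, y) * (1 - sx) * (1 - sy) + k2(x, y) * sx * sy

def k_const(x, y):
    # Hypothetical base kernel 1: constant with variance 2.
    return 2.0

def k_se(x, y):
    # Hypothetical base kernel 2: squared exponential with unit parameters.
    return math.exp(-0.5 * (x - y) ** 2)

far_left = change_point_kernel(k_const, k_se, -50.0, -50.0, steepness=5.0)
far_right = change_point_kernel(k_const, k_se, 50.0, 50.0, steepness=5.0)
at_change = change_point_kernel(k_const, k_se, 0.0, 0.0, steepness=5.0)
```

Far from the change-point the result matches one base kernel; at the change-point both contribute with weight 0.25.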
gpflow.kernels.Combination#
- class gpflow.kernels.Combination(kernels, name=None)[source]#
Bases:
Kernel
Combine a list of kernels, e.g. by adding or multiplying (see inheriting classes).
Note that kernel composition can be done easily by using the + and * operators defined in Kernel. The names of the kernels to be combined are generated from their class names.
- Parameters:
kernels (Sequence[Kernel])
name (Optional[str])
- property on_separate_dimensions: bool#
Checks whether the kernels in the combination act on disjoint subsets of dimensions. Currently, it is hard to assess whether two slice objects will overlap, so this will always return False.
- Returns:
Boolean indicator.
gpflow.kernels.Constant#
- class gpflow.kernels.Constant(variance=1.0, active_dims=None)[source]#
Bases:
Static
The Constant (aka Bias) kernel. Functions drawn from a GP with this kernel are constant, i.e. f(x) = c, with c ~ N(0, σ²). The kernel equation is
k(x, y) = σ²
where: σ² is the variance parameter.
- Parameters:
variance (Union[ndarray[Any, Any], Tensor, Variable, Parameter])
active_dims (Union[slice, Sequence[int], None])
gpflow.kernels.Convolutional#
- class gpflow.kernels.Convolutional(base_kernel, image_shape, patch_shape, weights=None, colour_channels=1)[source]#
Bases:
Kernel
Plain convolutional kernel as described in van der Wilk et al. [vdWRH17]. Defines a GP \(f()\) that is constructed from a sum of responses of individual patches in an image:
\[f(x) = \sum_p x^{[p]}\]
where \(x^{[p]}\) is the \(p\)’th patch in the image.
The key reference is van der Wilk et al. [vdWRH17].
- Parameters:
- get_patches(X)[source]#
Extracts patches from the images X. Patches are extracted separately for each of the colour channels.
- Parameters:
X (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – X has shape [batch…, N, D]. Images.
- Return type:
Tensor
- Returns:
return has shape [batch…, N, P, S].
Patches.
gpflow.kernels.Coregion#
- class gpflow.kernels.Coregion(output_dim, rank, *, active_dims=None, name=None)[source]#
Bases:
Kernel
A Coregionalization kernel. The inputs to this kernel are integers (we cast them from floats as needed) which usually specify the outputs of a Coregionalization model.
The kernel function is an indexing of a positive-definite matrix:
K(x, y) = B[x, y] .
To ensure that B is positive-definite, it is specified by the two parameters of this kernel, W and kappa:
B = W Wᵀ + diag(kappa) .
We refer to the size of B as “output_dim x output_dim”, since this is the number of outputs in a coregionalization model. We refer to the number of columns of W as ‘rank’: it is the number of degrees of correlation between the outputs.
NB. There is a symmetry between the elements of W, which creates a local minimum at W=0. To avoid this, it is recommended to initialize the optimization (or MCMC chain) using a random W.
- Parameters:
output_dim (int)
rank (int)
active_dims (Union[slice, Sequence[int], None])
name (Optional[str])
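The construction B = W Wᵀ + diag(kappa) and the indexing K(x, y) = B[x, y] can be sketched with NumPy. This is an illustration of the equations in this entry rather than GPflow's implementation; `coregion_kernel` is a hypothetical helper, and W is drawn at random as the note above recommends.

```python
import numpy as np

rng = np.random.default_rng(0)
output_dim, rank = 3, 2
W = rng.normal(size=(output_dim, rank))  # random initialisation, away from W = 0
kappa = np.ones(output_dim)              # positive diagonal term

# B is positive-definite because W Wᵀ is positive semi-definite and kappa > 0.
B = W @ W.T + np.diag(kappa)

def coregion_kernel(x, y):
    # Inputs are output indices stored as floats; cast them to integers
    # and look up the corresponding entries of B.
    xi = np.asarray(x).astype(int)
    yi = np.asarray(y).astype(int)
    return B[xi[:, None], yi[None, :]]

X = np.array([0.0, 2.0, 1.0, 2.0])  # output index of each data point
K = coregion_kernel(X, X)
```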
gpflow.kernels.Cosine#
- class gpflow.kernels.Cosine(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
AnisotropicStationary
The Cosine kernel. Functions drawn from a GP with this kernel are sinusoids (with a random phase). The kernel equation is
k(d) = σ² cos{2πd}
where: d is the sum of the per-dimension differences between the input points, scaled by the lengthscale parameter ℓ (i.e. Σᵢ [(X - X2ᵀ) / ℓ]ᵢ), σ² is the variance parameter.
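A minimal NumPy sketch of this equation, assuming inputs without leading batch dimensions; `cosine_kernel` is a hypothetical name and not the GPflow class.

```python
import numpy as np

def cosine_kernel(X, X2, variance=1.0, lengthscales=1.0):
    # Scaled pairwise differences, shape [N, M, D].
    d = (X[:, None, :] - X2[None, :, :]) / lengthscales
    # Sum over the input dimension, then apply σ² cos(2π d).
    return variance * np.cos(2 * np.pi * np.sum(d, axis=-1))

X = np.array([[0.0], [0.25], [0.5], [1.0]])
K = cosine_kernel(X, X, variance=2.0)
```

With unit lengthscale the covariance completes one full cycle per unit of input distance, so points one unit apart are perfectly correlated.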
gpflow.kernels.Exponential#
- class gpflow.kernels.Exponential(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
IsotropicStationary
The Exponential kernel. It is equivalent to a Matern12 kernel with doubled lengthscales.
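The stated equivalence can be checked numerically. The sketch below takes the Matern12 equation k(r) = σ² exp{-r} from its own entry and derives the Exponential form purely from the "doubled lengthscales" statement; both function names are hypothetical.

```python
import math

def matern12(distance, variance=1.0, lengthscale=1.0):
    # k(r) = σ² exp{-r}, with r the distance scaled by the lengthscale.
    return variance * math.exp(-distance / lengthscale)

def exponential(distance, variance=1.0, lengthscale=1.0):
    # Implied by the statement above: a Matern12 kernel with doubled lengthscale.
    return matern12(distance, variance, 2.0 * lengthscale)

value = exponential(3.0, variance=1.5, lengthscale=0.5)
# Equivalent closed form: σ² exp{-r / 2} with r = distance / lengthscale.
closed_form = 1.5 * math.exp(-0.5 * (3.0 / 0.5))
```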
gpflow.kernels.IndependentLatent#
- class gpflow.kernels.IndependentLatent(active_dims=None, name=None)[source]#
Bases:
MultioutputKernel
Base class for multioutput kernels that are constructed from independent latent Gaussian processes.
It should always be possible to specify inducing variables for such kernels that give a block-diagonal Kuu, which can be represented as a [L, M, M] tensor. A reasonable (but not optimal) inference procedure can be specified by placing the inducing points in the latent processes and simply computing Kuu [L, M, M] and Kuf [N, P, M, L] and using fallback_independent_latent_conditional(). This can be specified by using Fallback{Separate|Shared}IndependentInducingVariables.
- Parameters:
active_dims (Union[slice, Sequence[int], None])
name (Optional[str])
gpflow.kernels.IsotropicStationary#
- class gpflow.kernels.IsotropicStationary(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
Stationary
Base class for isotropic stationary kernels, i.e. kernels that only depend on
r = ‖x - x’‖
Derived classes should implement one of:
K_r2(self, r2): Returns the kernel evaluated on r² (r2), which is the squared scaled Euclidean distance. Should operate element-wise on r2.
K_r(self, r): Returns the kernel evaluated on r, which is the scaled Euclidean distance. Should operate element-wise on r.
- Parameters:
gpflow.kernels.Kernel#
- class gpflow.kernels.Kernel(active_dims=None, name=None)[source]#
Bases:
Module
The basic kernel class. Management of active dimensions is implemented here.
- Parameters:
active_dims (Union[slice, Sequence[int], None]) – active dimensions, either a slice or list of indices into the columns of X.
name (Optional[str]) – optional kernel name.
- on_separate_dims(other)[source]#
Checks if the dimensions, over which the kernels are specified, overlap. Returns True if they are defined on different/separate dimensions and False otherwise.
- Parameters:
other (Kernel)
- Return type:
bool
- slice(X, X2=None)[source]#
Slice the correct dimensions for use in the kernel, as indicated by self.active_dims.
- Parameters:
- Return type:
Tuple[Tensor, Optional[Tensor]]
- Returns:
return[0] has shape [batch…, N, I].
return[1] has shape [batch2…, N2, I].
Sliced X, X2.
- slice_cov(cov)[source]#
Slice the correct dimensions for use in the kernel, as indicated by self.active_dims for covariance matrices. This requires slicing the rows and columns. This will also turn flattened diagonal matrices into a tensor of full diagonal matrices.
- Parameters:
cov (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – cov has shape [N, D_or_DD…]. Tensor of covariance matrices.
- Return type:
Tensor
- Returns:
return has shape [N, I, I].
Sliced covariance matrices.
gpflow.kernels.Linear#
- class gpflow.kernels.Linear(variance=1.0, active_dims=None)[source]#
Bases:
Kernel
The linear kernel. Functions drawn from a GP with this kernel are linear, i.e. f(x) = cx. The kernel equation is
k(x, y) = σ²xy
where σ² is the variance parameter.
- Parameters:
variance (Union[ndarray[Any, Any], Tensor, Variable, Parameter])
active_dims (Union[slice, Sequence[int], None])
- property ard: bool#
Whether ARD behaviour is active.
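For multi-dimensional inputs, the product xy in the kernel equation is read here as an inner product. The NumPy sketch below also assumes that, with ARD active, there is one variance per input dimension applied inside the inner product; both points are assumptions for illustration, and `linear_kernel` is a hypothetical name.

```python
import numpy as np

def linear_kernel(X, X2, variance=1.0):
    # Scalar variance: σ² x·y. Vector variance (ARD): Σᵢ σᵢ² xᵢ yᵢ.
    return (X * variance) @ X2.T

X = np.array([[1.0, 2.0], [3.0, 4.0]])
K_iso = linear_kernel(X, X, variance=0.5)
K_ard = linear_kernel(X, X, variance=np.array([1.0, 0.5]))
```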
gpflow.kernels.LinearCoregionalization#
- class gpflow.kernels.LinearCoregionalization(kernels, W, name=None)[source]#
Bases:
IndependentLatent, Combination
Linear mixing of the latent GPs to form the output.
- Parameters:
- K(X, X2=None, full_output_cov=True)[source]#
Returns the covariance of f(X) and f(X2), where f(.) can be multi-dimensional.
- Parameters:
- Return type:
Tensor
- Returns:
return has shape [P, batch…, N, N] if (not full_output_cov) and (X2 is None).
return has shape [P, batch…, N, batch2…, N2] if (not full_output_cov) and (X2 is not None).
return has shape [batch…, N, P, N, P] if full_output_cov and (X2 is None).
return has shape [batch…, N, P, batch2…, N2, P] if full_output_cov and (X2 is not None).
cov[f(X), f(X2)]
- K_diag(X, full_output_cov=True)[source]#
Returns the covariance of f(X) and f(X), where f(.) can be multi-dimensional.
- Parameters:
X (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – X has shape [batch…, N, D]. Data matrix.
full_output_cov (bool) – calculate covariance between outputs.
- Return type:
Tensor
- Returns:
return has shape [batch…, N, P, P] if full_output_cov.
return has shape [batch…, N, P] if not full_output_cov.
var[f(X)]
- property num_latent_gps: int#
The number of latent GPs in the multioutput kernel
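The return shapes for full_output_cov=True can be reproduced with a NumPy sketch. It assumes the outputs are f(x) = W g(x), where g stacks L independent latent GPs and W has shape [P, L]; that reading of "linear mixing", the helper `se`, and the chosen lengthscales are all assumptions made for this example.

```python
import numpy as np

def se(X, lengthscale):
    # Hypothetical latent kernel: squared exponential, shape [N, N].
    r2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1) / lengthscale**2
    return np.exp(-0.5 * r2)

rng = np.random.default_rng(0)
N, D, L, P = 4, 2, 2, 3
X = rng.normal(size=(N, D))
W = rng.normal(size=(P, L))  # mixing matrix

Kg = np.stack([se(X, 0.5), se(X, 2.0)])  # latent covariances, [L, N, N]

# cov[f_p(x_n), f_q(x_m)] = Σ_l W[p, l] K_l[n, m] W[q, l]  ->  [N, P, N, P]
K_full = np.einsum("pl,lnm,ql->npmq", W, Kg, W)

# Same quantity restricted to n == m  ->  [N, P, P]
Kg_diag = np.einsum("lnn->ln", Kg)
K_diag_full = np.einsum("pl,ln,ql->npq", W, Kg_diag, W)
```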
gpflow.kernels.Matern12#
- class gpflow.kernels.Matern12(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
IsotropicStationary
The Matern 1/2 kernel. Functions drawn from a GP with this kernel are not differentiable anywhere. The kernel equation is
k(r) = σ² exp{-r}
where: r is the Euclidean distance between the input points, scaled by the lengthscales parameter ℓ, σ² is the variance parameter.
gpflow.kernels.Matern32#
- class gpflow.kernels.Matern32(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
IsotropicStationary
The Matern 3/2 kernel. Functions drawn from a GP with this kernel are once differentiable. The kernel equation is
k(r) = σ² (1 + √3r) exp{-√3 r}
where: r is the Euclidean distance between the input points, scaled by the lengthscales parameter ℓ, σ² is the variance parameter.
gpflow.kernels.Matern52#
- class gpflow.kernels.Matern52(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
IsotropicStationary
The Matern 5/2 kernel. Functions drawn from a GP with this kernel are twice differentiable. The kernel equation is
k(r) = σ² (1 + √5 r + (5/3) r²) exp{-√5 r}
where: r is the Euclidean distance between the input points, scaled by the lengthscales parameter ℓ, σ² is the variance parameter.
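The three Matern equations can be compared side by side in plain Python. This is a sketch of the formulas as written in these entries, taking the already scaled distance r as input; the function names are hypothetical.

```python
import math

def matern12(r, variance=1.0):
    return variance * math.exp(-r)

def matern32(r, variance=1.0):
    s = math.sqrt(3.0) * r
    return variance * (1.0 + s) * math.exp(-s)

def matern52(r, variance=1.0):
    s = math.sqrt(5.0) * r
    # s² / 3 equals (5/3) r².
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)

at_zero = [k(0.0, variance=2.0) for k in (matern12, matern32, matern52)]
at_one = [k(1.0) for k in (matern12, matern32, matern52)]
```

All three equal σ² at r = 0, and at r = 1 the smoother kernels retain more correlation.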
gpflow.kernels.MultioutputKernel#
- class gpflow.kernels.MultioutputKernel(active_dims=None, name=None)[source]#
Bases:
Kernel
Multi Output Kernel class.
This kernel can represent correlation between outputs of different datapoints.
The full_output_cov argument specifies whether the kernel should calculate the covariance between the outputs. If there is no correlation but full_output_cov is set to True, the covariance matrix will be filled with zeros until the appropriate size is reached.
- Parameters:
active_dims (Union[slice, Sequence[int], None])
name (Optional[str])
- abstract K(X, X2=None, full_output_cov=True)[source]#
Returns the covariance of f(X) and f(X2), where f(.) can be multi-dimensional.
- Parameters:
- Return type:
Tensor
- Returns:
return has shape [P, batch…, N, N] if (not full_output_cov) and (X2 is None).
return has shape [P, batch…, N, batch2…, N2] if (not full_output_cov) and (X2 is not None).
return has shape [batch…, N, P, N, P] if full_output_cov and (X2 is None).
return has shape [batch…, N, P, batch2…, N2, P] if full_output_cov and (X2 is not None).
cov[f(X), f(X2)]
- abstract K_diag(X, full_output_cov=True)[source]#
Returns the covariance of f(X) and f(X), where f(.) can be multi-dimensional.
- Parameters:
X (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – X has shape [batch…, N, D]. Data matrix.
full_output_cov (bool) – calculate covariance between outputs.
- Return type:
Tensor
- Returns:
return has shape [batch…, N, P, P] if full_output_cov.
return has shape [batch…, N, P] if not full_output_cov.
var[f(X)]
- abstract property latent_kernels: Tuple[Kernel, ...]#
The underlying kernels in the multioutput kernel
- abstract property num_latent_gps: int#
The number of latent GPs in the multioutput kernel
gpflow.kernels.Periodic#
- class gpflow.kernels.Periodic(base_kernel, period=1.0)[source]#
Bases:
Kernel
The periodic family of kernels. Can be used to wrap any Stationary kernel to transform it into a periodic version. The canonical form (based on the SquaredExponential kernel) can be found in Equation (47) of
D.J.C.MacKay. Introduction to Gaussian processes. In C.M.Bishop, editor, Neural Networks and Machine Learning, pages 133–165. Springer, 1998.
The derivation can be achieved by mapping the original inputs through the transformation u = (cos(x), sin(x)).
For the SquaredExponential base kernel, the result can be expressed as:
k(r) = σ² exp{ -0.5 sin²(π r / γ) / ℓ²}
where: r is the Euclidean distance between the input points, ℓ is the lengthscales parameter, σ² is the variance parameter, γ is the period parameter.
- NOTE: usually we have a factor of 4 instead of 0.5 in front but this
is absorbed into the lengthscales hyperparameter.
- NOTE: periodic kernel uses active_dims of a base kernel, therefore
the constructor doesn’t have it as an argument.
- Parameters:
base_kernel (IsotropicStationary)
period (Union[ndarray[Any, Any], Tensor, Variable, Parameter])
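The SquaredExponential-based form given above can be written directly in plain Python to confirm that the covariance repeats with period γ. `periodic_se` is a hypothetical name; the sketch covers only this one base kernel and does not wrap an arbitrary kernel as the class does.

```python
import math

def periodic_se(distance, variance=1.0, lengthscale=1.0, period=1.0):
    # k(r) = σ² exp{-0.5 sin²(π r / γ) / ℓ²}
    s = math.sin(math.pi * distance / period)
    return variance * math.exp(-0.5 * s * s / lengthscale**2)

k_near = periodic_se(0.3, variance=2.0, lengthscale=0.7, period=2.0)
k_shifted = periodic_se(0.3 + 2.0, variance=2.0, lengthscale=0.7, period=2.0)
k_zero = periodic_se(0.0, variance=2.0, lengthscale=0.7, period=2.0)
```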
gpflow.kernels.Polynomial#
- class gpflow.kernels.Polynomial(degree=3.0, variance=1.0, offset=1.0, active_dims=None)[source]#
Bases:
Linear
The Polynomial kernel. Functions drawn from a GP with this kernel are polynomials of degree d. The kernel equation is
k(x, y) = (σ²xy + γ)ᵈ
where: σ² is the variance parameter, γ is the offset parameter, d is the degree parameter.
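A small NumPy sketch of the polynomial kernel equation, reading xy as an inner product for vector inputs (an assumption for the example); `polynomial_kernel` is a hypothetical name.

```python
import numpy as np

def polynomial_kernel(X, X2, degree=3.0, variance=1.0, offset=1.0):
    # k(x, y) = (σ² x·y + γ)ᵈ, evaluated for every pair of rows.
    return (variance * (X @ X2.T) + offset) ** degree

X = np.array([[1.0, 2.0], [0.0, 1.0]])
K = polynomial_kernel(X, X, degree=2.0, variance=0.5, offset=1.0)
```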
gpflow.kernels.Product#
- class gpflow.kernels.Product(kernels, name=None)[source]#
Bases:
ReducingCombination
- Parameters:
kernels (Sequence[Kernel])
name (Optional[str])
gpflow.kernels.RBF#
- gpflow.kernels.RBF#
alias of SquaredExponential
gpflow.kernels.RationalQuadratic#
- class gpflow.kernels.RationalQuadratic(variance=1.0, lengthscales=1.0, alpha=1.0, active_dims=None)[source]#
Bases:
IsotropicStationary
The Rational Quadratic kernel. The kernel equation is
k(r) = σ² (1 + r² / (2αℓ²))^(-α)
where: σ² is the variance parameter, ℓ is the lengthscales parameter, α is the alpha parameter, which determines the relative weighting of small-scale and large-scale fluctuations.
For α → ∞, the RQ kernel becomes equivalent to the squared exponential.
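The limit can be checked numerically in plain Python. The sketch takes r in the rational quadratic equation to be the unscaled distance, since ℓ appears explicitly there; that reading, and both function names, are assumptions for the example.

```python
import math

def rational_quadratic(distance, variance=1.0, lengthscale=1.0, alpha=1.0):
    # k(r) = σ² (1 + r² / (2 α ℓ²))^(-α)
    return variance * (1.0 + distance**2 / (2.0 * alpha * lengthscale**2)) ** (-alpha)

def squared_exponential(distance, variance=1.0, lengthscale=1.0):
    # k(r) = σ² exp{-½ (r / ℓ)²}
    return variance * math.exp(-0.5 * (distance / lengthscale) ** 2)

# Absolute gap to the squared exponential for increasing alpha.
gaps = [
    abs(rational_quadratic(1.5, alpha=a) - squared_exponential(1.5))
    for a in (1.0, 10.0, 1000.0)
]
```

The gap shrinks as α grows, consistent with the stated limit.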
gpflow.kernels.SeparateIndependent#
- class gpflow.kernels.SeparateIndependent(kernels, name=None)[source]#
Bases:
MultioutputKernel, Combination
Separate: we use a different kernel for each output latent.
Independent: Latents are uncorrelated a priori.
- Parameters:
kernels (Sequence[Kernel])
name (Optional[str])
- K(X, X2=None, full_output_cov=True)[source]#
Returns the covariance of f(X) and f(X2), where f(.) can be multi-dimensional.
- Parameters:
- Return type:
Tensor
- Returns:
return has shape [P, batch…, N, N] if (not full_output_cov) and (X2 is None).
return has shape [P, batch…, N, batch2…, N2] if (not full_output_cov) and (X2 is not None).
return has shape [batch…, N, P, N, P] if full_output_cov and (X2 is None).
return has shape [batch…, N, P, batch2…, N2, P] if full_output_cov and (X2 is not None).
cov[f(X), f(X2)]
- K_diag(X, full_output_cov=False)[source]#
Returns the covariance of f(X) and f(X), where f(.) can be multi-dimensional.
- Parameters:
X (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – X has shape [batch…, N, D]. Data matrix.
full_output_cov (bool) – calculate covariance between outputs.
- Return type:
Tensor
- Returns:
return has shape [batch…, N, P, P] if full_output_cov.
return has shape [batch…, N, P] if not full_output_cov.
var[f(X)]
- property num_latent_gps: int#
The number of latent GPs in the multioutput kernel
gpflow.kernels.SquaredExponential#
- class gpflow.kernels.SquaredExponential(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
IsotropicStationary
The radial basis function (RBF) or squared exponential kernel. The kernel equation is
k(r) = σ² exp{-½ r²}
where: r is the Euclidean distance between the input points, scaled by the lengthscales parameter ℓ, σ² is the variance parameter.
Functions drawn from a GP with this kernel are infinitely differentiable!
gpflow.kernels.Static#
- class gpflow.kernels.Static(variance=1.0, active_dims=None)[source]#
Bases:
Kernel
Kernels that don’t depend on the value of the inputs are ‘Static’. The only parameter is a variance, σ².
- Parameters:
variance (Union[ndarray[Any, Any], Tensor, Variable, Parameter])
active_dims (Union[slice, Sequence[int], None])
gpflow.kernels.Stationary#
- class gpflow.kernels.Stationary(variance=1.0, lengthscales=1.0, **kwargs)[source]#
Bases:
Kernel
Base class for kernels that are stationary, that is, they only depend on
d = x - x’
This class handles ‘ard’ behaviour, which stands for ‘Automatic Relevance Determination’. With ARD, the kernel has one lengthscale per input dimension; otherwise the kernel is isotropic (has a single lengthscale).
- Parameters:
- property ard: bool#
Whether ARD behaviour is active.
gpflow.kernels.Sum#
- class gpflow.kernels.Sum(kernels, name=None)[source]#
Bases:
ReducingCombination
- Parameters:
kernels (Sequence[Kernel])
name (Optional[str])
gpflow.kernels.White#
- class gpflow.kernels.White(variance=1.0, active_dims=None)[source]#
Bases:
Static
The White kernel: this kernel produces ‘white noise’. The kernel equation is
k(x_n, x_m) = δ(n, m) σ²
where: δ(.,.) is the Kronecker delta, σ² is the variance parameter.
- Parameters:
variance (Union[ndarray[Any, Any], Tensor, Variable, Parameter])
active_dims (Union[slice, Sequence[int], None])
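For a single set of N points, the white kernel equation yields a diagonal matrix. The NumPy sketch below illustrates this and shows the effect of adding it to another covariance matrix; `white_kernel` and the example matrix `K_other` are hypothetical.

```python
import numpy as np

def white_kernel(n_points, variance=1.0):
    # k(x_n, x_m) = δ(n, m) σ²: σ² on the diagonal, zero elsewhere.
    return variance * np.eye(n_points)

K_white = white_kernel(3, variance=0.1)

# Adding white noise to another covariance only changes the diagonal.
K_other = np.array([[1.0, 0.8, 0.5], [0.8, 1.0, 0.8], [0.5, 0.8, 1.0]])
K_sum = K_other + K_white
```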