gpflow.likelihoods#
Likelihoods are another core component of GPflow. A likelihood p(Y|F) describes how probable the observed data are given the underlying latent functions. Different likelihoods make different assumptions about the distribution of the data, so different data types (continuous, binary, ordinal, count) are better modelled with different likelihoods.
Using any likelihood other than Gaussian typically requires approximate inference, if an approximation is not already needed for other reasons. GPflow includes variational inference and MCMC models that allow approximate inference with non-Gaussian likelihoods. An introduction to these models can be found here. Specific notebooks illustrating regression with non-Gaussian likelihoods are available for classification (binary data), ordinal and multiclass.
Creating new likelihoods#
Likelihoods are defined by their log-likelihood. When creating a new likelihood, implement the logp method (log p(Y|F), which appears as log_prob in the Likelihood class reference on this page), conditional_mean, and conditional_variance.
To perform variational inference with a non-Gaussian likelihood, a term called the variational expectations, ∫ q(F) log p(Y|F) dF, must be computed under a Gaussian distribution q(F) = N(μ, Σ). The variational_expectations method can be overridden if this integral is available in closed form. Otherwise, the default depends on the base class: a likelihood that inherits from Likelihood uses Gauss-Hermite numerical integration (which works well when F is 1D or 2D), and a likelihood that inherits from MonteCarloLikelihood integrates by sampling (which can be more suitable when F is higher dimensional).
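As a rough illustration of the variational expectations term, the sketch below evaluates ∫ q(f) log p(y|f) df for a Gaussian likelihood in plain Python, once with a 3-point Gauss-Hermite rule and once in closed form. It does not use GPflow; the function names and the fixed 3-point rule are assumptions chosen for the example, not the library's quadrature code.

```python
import math

# Probabilists' 3-point Gauss-Hermite rule: exact for polynomials up to
# degree 5 under a standard normal weight.
GH_NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
GH_WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def gaussian_log_density(y, f, noise_var):
    """log p(y | f) for a Gaussian likelihood with variance noise_var."""
    return -0.5 * math.log(2.0 * math.pi * noise_var) - 0.5 * (y - f) ** 2 / noise_var


def variational_expectation_quadrature(y, mu, var, noise_var):
    """Approximate the integral of q(f) log p(y | f) with q(f) = N(mu, var)."""
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        f = mu + math.sqrt(var) * node  # map the standard-normal node onto q(f)
        total += weight * gaussian_log_density(y, f, noise_var)
    return total


def variational_expectation_closed_form(y, mu, var, noise_var):
    """Closed form of the same integral for the Gaussian likelihood."""
    return (
        -0.5 * math.log(2.0 * math.pi * noise_var)
        - 0.5 * ((y - mu) ** 2 + var) / noise_var
    )


quad = variational_expectation_quadrature(y=0.3, mu=-0.1, var=0.5, noise_var=0.2)
exact = variational_expectation_closed_form(y=0.3, mu=-0.1, var=0.5, noise_var=0.2)
```

Because the Gaussian log density is quadratic in f, the two values agree up to rounding. For likelihoods whose log density is not polynomial in f, quadrature is only an approximation and more nodes are usually needed.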
Classes#
gpflow.likelihoods.Bernoulli#
- class gpflow.likelihoods.Bernoulli(invlink=<function inv_probit>, **kwargs)[source]#
Bases:
ScalarLikelihood
- Parameters:
invlink (Callable[[Tensor], Tensor])
kwargs (Any)
gpflow.likelihoods.Beta#
- class gpflow.likelihoods.Beta(invlink=<function inv_probit>, scale=1.0, scale_lower_bound=1e-06, **kwargs)[source]#
Bases:
ScalarLikelihood
This uses a reparameterisation of the Beta density. We have the mean of the Beta distribution given by the transformed process:
m = invlink(f)
and a scale parameter. The familiar α, β parameters are given by
m = α / (α + β)
scale = α + β
so:
α = scale * m
β = scale * (1 - m)
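The reparameterisation can be checked with a few lines of plain Python. This is a minimal sketch, assuming the inverse link is the standard normal CDF; the helper names are hypothetical, and any numerical safeguards the library may apply to its own inv_probit are omitted.

```python
import math


def inv_probit(f):
    """Standard normal CDF, used here as the inverse link (an assumption)."""
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))


def beta_parameters(f, scale):
    """Map a latent value f and a scale to the (alpha, beta) parameters."""
    m = inv_probit(f)  # mean of the Beta distribution, strictly inside (0, 1)
    alpha = scale * m
    beta = scale * (1.0 - m)
    return alpha, beta


alpha, beta = beta_parameters(f=0.4, scale=5.0)
mean = alpha / (alpha + beta)  # recovers m = invlink(f)
```

Recomputing the mean from α and β returns invlink(f), and α + β returns the scale, which matches the two defining equations.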
gpflow.likelihoods.Exponential#
- class gpflow.likelihoods.Exponential(invlink=<function exp>, **kwargs)[source]#
Bases:
ScalarLikelihood
- Parameters:
invlink (Callable[[Tensor], Tensor])
kwargs (Any)
gpflow.likelihoods.Gamma#
- class gpflow.likelihoods.Gamma(invlink=<function exp>, shape=1.0, shape_lower_bound=1e-06, **kwargs)[source]#
Bases:
ScalarLikelihood
Use the transformed GP to give the scale (inverse rate) of the Gamma distribution.
gpflow.likelihoods.Gaussian#
- class gpflow.likelihoods.Gaussian(variance=None, *, scale=None, variance_lower_bound=1e-06, **kwargs)[source]#
Bases:
ScalarLikelihood
The Gaussian likelihood is appropriate where uncertainties associated with the data are believed to follow a normal distribution, with constant variance.
Very small uncertainties can lead to numerical instability during the optimization process. A lower bound of 1e-6 is therefore imposed on the likelihood variance by default.
gpflow.likelihoods.GaussianMC#
- class gpflow.likelihoods.GaussianMC(*args, **kwargs)[source]#
Bases:
MonteCarloLikelihood, Gaussian
Stochastic version of Gaussian likelihood for demonstration purposes only.
- Parameters:
args (Any)
kwargs (Any)
gpflow.likelihoods.HeteroskedasticTFPConditional#
- class gpflow.likelihoods.HeteroskedasticTFPConditional(distribution_class=<class 'tensorflow_probability.python.distributions.normal.Normal'>, scale_transform=None, **kwargs)[source]#
Bases:
MultiLatentTFPConditional
Heteroskedastic Likelihood where the conditional distribution is given by a TensorFlow Probability Distribution. The loc and scale of the distribution are given by a two-dimensional multi-output GP.
- Parameters:
distribution_class (Type[Distribution])
scale_transform (Optional[Bijector])
kwargs (Any)
gpflow.likelihoods.Likelihood#
- class gpflow.likelihoods.Likelihood(input_dim, latent_dim, observation_dim)[source]#
Bases:
Module, ABC
- Parameters:
input_dim (Optional[int])
latent_dim (Optional[int])
observation_dim (Optional[int])
- conditional_mean(X, F)[source]#
The conditional mean of Y|X,F: [E[Y₁|X,F], …, E[Yₖ|X,F]] where k = observation_dim
- Parameters:
X – input tensor
F – function evaluation tensor
- Return type:
Tensor
- Returns:
mean. return has shape [batch…, observation_dim].
- conditional_variance(X, F)[source]#
The conditional marginal variance of Y|X,F: [var(Y₁|X,F), …, var(Yₖ|X,F)] where k = observation_dim
- Parameters:
X – input tensor
F – function evaluation tensor
- Return type:
Tensor
- Returns:
variance. return has shape [batch…, observation_dim].
- log_prob(X, F, Y)[source]#
The log probability density log p(Y|X,F)
- Parameters:
X (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – input tensor. X has shape [broadcast batch…, input_dim].
F (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – function evaluation tensor. F has shape [broadcast batch…, latent_dim].
Y (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – observation tensor. Y has shape [broadcast batch…, observation_dim].
- Return type:
Tensor
- Returns:
log pdf. return has shape [batch…].
- predict_log_density(X, Fmu, Fvar, Y)[source]#
Given a Normal distribution for the latent function, and a datum Y, compute the log predictive density of Y, i.e. if
q(F) = N(Fmu, Fvar)
and this object represents
p(y|F)
then this method computes the log predictive density
log ∫ p(y=Y|F) q(F) dF
- Parameters:
X (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – input tensor. X has shape [broadcast batch…, input_dim].
Fmu (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – mean function evaluation tensor. Fmu has shape [broadcast batch…, latent_dim].
Fvar (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – variance of function evaluation tensor. Fvar has shape [broadcast batch…, latent_dim].
Y (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – observation tensor. Y has shape [broadcast batch…, observation_dim].
- Return type:
Tensor
- Returns:
log predictive density. return has shape [batch…].
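For a Gaussian likelihood the log predictive density has a well-known closed form, log N(Y | Fmu, Fvar + noise variance), which makes a convenient sanity check. The sketch below compares that closed form with a brute-force trapezoidal integration in plain Python. It is a scalar illustration under those assumptions, with hypothetical function names, and it does not call GPflow.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_predictive_density_numeric(y, fmu, fvar, noise_var, num_points=20001):
    """log of the integral of p(y | f) q(f) df, via the trapezoidal rule."""
    half_width = 10.0 * math.sqrt(fvar)  # q(f) is negligible beyond 10 std devs
    lo = fmu - half_width
    step = 2.0 * half_width / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        f = lo + i * step
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += weight * normal_pdf(y, f, noise_var) * normal_pdf(f, fmu, fvar)
    return math.log(total * step)


def log_predictive_density_closed_form(y, fmu, fvar, noise_var):
    """For a Gaussian likelihood the integral equals N(y | fmu, fvar + noise_var)."""
    return math.log(normal_pdf(y, fmu, fvar + noise_var))


numeric = log_predictive_density_numeric(y=1.2, fmu=0.5, fvar=0.3, noise_var=0.1)
closed = log_predictive_density_closed_form(y=1.2, fmu=0.5, fvar=0.3, noise_var=0.1)
```

The two values agree to many decimal places. For non-Gaussian likelihoods no such closed form generally exists, which is why quadrature or sampling is used.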
- predict_mean_and_var(X, Fmu, Fvar)[source]#
Given a Normal distribution for the latent function, return the mean and marginal variance of Y, i.e. if
q(f) = N(Fmu, Fvar)
and this object represents
p(y|f)
then this method computes the predictive mean
∫∫ y p(y|f) q(f) df dy
and the predictive variance
∫∫ y² p(y|f) q(f) df dy - [ ∫∫ y p(y|f) q(f) df dy ]²
- Parameters:
X (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – input tensor. X has shape [broadcast batch…, input_dim].
Fmu (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – mean function evaluation tensor. Fmu has shape [broadcast batch…, latent_dim].
Fvar (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – variance of function evaluation tensor. Fvar has shape [broadcast batch…, latent_dim].
- Return type:
Tuple[Tensor, Tensor]
- Returns:
mean and variance. return[0] has shape [batch…, observation_dim]. return[1] has shape [batch…, observation_dim].
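As a worked instance of the two integrals above, consider a Bernoulli likelihood with the standard normal CDF Φ as inverse link. The predictive mean is then Φ(Fmu / √(1 + Fvar)), and because y² = y for binary data the predictive variance is p - p². The plain-Python sketch below checks the mean against numerical integration. The function names are hypothetical and no GPflow code is involved.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bernoulli_predict_mean_and_var(fmu, fvar):
    """Closed-form predictive mean and variance for a probit Bernoulli likelihood."""
    p = std_normal_cdf(fmu / math.sqrt(1.0 + fvar))
    return p, p - p ** 2


def predictive_mean_numeric(fmu, fvar, num_points=20001):
    """Trapezoidal estimate of the integral of E[y | f] q(f) df, with E[y | f] = cdf(f)."""
    half_width = 10.0 * math.sqrt(fvar)
    lo = fmu - half_width
    step = 2.0 * half_width / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        f = lo + i * step
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += weight * std_normal_cdf(f) * normal_pdf(f, fmu, fvar)
    return total * step


mean, var = bernoulli_predict_mean_and_var(fmu=0.8, fvar=1.5)
mean_check = predictive_mean_numeric(fmu=0.8, fvar=1.5)
```

Increasing Fvar pulls the predictive mean towards 0.5, which reflects the extra uncertainty in the latent function.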
- variational_expectations(X, Fmu, Fvar, Y)[source]#
Compute the expected log density of the data, given a Gaussian distribution for the function values, i.e. if
q(f) = N(Fmu, Fvar)
and this object represents
p(y|f)
then this method computes
∫ log(p(y=Y|f)) q(f) df.
This only works if the batch dimensions of the statistics of q(f) (mean and variance) are broadcastable with those of the data Y.
- Parameters:
X (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – input tensor. X has shape [broadcast batch…, input_dim].
Fmu (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – mean function evaluation tensor. Fmu has shape [broadcast batch…, latent_dim].
Fvar (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – variance of function evaluation tensor. Fvar has shape [broadcast batch…, latent_dim].
Y (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – observation tensor. Y has shape [broadcast batch…, observation_dim].
- Return type:
Tensor
- Returns:
expected log density of the data given q(F). return has shape [batch…].
gpflow.likelihoods.MonteCarloLikelihood#
- class gpflow.likelihoods.MonteCarloLikelihood(*args, **kwargs)[source]#
Bases:
Likelihood
- Parameters:
args (Any)
kwargs (Any)
gpflow.likelihoods.MultiClass#
- class gpflow.likelihoods.MultiClass(num_classes, invlink=None, **kwargs)[source]#
Bases:
Likelihood
- Parameters:
num_classes (int)
invlink (Optional[RobustMax])
kwargs (Any)
gpflow.likelihoods.MultiLatentLikelihood#
- class gpflow.likelihoods.MultiLatentLikelihood(latent_dim, **kwargs)[source]#
Bases:
QuadratureLikelihood
A Likelihood which assumes that a single dimensional observation is driven by multiple latent GPs.
Note that this implementation does not allow for taking into account covariance between outputs.
- Parameters:
latent_dim (int)
kwargs (Any)
gpflow.likelihoods.MultiLatentTFPConditional#
- class gpflow.likelihoods.MultiLatentTFPConditional(latent_dim, conditional_distribution, **kwargs)[source]#
Bases:
MultiLatentLikelihood
MultiLatent likelihood where the conditional distribution is given by a TensorFlow Probability Distribution.
- Parameters:
latent_dim (int)
conditional_distribution (Callable[..., Distribution])
kwargs (Any)
gpflow.likelihoods.Ordinal#
- class gpflow.likelihoods.Ordinal(bin_edges, **kwargs)[source]#
Bases:
ScalarLikelihood
A likelihood for doing ordinal regression.
The data are integer values from 0 to K, and the user must specify K ‘bin edges’ which define the points at which the labels switch. Let the bin edges be [a₀, a₁, …, aₖ₋₁]; then the likelihood is
p(Y=0|F) = ɸ((a₀ - F) / σ)
p(Y=1|F) = ɸ((a₁ - F) / σ) - ɸ((a₀ - F) / σ)
p(Y=2|F) = ɸ((a₂ - F) / σ) - ɸ((a₁ - F) / σ)
…
p(Y=K|F) = 1 - ɸ((aₖ₋₁ - F) / σ)
where ɸ is the cumulative distribution function of a standard Gaussian (the inverse probit function) and σ is a parameter to be learned.
A reference is Chu and Ghahramani [CG05].
- Parameters:
bin_edges (ndarray[Any, Any])
kwargs (Any)
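The ordinal probabilities defined above can be written out directly. The following plain-Python sketch, with hypothetical names and scalar inputs, computes p(Y=j|F) for every label from a list of sorted bin edges. It illustrates the formula only and is not the library's implementation.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_probabilities(f, bin_edges, sigma):
    """Return [p(Y=0|F), ..., p(Y=K|F)] for K sorted bin edges."""
    cdf_at_edges = [std_normal_cdf((a - f) / sigma) for a in bin_edges]
    lower = [0.0] + cdf_at_edges  # CDF value at the lower edge of each bin
    upper = cdf_at_edges + [1.0]  # CDF value at the upper edge of each bin
    return [u - l for u, l in zip(upper, lower)]


# Three bin edges give four labels: 0, 1, 2, 3.
probs = ordinal_probabilities(f=0.2, bin_edges=[-1.0, 0.0, 1.5], sigma=0.7)
```

The differences telescope, so the probabilities always sum to one, and there is one more label than there are bin edges.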
gpflow.likelihoods.Poisson#
- class gpflow.likelihoods.Poisson(invlink=<function exp>, binsize=1.0, **kwargs)[source]#
Bases:
ScalarLikelihood
Poisson likelihood for use with count data, where the rate is given by the (transformed) GP.
Let g(.) be the inverse-link function; then this likelihood represents
p(yᵢ | fᵢ) = Poisson(yᵢ | g(fᵢ) * binsize)
Note on binsize: it is intended for use in a Log Gaussian Cox process (doubly stochastic model), where the rate function of an inhomogeneous Poisson process is given by a GP. The intractable likelihood can be approximated via a Riemann sum (with bins of size ‘binsize’) using this Poisson likelihood.
- Parameters:
invlink (Callable[[Tensor], Tensor])
binsize (float)
kwargs (Any)
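To make the role of binsize concrete, the sketch below evaluates log Poisson(y | g(f) * binsize) in plain Python, assuming g is the exponential (the default inverse link in the signature above). The function name is hypothetical and inputs are scalars.

```python
import math


def poisson_log_prob(y, f, binsize=1.0):
    """log Poisson(y | exp(f) * binsize), assuming the exp inverse link."""
    rate = math.exp(f) * binsize  # expected count in a bin of width binsize
    return y * math.log(rate) - rate - math.lgamma(y + 1.0)


# The probability mass over all counts should sum to (almost exactly) one.
total_mass = sum(math.exp(poisson_log_prob(y, f=0.7, binsize=0.5)) for y in range(60))
```

Halving binsize halves the expected count per bin, which is how a finer Riemann-sum discretisation of the rate function is reflected in the likelihood.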
gpflow.likelihoods.QuadratureLikelihood#
- class gpflow.likelihoods.QuadratureLikelihood(input_dim, latent_dim, observation_dim, *, quadrature=None)[source]#
Bases:
Likelihood, ABC
- Parameters:
input_dim (Optional[int])
latent_dim (Optional[int])
observation_dim (Optional[int])
quadrature (Optional[GaussianQuadrature])
gpflow.likelihoods.RobustMax#
- class gpflow.likelihoods.RobustMax(num_classes, epsilon=0.001, **kwargs)[source]#
Bases:
Module
This class represents a multi-class inverse-link function. Given a vector \(f = [f_1, f_2, \ldots, f_k]\), the result of the mapping is
\[y = [y_1, \ldots, y_k]\]
with
\[\begin{split}y_i = \left\{ \begin{array}{ll} (1-\varepsilon) & \textrm{if} \ i = \textrm{argmax}(f) \\ \varepsilon/(k-1) & \textrm{otherwise} \end{array} \right.\end{split}\]
where \(k\) is the number of classes.
- Parameters:
num_classes (int)
epsilon (float)
kwargs (Any)
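The mapping can be illustrated on a plain Python list. This is a minimal sketch with a hypothetical function name; it handles a single latent vector rather than batches of tensors.

```python
def robust_max(f, epsilon=0.001):
    """Map latent values f = [f_1, ..., f_k] to class probabilities."""
    k = len(f)
    winner = max(range(k), key=lambda i: f[i])  # index of argmax(f)
    return [1.0 - epsilon if i == winner else epsilon / (k - 1) for i in range(k)]


y = robust_max([0.1, 2.3, -0.4])
```

The winning class receives 1 - ε and the remaining ε is split evenly among the other k - 1 classes, so the result is a valid probability vector.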
gpflow.likelihoods.ScalarLikelihood#
- class gpflow.likelihoods.ScalarLikelihood(**kwargs)[source]#
Bases:
QuadratureLikelihood, ABC
A likelihood class that helps with scalar likelihood functions: likelihoods where each scalar latent function is associated with a single scalar observation variable.
If there are multiple latent functions, then there must be a corresponding number of data: we check for this.
The Likelihood class contains methods to compute marginal statistics of functions of the latents and the data ϕ(y,x,f):
variational_expectations: ϕ(y,x,f) = log p(y|x,f)
predict_log_density: ϕ(y,x,f) = p(y|x,f)
Those statistics are computed after having first marginalized the latent processes f under a multivariate normal distribution q(x,f) that is fully factorized.
Some univariate integrals can be done by quadrature: we implement quadrature routines for 1D integrals in this class, though they may be overridden by inheriting classes where those integrals are available in closed form.
- Parameters:
kwargs (Any)
gpflow.likelihoods.Softmax#
- class gpflow.likelihoods.Softmax(num_classes, **kwargs)[source]#
Bases:
MonteCarloLikelihood
The soft-max multi-class likelihood. It can only provide a stochastic Monte-Carlo estimate of the variational expectations term, but this added variance tends to be small compared to that due to mini-batching (when using the SVGP model).
- Parameters:
num_classes (int)
kwargs (Any)
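The stochastic estimate mentioned above can be sketched in plain Python: draw samples of the latent vector from a factorised Gaussian q(f) and average the log soft-max probability of the observed class. The function names, the sample count and the seeded generator are assumptions made for this illustration, not the library's implementation.

```python
import math
import random


def log_softmax(f, label):
    """log of the soft-max probability of class `label` given latent values f."""
    shift = max(f)  # subtract the maximum for numerical stability
    log_norm = shift + math.log(sum(math.exp(v - shift) for v in f))
    return f[label] - log_norm


def mc_variational_expectation(fmu, fvar, label, num_samples=1000, seed=0):
    """Monte Carlo estimate of E_q[log p(y | f)] with factorised Gaussian q(f)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # One sample of the latent vector, one independent draw per class.
        f = [rng.gauss(m, math.sqrt(v)) for m, v in zip(fmu, fvar)]
        total += log_softmax(f, label)
    return total / num_samples


estimate = mc_variational_expectation(
    fmu=[0.5, -0.2, 1.0], fvar=[0.1, 0.3, 0.2], label=2
)
```

The estimate is noisy: a different seed or sample count gives a slightly different value, which is the added variance the class description refers to.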
gpflow.likelihoods.StudentT#
- class gpflow.likelihoods.StudentT(scale=1.0, df=3.0, scale_lower_bound=1e-06, **kwargs)[source]#
Bases:
ScalarLikelihood
gpflow.likelihoods.SwitchedLikelihood#
- class gpflow.likelihoods.SwitchedLikelihood(likelihood_list, **kwargs)[source]#
Bases:
ScalarLikelihood
- Parameters:
likelihood_list (Iterable[ScalarLikelihood])
kwargs (Any)