gpflow.likelihoods#

Likelihoods are another core component of GPflow. A likelihood describes how likely the data is under the assumptions made about the underlying latent functions, p(Y|F). Different likelihoods make different assumptions about the distribution of the data, so different data types (continuous, binary, ordinal, count) are better modelled with different likelihoods.

Use of any likelihood other than Gaussian typically introduces the need to use an approximation to perform inference, if one isn’t already needed. Variational inference and MCMC models are included in GPflow and allow approximate inference with non-Gaussian likelihoods. An introduction to these models can be found here. Specific notebooks illustrating non-Gaussian likelihood regressions are available for classification (binary data), ordinal and multiclass.

Creating new likelihoods#

Likelihoods are defined by their log-likelihood. When creating a new likelihood, the logp method (log p(Y|F), exposed as log_prob in the Likelihood class below), conditional_mean and conditional_variance need to be implemented.

In order to perform variational inference with non-Gaussian likelihoods a term called variational expectations, ∫ q(F) log p(Y|F) dF, needs to be computed under a Gaussian distribution q(F) ~ N(μ, Σ).

The variational_expectations method can be overridden if this term can be computed in closed form. Otherwise, if the new likelihood inherits from Likelihood, the default uses Gauss-Hermite numerical integration (which works well when F is 1D or 2D); if it inherits from MonteCarloLikelihood, the integration is done by sampling (which can be more suitable when F is higher dimensional).
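As a rough illustration of these integration strategies (plain Python, not GPflow code), the sketch below evaluates ∫ q(f) log p(y|f) df for a Gaussian likelihood in closed form, with a 3-point Gauss-Hermite rule, and by sampling. All function names, the noise variance and the input values are assumptions made for this example.

```python
import math
import random

# Nodes and weights of the 3-point Gauss-Hermite rule (weight function exp(-x^2)).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)


def gaussian_logp(f, y, noise_var):
    """log p(y|f) for a Gaussian likelihood with variance noise_var."""
    return -0.5 * math.log(2.0 * math.pi * noise_var) - 0.5 * (y - f) ** 2 / noise_var


def ve_closed_form(mu, var, y, noise_var):
    """Exact expectation of gaussian_logp under q(f) = N(mu, var)."""
    return -0.5 * math.log(2.0 * math.pi * noise_var) - 0.5 * ((y - mu) ** 2 + var) / noise_var


def ve_gauss_hermite(logp, mu, var, y):
    """Quadrature estimate, using the substitution f = mu + sqrt(2 * var) * x."""
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        total += w * logp(mu + math.sqrt(2.0 * var) * x, y)
    return total / math.sqrt(math.pi)


def ve_monte_carlo(logp, mu, var, y, num_samples, rng):
    """Sampling estimate: average logp over draws f ~ N(mu, var)."""
    std = math.sqrt(var)
    return sum(logp(rng.gauss(mu, std), y) for _ in range(num_samples)) / num_samples


mu, var, y, noise_var = 0.3, 0.5, 1.0, 0.8  # illustrative values


def logp(f, y_obs):
    return gaussian_logp(f, y_obs, noise_var)


exact = ve_closed_form(mu, var, y, noise_var)
quad = ve_gauss_hermite(logp, mu, var, y)
mc = ve_monte_carlo(logp, mu, var, y, 20000, random.Random(0))
```

Because the Gaussian log density is quadratic in f, the 3-point rule reproduces the closed form here; for other likelihoods the quadrature is an approximation and more nodes are typically needed, while the sampling estimate carries Monte Carlo noise.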

Modules#

Classes#

gpflow.likelihoods.Bernoulli#

class gpflow.likelihoods.Bernoulli(invlink=<function inv_probit>, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

Parameters
  • invlink (Callable[[Tensor], Tensor]) –

  • kwargs (Any) –

gpflow.likelihoods.Beta#

class gpflow.likelihoods.Beta(invlink=<function inv_probit>, scale=1.0, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

This uses a reparameterisation of the Beta density. We have the mean of the Beta distribution given by the transformed process:

m = invlink(f)

and a scale parameter. The familiar α, β parameters are given by

m = α / (α + β)
scale = α + β

so:

α = scale * m
β = scale * (1-m)
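The reparameterisation can be checked numerically with a short standalone sketch. It does not call GPflow: `inv_probit` here is a plain standard normal CDF written for this example, and `beta_parameters` is a hypothetical helper name.

```python
import math


def inv_probit(f):
    """Standard normal CDF, used here as the inverse link (no numerical jitter)."""
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))


def beta_parameters(f, scale):
    """Map a latent value f and a scale to the usual (alpha, beta) pair."""
    m = inv_probit(f)  # mean of the Beta distribution, in (0, 1)
    alpha = scale * m
    beta = scale * (1.0 - m)
    return alpha, beta


alpha, beta = beta_parameters(f=0.0, scale=4.0)
# f = 0 gives m = 0.5, so alpha = beta = 2.0
```

Whatever the value of f, alpha / (alpha + beta) recovers m and alpha + beta recovers the scale.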

Parameters
  • invlink (Callable[[Tensor], Tensor]) –

  • scale (float) –

  • kwargs (Any) –

gpflow.likelihoods.Exponential#

class gpflow.likelihoods.Exponential(invlink=<function exp>, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

Parameters
  • invlink (Callable[[Tensor], Tensor]) –

  • kwargs (Any) –

gpflow.likelihoods.Gamma#

class gpflow.likelihoods.Gamma(invlink=<function exp>, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

Uses the transformed GP to give the scale (inverse rate) of the Gamma distribution.

Parameters
  • invlink (Callable[[Tensor], Tensor]) –

  • kwargs (Any) –

gpflow.likelihoods.Gaussian#

class gpflow.likelihoods.Gaussian(variance=1.0, variance_lower_bound=1e-06, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

The Gaussian likelihood is appropriate where uncertainties associated with the data are believed to follow a normal distribution, with constant variance.

Very small uncertainties can lead to numerical instability during the optimization process. A lower bound of 1e-6 is therefore imposed on the likelihood variance by default.

Parameters
  • variance (float) –

  • variance_lower_bound (float) –

  • kwargs (Any) –

gpflow.likelihoods.GaussianMC#

class gpflow.likelihoods.GaussianMC(*args, **kwargs)[source]#

Bases: gpflow.likelihoods.base.MonteCarloLikelihood, gpflow.likelihoods.scalar_continuous.Gaussian

Stochastic version of Gaussian likelihood for demonstration purposes only.

Parameters
  • args (Any) –

  • kwargs (Any) –

gpflow.likelihoods.HeteroskedasticTFPConditional#

class gpflow.likelihoods.HeteroskedasticTFPConditional(distribution_class=<class 'tensorflow_probability.python.distributions.normal.Normal'>, scale_transform=None, **kwargs)[source]#

Bases: gpflow.likelihoods.multilatent.MultiLatentTFPConditional

Heteroskedastic Likelihood where the conditional distribution is given by a TensorFlow Probability Distribution. The loc and scale of the distribution are given by a two-dimensional multi-output GP.

Parameters
  • distribution_class (Type[Distribution]) –

  • scale_transform (Optional[Bijector]) –

  • kwargs (Any) –

gpflow.likelihoods.Likelihood#

class gpflow.likelihoods.Likelihood(latent_dim, observation_dim)[source]#

Bases: gpflow.base.Module

Parameters
  • latent_dim (Optional[int]) –

  • observation_dim (Optional[int]) –

conditional_mean(F)[source]#

The conditional mean of Y|F: [E[Y₁|F], …, E[Yₖ|F]] where K = observation_dim

Parameters

F (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – function evaluation Tensor, with shape […, latent_dim]

Return type

Tensor

Returns

mean […, observation_dim]

conditional_variance(F)[source]#

The conditional marginal variance of Y|F: [var(Y₁|F), …, var(Yₖ|F)] where K = observation_dim

Parameters

F (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – function evaluation Tensor, with shape […, latent_dim]

Return type

Tensor

Returns

variance […, observation_dim]

log_prob(F, Y)[source]#

The log probability density log p(Y|F)

Parameters
  • F (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – function evaluation Tensor, with shape […, latent_dim]

  • Y (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – observation Tensor, with shape […, observation_dim]

Return type

Tensor

Returns

log pdf, with shape […]

predict_log_density(Fmu, Fvar, Y)[source]#

Given a Normal distribution for the latent function, and a datum Y, compute the log predictive density of Y,

i.e. if

q(F) = N(Fmu, Fvar)

and this object represents

p(y|F)

then this method computes the log predictive density

log ∫ p(y=Y|F) q(F) dF

Parameters
  • Fmu (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – mean function evaluation Tensor, with shape […, latent_dim]

  • Fvar (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – variance of function evaluation Tensor, with shape […, latent_dim]

  • Y (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – observation Tensor, with shape […, observation_dim]

Return type

Tensor

Returns

log predictive density, with shape […]
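For intuition about the quantity computed by predict_log_density, the standalone sketch below (not GPflow code) evaluates the integral for a Gaussian likelihood with a simple midpoint rule and compares it with the known closed form log N(Y | Fmu, Fvar + noise variance). The helper names and the numerical values are assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_predictive_density_numeric(lik_pdf, fmu, fvar, y, num_points=2000, width=10.0):
    """Midpoint-rule estimate of log ∫ p(y|f) N(f | fmu, fvar) df."""
    std = math.sqrt(fvar)
    start = fmu - width * std
    step = 2.0 * width * std / num_points
    total = 0.0
    for i in range(num_points):
        f = start + (i + 0.5) * step
        total += lik_pdf(y, f) * normal_pdf(f, fmu, fvar) * step
    return math.log(total)


noise_var = 0.8  # assumed likelihood variance
fmu, fvar, y = 0.3, 0.5, 1.0


def gaussian_lik_pdf(y_obs, f):
    return normal_pdf(y_obs, f, noise_var)


numeric = log_predictive_density_numeric(gaussian_lik_pdf, fmu, fvar, y)
closed_form = math.log(normal_pdf(y, fmu, fvar + noise_var))
```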

predict_mean_and_var(Fmu, Fvar)[source]#

Given a Normal distribution for the latent function, return the mean and marginal variance of Y,

i.e. if

q(f) = N(Fmu, Fvar)

and this object represents

p(y|f)

then this method computes the predictive mean

∫∫ y p(y|f)q(f) df dy

and the predictive variance

∫∫ y² p(y|f)q(f) df dy - [ ∫∫ y p(y|f)q(f) df dy ]²

Parameters
  • Fmu (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – mean function evaluation Tensor, with shape […, latent_dim]

  • Fvar (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – variance of function evaluation Tensor, with shape […, latent_dim]

Return type

Tuple[Tensor, Tensor]

Returns

mean and variance, both with shape […, observation_dim]
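The two double integrals above can be made concrete with a standalone sketch (not GPflow code) for a Poisson likelihood with an exponential inverse link, where the inner sums over y are known: E[y|f] = exp(f) and E[y²|f] = exp(f) + exp(2f). The outer integral over f is done with a midpoint rule and compared with the log-normal closed form. Function names and input values are assumptions for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def expectation_under_q(g, fmu, fvar, num_points=4000, width=10.0):
    """Midpoint-rule estimate of the integral of g(f) N(f | fmu, fvar) over f."""
    std = math.sqrt(fvar)
    start = fmu - width * std
    step = 2.0 * width * std / num_points
    total = 0.0
    for i in range(num_points):
        f = start + (i + 0.5) * step
        total += g(f) * normal_pdf(f, fmu, fvar) * step
    return total


def predict_mean_and_var_numeric(fmu, fvar):
    """Predictive mean and variance of y, integrating the Poisson moments over q(f)."""
    mean = expectation_under_q(lambda f: math.exp(f), fmu, fvar)
    second_moment = expectation_under_q(lambda f: math.exp(f) + math.exp(2.0 * f), fmu, fvar)
    return mean, second_moment - mean ** 2


def predict_mean_and_var_closed_form(fmu, fvar):
    """Same quantities using the moments of a log-normal rate exp(f)."""
    mean = math.exp(fmu + 0.5 * fvar)
    second_moment = mean + math.exp(2.0 * fmu + 2.0 * fvar)
    return mean, second_moment - mean ** 2


numeric_mean, numeric_var = predict_mean_and_var_numeric(0.3, 0.5)
exact_mean, exact_var = predict_mean_and_var_closed_form(0.3, 0.5)
```

Note that the predictive variance is larger than the predictive mean: uncertainty in f adds to the Poisson noise.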

variational_expectations(Fmu, Fvar, Y)[source]#

Compute the expected log density of the data, given a Gaussian distribution for the function values,

i.e. if

q(f) = N(Fmu, Fvar)

and this object represents

p(y|f)

then this method computes

∫ log(p(y=Y|f)) q(f) df.

This only works if the broadcasting dimension of the statistics of q(f) (mean and variance) are broadcastable with that of the data Y.

Parameters
  • Fmu (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – mean function evaluation Tensor, with shape […, latent_dim]

  • Fvar (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – variance of function evaluation Tensor, with shape […, latent_dim]

  • Y (Union[ndarray[Any, Any], Tensor, Variable, Parameter]) – observation Tensor, with shape […, observation_dim]

Return type

Tensor

Returns

expected log density of the data given q(F), with shape […]

gpflow.likelihoods.MonteCarloLikelihood#

class gpflow.likelihoods.MonteCarloLikelihood(*args, **kwargs)[source]#

Bases: gpflow.likelihoods.base.Likelihood

Parameters
  • args (Any) –

  • kwargs (Any) –

gpflow.likelihoods.MultiClass#

class gpflow.likelihoods.MultiClass(num_classes, invlink=None, **kwargs)[source]#

Bases: gpflow.likelihoods.base.Likelihood

Parameters
  • num_classes (int) –

  • invlink (Optional[RobustMax]) –

  • kwargs (Any) –

gpflow.likelihoods.MultiLatentLikelihood#

class gpflow.likelihoods.MultiLatentLikelihood(latent_dim, **kwargs)[source]#

Bases: gpflow.likelihoods.base.QuadratureLikelihood

A Likelihood which assumes that a single dimensional observation is driven by multiple latent GPs.

Note that this implementation does not allow for taking into account covariance between outputs.

Parameters
  • latent_dim (int) –

  • kwargs (Any) –

gpflow.likelihoods.MultiLatentTFPConditional#

class gpflow.likelihoods.MultiLatentTFPConditional(latent_dim, conditional_distribution, **kwargs)[source]#

Bases: gpflow.likelihoods.multilatent.MultiLatentLikelihood

MultiLatent likelihood where the conditional distribution is given by a TensorFlow Probability Distribution.

Parameters
  • latent_dim (int) –

  • conditional_distribution (Callable[..., Distribution]) –

  • kwargs (Any) –

gpflow.likelihoods.Ordinal#

class gpflow.likelihoods.Ordinal(bin_edges, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

A likelihood for doing ordinal regression.

The data are integer values from 0 to K, and the user must specify K ‘bin edges’ which define the points at which the labels switch. Let the bin edges be [a₀, a₁, …, aₖ₋₁]; then the likelihood is

p(Y=0|F) = ɸ((a₀ - F) / σ)
p(Y=1|F) = ɸ((a₁ - F) / σ) - ɸ((a₀ - F) / σ)
p(Y=2|F) = ɸ((a₂ - F) / σ) - ɸ((a₁ - F) / σ)
…
p(Y=K|F) = 1 - ɸ((aₖ₋₁ - F) / σ)

where ɸ is the cumulative distribution function of a standard Gaussian (the inverse probit function) and σ is a parameter to be learned.

A reference is Chu and Ghahramani [CG05].
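The formula for the label probabilities can be sketched in a few lines of plain Python (not GPflow code). `ordinal_probabilities` is a hypothetical helper, and the bin edges, σ and latent value are illustrative.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of a standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_probabilities(f, bin_edges, sigma):
    """Return [p(Y=0|F=f), ..., p(Y=K|F=f)] for K sorted bin edges."""
    # CDF evaluated at each edge, padded with 0 on the left and 1 on the right.
    cdf = [0.0] + [normal_cdf((a - f) / sigma) for a in bin_edges] + [1.0]
    # Each label probability is the difference between neighbouring CDF values.
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]


probs = ordinal_probabilities(f=0.2, bin_edges=[-1.0, 0.0, 1.5], sigma=0.7)
```

Three bin edges give four label probabilities, and because the differences telescope they always sum to one.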

Parameters
  • bin_edges (ndarray[Any, Any]) –

  • kwargs (Any) –

gpflow.likelihoods.Poisson#

class gpflow.likelihoods.Poisson(invlink=<function exp>, binsize=1.0, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

Poisson likelihood for use with count data, where the rate is given by the (transformed) GP.

Let g(·) be the inverse-link function; then this likelihood represents

p(yᵢ | fᵢ) = Poisson(yᵢ | g(fᵢ) * binsize)

Note: binsize is for use in a Log Gaussian Cox process (doubly stochastic model), where the rate function of an inhomogeneous Poisson process is given by a GP. The intractable likelihood can be approximated via a Riemann sum (with bins of size binsize) using this Poisson likelihood.
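A minimal standalone sketch of the log density above, written in plain Python rather than with GPflow. `poisson_log_prob` is a hypothetical helper; its defaults mirror the exponential inverse link and binsize of 1.0 from the class signature.

```python
import math


def poisson_log_prob(f, y, binsize=1.0, invlink=math.exp):
    """log Poisson(y | invlink(f) * binsize) for a non-negative integer count y."""
    rate = invlink(f) * binsize
    return y * math.log(rate) - rate - math.lgamma(y + 1.0)


# Probabilities over all counts should sum to one (truncated at 100 here).
total = sum(math.exp(poisson_log_prob(0.4, y, binsize=0.5)) for y in range(100))
```

Halving binsize halves the expected count per bin for the same latent value f, which is what makes the Riemann-sum approximation consistent as bins shrink.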

Parameters
  • invlink (Callable[[Tensor], Tensor]) –

  • binsize (float) –

  • kwargs (Any) –

gpflow.likelihoods.QuadratureLikelihood#

class gpflow.likelihoods.QuadratureLikelihood(latent_dim, observation_dim, *, quadrature=None)[source]#

Bases: gpflow.likelihoods.base.Likelihood

Parameters
  • latent_dim (Optional[int]) –

  • observation_dim (Optional[int]) –

  • quadrature (Optional[GaussianQuadrature]) –

gpflow.likelihoods.RobustMax#

class gpflow.likelihoods.RobustMax(num_classes, epsilon=0.001, **kwargs)[source]#

Bases: gpflow.base.Module

This class represents a multi-class inverse-link function. Given a vector \(f=[f_1, f_2, ... f_k]\), the result of the mapping is

\[y = [y_1 ... y_k]\]

with

\[\begin{split}y_i = \left\{ \begin{array}{ll} (1-\varepsilon) & \textrm{if} \ i = \textrm{argmax}(f) \\ \varepsilon/(k-1) & \textrm{otherwise} \end{array} \right.\end{split}\]

where \(k\) is the number of classes.
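The mapping can be written out directly in plain Python. This is a sketch of the formula above, not the GPflow implementation; `robust_max` is a hypothetical function name.

```python
def robust_max(f, epsilon=0.001):
    """Map latent values f = [f_1, ..., f_k] to class probabilities [y_1, ..., y_k]."""
    k = len(f)
    winner = max(range(k), key=lambda i: f[i])  # index of the largest latent value
    return [1.0 - epsilon if i == winner else epsilon / (k - 1) for i in range(k)]


probs = robust_max([0.1, 2.0, -0.5], epsilon=0.001)
# probs[1] is 1 - epsilon; the other two entries are epsilon / 2
```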

Parameters
  • num_classes (int) –

  • epsilon (float) –

  • kwargs (Any) –

gpflow.likelihoods.ScalarLikelihood#

class gpflow.likelihoods.ScalarLikelihood(**kwargs)[source]#

Bases: gpflow.likelihoods.base.QuadratureLikelihood

A likelihood class that helps with scalar likelihood functions: likelihoods where each scalar latent function is associated with a single scalar observation variable.

If there are multiple latent functions, then there must be a corresponding number of data: we check for this.

The Likelihood class contains methods to compute marginal statistics of functions of the latents and the data ϕ(y,f):

  • variational_expectations: ϕ(y,f) = log p(y|f)

  • predict_log_density: ϕ(y,f) = p(y|f)

Those statistics are computed after having first marginalized the latent processes f under a multivariate normal distribution q(f) that is fully factorized.

Some univariate integrals can be done by quadrature: we implement quadrature routines for 1D integrals in this class, though they may be overridden by inheriting classes where those integrals are available in closed form.

Parameters

kwargs (Any) –

gpflow.likelihoods.Softmax#

class gpflow.likelihoods.Softmax(num_classes, **kwargs)[source]#

Bases: gpflow.likelihoods.base.MonteCarloLikelihood

The soft-max multi-class likelihood. It can only provide a stochastic Monte-Carlo estimate of the variational expectations term, but this added variance tends to be small compared to that due to mini-batching (when using the SVGP model).
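To show what a Monte Carlo estimate of the variational expectations term looks like for a soft-max likelihood, here is a standalone sketch in plain Python. It is not the GPflow implementation, and it assumes a q(f) that factorizes across classes; the function names, sample count and inputs are chosen for illustration.

```python
import math
import random


def log_softmax(f):
    """Numerically stable log of the soft-max of a list of latent values."""
    top = max(f)
    log_norm = top + math.log(sum(math.exp(v - top) for v in f))
    return [v - log_norm for v in f]


def softmax_variational_expectation(fmu, fvar, y, num_samples, rng):
    """Monte Carlo estimate of E_q[log p(y|f)] with q(f_c) = N(fmu[c], fvar[c]) per class c."""
    total = 0.0
    for _ in range(num_samples):
        sample = [rng.gauss(m, math.sqrt(v)) for m, v in zip(fmu, fvar)]
        total += log_softmax(sample)[y]
    return total / num_samples


estimate = softmax_variational_expectation(
    [1.0, 0.0, -1.0], [0.2, 0.2, 0.2], y=0, num_samples=5000, rng=random.Random(0)
)
```

The estimate changes with the random seed; increasing num_samples reduces that variance at a proportional cost.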

Parameters
  • num_classes (int) –

  • kwargs (Any) –

gpflow.likelihoods.StudentT#

class gpflow.likelihoods.StudentT(scale=1.0, df=3.0, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

Parameters
  • scale (float) –

  • df (float) –

  • kwargs (Any) –

gpflow.likelihoods.SwitchedLikelihood#

class gpflow.likelihoods.SwitchedLikelihood(likelihood_list, **kwargs)[source]#

Bases: gpflow.likelihoods.base.ScalarLikelihood

Parameters