GPflow#

GPflow is a package for building Gaussian Process models in Python, using TensorFlow. A Gaussian Process is a kind of supervised learning model. Some advantages of Gaussian Processes are:

  • Uncertainty is an inherent part of Gaussian Processes. A Gaussian Process can tell you when it does not know the answer.

  • Works well with small datasets. If your data is limited, Gaussian Processes can get the most out of it.

  • Can scale to large datasets. Although Gaussian Processes are admittedly computationally intensive, there are ways to scale them to large datasets.

GPflow was originally created by James Hensman and Alexander G. de G. Matthews. Today it is primarily maintained by the company Secondmind.

Documentation#

If you’re new to GPflow, we suggest you continue to the getting-started guides. For more in-depth documentation, see the user guide and the API reference.

What models are implemented?#

GPflow has a slew of kernels that can be combined in a straightforward way; a minimal sketch follows. As for inference, the current options are described in the subsections below.
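
For example, kernels compose with ordinary arithmetic operators (assuming GPflow 2.x; the particular combination here is arbitrary):

import gpflow

# Kernels compose with + and *: here a smooth trend plus a periodic
# component whose amplitude grows linearly (an arbitrary illustration).
kernel = gpflow.kernels.SquaredExponential() + (
    gpflow.kernels.Periodic(gpflow.kernels.SquaredExponential())
    * gpflow.kernels.Linear()
)
print(kernel)  # a Sum kernel; all component parameters remain trainable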

Regression#

For GP regression with Gaussian noise, it’s possible to marginalize the function values exactly: you’ll find this in gpflow.models.GPR. You can do maximum likelihood or MCMC for the covariance function parameters.
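
A minimal sketch of the maximum-likelihood route (assuming GPflow 2.x; the toy data are hypothetical):

import numpy as np
import gpflow

# Toy 1-D regression dataset
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, (100, 1))
Y = np.sin(X) + 0.1 * rng.standard_normal((100, 1))

model = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.SquaredExponential())

# Maximum-likelihood estimation of the kernel and noise parameters
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

mean, var = model.predict_y(np.array([[5.0]]))  # predictive mean and variance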

It’s also possible to do sparse GP regression using the gpflow.models.SGPR class, as sketched below. This is based on work by Michalis Titsias [Tit09].
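
A sketch along the same lines; the inducing-point locations here are an arbitrary choice:

import numpy as np
import gpflow

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, (1000, 1))
Y = np.sin(X) + 0.1 * rng.standard_normal((1000, 1))

Z = np.linspace(0.0, 10.0, 20)[:, None]  # 20 inducing points
model = gpflow.models.SGPR(
    (X, Y), kernel=gpflow.kernels.SquaredExponential(), inducing_variable=Z
)
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)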

MCMC#

For non-Gaussian likelihoods, GPflow has a model that can jointly sample over the function values and the covariance parameters: gpflow.models.GPMC. There’s also a sparse equivalent in gpflow.models.SGPMC, based on Hensman et al. [HMFG15].
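
A hedged sketch of GPMC driven by TensorFlow Probability's HMC sampler (the toy data and sampler settings are hypothetical; in practice you would also place priors on the kernel parameters before sampling):

import numpy as np
import tensorflow_probability as tfp
import gpflow

# Toy binary classification data
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, (40, 1))
Y = (np.sin(X) > 0).astype(float)

model = gpflow.models.GPMC(
    (X, Y), kernel=gpflow.kernels.Matern52(), likelihood=gpflow.likelihoods.Bernoulli()
)

# Jointly sample function values and covariance parameters with HMC
helper = gpflow.optimizers.SamplingHelper(
    model.log_posterior_density, model.trainable_parameters
)
hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=helper.target_log_prob_fn, num_leapfrog_steps=10, step_size=0.01
)
samples, _ = tfp.mcmc.sample_chain(
    num_results=300, num_burnin_steps=100, current_state=helper.current_state, kernel=hmc
)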

Variational inference#

It’s often sufficient to approximate the function values as a Gaussian, for which we follow Opper and Archambeau [OA09] in gpflow.models.VGP. In addition, there is a sparse version based on Hensman et al. [HMG15] in gpflow.models.SVGP; a sketch follows below. In the Gaussian likelihood case, some of the optimization may be done analytically, as discussed in Titsias [Tit09] and implemented in gpflow.models.SGPR. All of the sparse methods in GPflow are put on a unified theoretical footing in Matthews et al. [MHTG16].
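
A minimal sketch of SVGP with minibatch training (assuming GPflow 2.x; the data, batch size, learning rate, and iteration count are hypothetical):

import numpy as np
import tensorflow as tf
import gpflow

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, (10000, 1))
Y = (np.sin(X) > 0).astype(float)  # binary labels, hence a Bernoulli likelihood

Z = np.linspace(0.0, 6.0, 30)[:, None]  # inducing-point locations
model = gpflow.models.SVGP(
    gpflow.kernels.SquaredExponential(),
    gpflow.likelihoods.Bernoulli(),
    Z,
    num_data=len(X),
)

# Stochastic optimization over minibatches
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(10000).batch(256)
loss = model.training_loss_closure(iter(dataset))
optimizer = tf.optimizers.Adam(0.01)
for _ in range(1000):
    optimizer.minimize(loss, model.trainable_variables)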

The following table summarizes the model options in GPflow.

                        Gaussian Likelihood    Non-Gaussian (variational)    Non-Gaussian (MCMC)
Full-covariance         gpflow.models.GPR      gpflow.models.VGP             gpflow.models.GPMC
Sparse approximation    gpflow.models.SGPR     gpflow.models.SVGP            gpflow.models.SGPMC

A unified view of many of the relevant references, along with some extensions, and an early discussion of GPflow itself, is given in the PhD thesis of Matthews [Mat17].

Interdomain inference and multioutput GPs#

GPflow has an extensive and flexible framework for specifying interdomain inducing variables for variational approximations. Interdomain variables can greatly improve the effectiveness of a variational approximation, and are used in e.g. Convolutional Gaussian Processes. In particular, they are crucial for defining sensible sparse approximations for multioutput GPs (Multi-output Gaussian processes in GPflow).

GPflow has a unifying design for using multioutput GPs and specifying interdomain approximations. A review of the mathematical background and the resulting software design is described in van der Wilk et al. [vandWilkDJ+20].
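
A minimal sketch of a multioutput SVGP with a shared-independent kernel and shared inducing variables (assuming GPflow 2.x; the output and inducing-point counts are hypothetical):

import numpy as np
import gpflow
from gpflow.inducing_variables import (
    InducingPoints,
    SharedIndependentInducingVariables,
)

P = 3  # number of outputs
Z = np.linspace(0.0, 1.0, 20)[:, None]  # 20 inducing points, shared across outputs

kernel = gpflow.kernels.SharedIndependent(gpflow.kernels.SquaredExponential(), output_dim=P)
iv = SharedIndependentInducingVariables(InducingPoints(Z))
model = gpflow.models.SVGP(kernel, gpflow.likelihoods.Gaussian(), iv, num_latent_gps=P)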

GPLVM#

For visualisation, the GPLVM [Law03] and Bayesian GPLVM [TL10] models are implemented in GPflow (Bayesian Gaussian process latent variable model (Bayesian GPLVM)).
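
A hedged sketch of the Bayesian GPLVM (assuming GPflow 2.x; the observations, latent dimensionality, initial latent variances, and inducing-point count are hypothetical):

import numpy as np
import tensorflow as tf
import gpflow
from gpflow.utilities import ops

Y = np.random.default_rng(0).standard_normal((50, 12))  # toy high-dimensional data
latent_dim = 2

# Initialize the latent means with PCA, then optimize the variational parameters
X_mean_init = ops.pca_reduce(tf.convert_to_tensor(Y), latent_dim)
model = gpflow.models.BayesianGPLVM(
    Y,
    X_data_mean=X_mean_init,
    X_data_var=0.1 * tf.ones((50, latent_dim), dtype=tf.float64),
    kernel=gpflow.kernels.SquaredExponential(),
    num_inducing_variables=16,
)
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)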

Heteroskedastic models#

GPflow supports heteroskedastic models by configuring a likelihood object. See examples in Gaussian process regression with varying output noise and Heteroskedastic Likelihood and Multi-Latent GP; a brief sketch follows.
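
A hedged sketch of a multi-latent heteroskedastic model (assuming GPflow 2.x; the inducing-point locations are hypothetical, and training proceeds as for any SVGP, e.g. with Adam or natural gradients):

import numpy as np
import tensorflow_probability as tfp
import gpflow

# Two latent GPs: one for the mean, one for the (Exp-transformed) noise scale
likelihood = gpflow.likelihoods.HeteroskedasticTFPConditional(
    distribution_class=tfp.distributions.Normal,
    scale_transform=tfp.bijectors.Exp(),
)
kernel = gpflow.kernels.SeparateIndependent(
    [gpflow.kernels.SquaredExponential(), gpflow.kernels.SquaredExponential()]
)
Z = np.linspace(0.0, 1.0, 20)[:, None]
inducing_variable = gpflow.inducing_variables.SeparateIndependentInducingVariables(
    [
        gpflow.inducing_variables.InducingPoints(Z),
        gpflow.inducing_variables.InducingPoints(Z.copy()),
    ]
)
model = gpflow.models.SVGP(
    kernel, likelihood, inducing_variable, num_latent_gps=likelihood.latent_dim
)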

Contact#

  • GPflow is an open source project, and you can find this project on GitHub.

  • If you find any bugs, please file a ticket.

  • If you need help, please use Stack Overflow.

  • If you otherwise need to contact us, the easiest way to get in touch is through our Slack workspace.

If you feel you have some relevant skills and are interested in contributing then please read our notes for contributors and contact us. We maintain a full list of contributors.

Citing GPflow#

To cite GPflow, please reference Matthews et al. [MvandWilkN+17]. Sample BibTeX is given below:

@ARTICLE{GPflow2017,
    author = {Matthews, Alexander G. de G. and
              {van der Wilk}, Mark and
              Nickson, Tom and
              Fujii, Keisuke and
              {Boukouvalas}, Alexis and
              {Le{\'o}n-Villagr{\'a}}, Pablo and
              Ghahramani, Zoubin and
              Hensman, James},
    title = "{{GP}flow: A {G}aussian process library using {T}ensor{F}low}",
    journal = {Journal of Machine Learning Research},
    year = {2017},
    month = {apr},
    volume = {18},
    number = {40},
    pages = {1-6},
    url = {http://jmlr.org/papers/v18/16-537.html}
}

Since the publication of the GPflow paper, the software has been significantly extended with the framework for interdomain approximations and multioutput priors. We review the framework and describe the design in van der Wilk et al. [vandWilkDJ+20], which can be cited by users:

@article{GPflow2020multioutput,
  author = {{van der Wilk}, Mark and
            Dutordoir, Vincent and
            John, ST and
            Artemev, Artem and
            Adam, Vincent and
            Hensman, James},
  title = {A Framework for Interdomain and Multioutput {G}aussian Processes},
  year = {2020},
  journal = {arXiv:2003.01115},
  url = {https://arxiv.org/abs/2003.01115}
}

Acknowledgements#

James Hensman was supported by an MRC fellowship and Alexander G. de G. Matthews was supported by EPSRC grants EP/I036575/1 and EP/N014162/1.