Monitoring Optimisation#

In this notebook we cover how to monitor the model and certain metrics during optimisation.

Setup#

[1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

import gpflow
from gpflow.ci_utils import ci_niter

np.random.seed(0)

The monitoring functionality lives in gpflow.monitor. We import the ModelToTensorBoard, ImageToTensorBoard, and ScalarToTensorBoard monitoring tasks, together with MonitorTaskGroup and Monitor.

[2]:
from gpflow.monitor import (
    ImageToTensorBoard,
    ModelToTensorBoard,
    Monitor,
    MonitorTaskGroup,
    ScalarToTensorBoard,
)

Set up data and model#

[3]:
# Define some configuration constants.

num_data = 100
noise_std = 0.1
optimisation_steps = ci_niter(100)  # ci_niter reduces this number when the notebook is run as part of GPflow's CI tests
[4]:
# Create dummy data.

X = np.random.randn(num_data, 1)  # [N, 1]
Y = np.sin(X) + 0.5 * np.cos(X) + np.random.randn(*X.shape) * noise_std  # [N, 1]
plt.plot(X, Y, "o")
[4]:
[Figure: scatter plot of the dummy data (X, Y)]
[5]:
# Set up the model and print a summary of its parameters

kernel = gpflow.kernels.SquaredExponential(lengthscales=[1.0, 2.0]) + gpflow.kernels.Linear()
model = gpflow.models.GPR((X, Y), kernel, noise_variance=noise_std ** 2)
model
[5]:
<gpflow.models.gpr.GPR object at 0x7ff0add35ae0>
name                                class      transform         prior  trainable  shape  dtype    value
GPR.kernel.kernels[0].variance      Parameter  Softplus                 True       ()     float64  1.0
GPR.kernel.kernels[0].lengthscales  Parameter  Softplus                 True       (2,)   float64  [1. 2.]
GPR.kernel.kernels[1].variance      Parameter  Softplus                 True       ()     float64  1.0
GPR.likelihood.variance             Parameter  Softplus + Shift         True       ()     float64  0.009999999999999998
[6]:
# We define a function that plots the model's prediction (in the form of samples) together with the data.
# Importantly, this function takes no arguments other than `fig: matplotlib.figure.Figure` and `ax: matplotlib.axes.Axes`.


def plot_prediction(fig, ax):
    Xnew = np.linspace(X.min() - 0.5, X.max() + 0.5, 100).reshape(-1, 1)
    Ypred = model.predict_f_samples(Xnew, full_cov=True, num_samples=20)
    ax.plot(Xnew.flatten(), np.squeeze(Ypred).T, "C1", alpha=0.2)
    ax.plot(X, Y, "o")


# Let's check that the function produces the desired plot
fig = plt.figure()
ax = fig.subplots()
plot_prediction(fig, ax)
plt.show()
[Figure: 20 posterior samples drawn from the model, overlaid on the data]

Set up monitoring tasks#

We now define the MonitorTasks that will be executed during the optimisation. For this tutorial we set up three tasks:

- ModelToTensorBoard: writes the model's hyper-parameters, such as likelihood.variance and kernel.lengthscales, to a TensorBoard.
- ImageToTensorBoard: writes custom matplotlib images to a TensorBoard.
- ScalarToTensorBoard: writes any scalar value to a TensorBoard. Here, we use it to write the model's training objective.

[7]:
log_dir = "logs"  # Directory where TensorBoard files will be written.
model_task = ModelToTensorBoard(log_dir, model)
image_task = ImageToTensorBoard(log_dir, plot_prediction, "image_samples")
lml_task = ScalarToTensorBoard(log_dir, lambda: model.training_loss(), "training_objective")
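
ScalarToTensorBoard is not limited to the training objective: any Python callable that returns a scalar can be logged. As a sketch (the training_rmse helper and the "training_rmse" tag below are illustrative and not part of GPflow), you could also track the root-mean-square error of the posterior mean on the training data:

def training_rmse():
    # Posterior mean at the training inputs, compared against the observed targets.
    mean, _ = model.predict_f(X)
    return np.sqrt(np.mean((mean.numpy() - Y) ** 2))


rmse_task = ScalarToTensorBoard(log_dir, training_rmse, "training_rmse")

Such a task would simply be added to one of the task groups defined below, alongside model_task and lml_task.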

We now group the tasks into a set of fast and slow tasks and pass them to the monitor. This allows us to execute the groups at different frequencies.

[8]:
# Plotting tasks can be quite slow. We want to run them less frequently.
# We group them in a `MonitorTaskGroup` and set the period to 5.
slow_tasks = MonitorTaskGroup(image_task, period=5)

# The other tasks are fast. We run them at each iteration of the optimisation.
fast_tasks = MonitorTaskGroup([model_task, lml_task], period=1)

# Both groups are passed to the monitor.
# `slow_tasks` will be run five times less frequently than `fast_tasks`.
monitor = Monitor(fast_tasks, slow_tasks)
[9]:
training_loss = model.training_loss_closure(
    compile=True
)  # compile=True (default): compiles using tf.function
opt = tf.optimizers.Adam()

for step in range(optimisation_steps):
    opt.minimize(training_loss, model.trainable_variables)
    monitor(step)  # <-- run the monitoring

TensorBoard is accessible through the browser, after launching the server by running tensorboard --logdir ${logdir}. See the TensorFlow documentation on TensorBoard for more information.
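
If you are working inside a Jupyter notebook, you can alternatively use the TensorBoard notebook extension rather than launching a separate server. This is a minimal sketch, assuming the tensorboard package is installed in the notebook's environment:

%load_ext tensorboard
%tensorboard --logdir logs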

For optimal performance, we can also wrap the monitor call inside tf.function:#

[10]:
opt = tf.optimizers.Adam()

log_dir_compiled = f"{log_dir}/compiled"
model_task = ModelToTensorBoard(log_dir_compiled, model)
lml_task = ScalarToTensorBoard(
    log_dir_compiled, lambda: model.training_loss(), "training_objective"
)
# Note that the `ImageToTensorBoard` task cannot be compiled, and is omitted from the monitoring
monitor = Monitor(MonitorTaskGroup([model_task, lml_task]))

In the optimisation loop below we use tf.range (rather than Python's built-in range) to avoid re-tracing the step function at each iteration: a new Python integer passed to a tf.function triggers a fresh trace, whereas the tensor values produced by tf.range re-use a single trace.

[11]:
@tf.function
def step(i):
    opt.minimize(model.training_loss, model.trainable_variables)
    monitor(i)


# Notice the tf.range
for i in tf.range(optimisation_steps):
    step(i)

When opening TensorBoard, you may need to use the command tensorboard --logdir . --reload_multifile=true, as multiple FileWriter objects are used.

Scipy optimisation monitoring#

If you want to use the Scipy optimizer provided by GPflow and monitor the training progress, simply replace the optimisation loop with a single call to its minimize method, passing the monitor as the step_callback keyword argument:

[12]:
opt = gpflow.optimizers.Scipy()

log_dir_scipy = f"{log_dir}/scipy"
model_task = ModelToTensorBoard(log_dir_scipy, model)
lml_task = ScalarToTensorBoard(log_dir_scipy, lambda: model.training_loss(), "training_objective")
image_task = ImageToTensorBoard(log_dir_scipy, plot_prediction, "image_samples")

monitor = Monitor(
    MonitorTaskGroup([model_task, lml_task], period=1), MonitorTaskGroup(image_task, period=5)
)
[13]:
opt.minimize(training_loss, model.trainable_variables, step_callback=monitor)
[13]:
      fun: -69.68099880889753
 hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
      jac: array([-2.96735825e-04, -4.30340661e-04,  3.97830673e-04,  2.26009904e-06,
        4.29147152e-04])
  message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 37
      nit: 28
     njev: 37
   status: 0
  success: True
        x: array([  2.07005977,   1.74612938,   0.18194306, -15.21875411,
        -4.53840856])