Multi-Objective Bayesian Optimization with BoTorch
Introduction
This post shows how to solve a multi-objective problem with Bayesian Optimization using BoTorch [1].
The target audience is developers looking for a hands-on example of a BoTorch implementation, specifically for multi-objective problems. This post does not intend to explain the optimization strategies implemented by BoTorch, or at least not in great detail.
The first section focuses on a simple single-objective optimization setup with BoTorch. The following one goes through the adaptations needed to make this setup “multi-objective ready”.
BoTorch
BoTorch is a Python framework (by Facebook) built on top of PyTorch, as the name implies. It provides implementations of state-of-the-art Bayesian Optimization methods for single- and multi-objective problems.
Single Objective Optimization
This first section goes through the implementation of Bayesian Optimization for a simple single-objective problem [2]. For an introduction to Bayesian Optimization, I recommend reading my related post first.
We start by defining a cost function. Each evaluation is perturbed by random noise, represented here by the red area. Since it is a cost function, our goal is to minimize it.
We also select 3 random starting points that will be used by the optimization process to fit the surrogate.
As mentioned before, the dots don’t exactly follow the line as we added noise to the evaluation.
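Below is a minimal sketch of this setup. The cost function and bounds are hypothetical placeholders (the actual function used for the plots is not reproduced here); what matters is the pattern of noisy evaluations and random initial samples.

```python
import torch

torch.manual_seed(0)

# Hypothetical 1-D cost function; each evaluation is corrupted by Gaussian noise.
def cost(x, noise_std=0.1):
    return torch.sin(3 * x) + 0.3 * x ** 2 + noise_std * torch.randn_like(x)

bounds = torch.tensor([[-3.0], [3.0]])  # parameter bounds, shape (2, d)

# 3 random starting points and their noisy evaluations.
train_x = bounds[0] + (bounds[1] - bounds[0]) * torch.rand(3, 1)
train_y = cost(train_x)  # shape (3, 1)
```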
Now that our initial samples are ready we can start to work on the optimization itself.
First we need to set up the Gaussian process (GP). This model is tasked with estimating the cost function f. To do so it relies on the observations, meaning that over time (as more observations are added) the model gains more information about f.
Next we extract the likelihood of the GP. This is a great property of this kind of model, and the acquisition function will rely on it to select new points.
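A minimal sketch of this step, reusing the hypothetical `train_x` and `train_y` tensors from above. Since BoTorch maximizes by default, the cost is negated before fitting.

```python
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

# Single-output GP surrogate over the negated cost (BoTorch maximizes by default).
model = SingleTaskGP(train_x, -train_y)

# Marginal log likelihood of the GP, used to fit its hyperparameters.
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)  # older BoTorch versions expose fit_gpytorch_model instead
```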
Once the model is fitted, the acquisition function is prepared to help select relevant points at which to sample the actual cost function. In this example the acquisition function computes the expected improvement for a given parameter value. It is then optimized over a given number of restarts and raw samples to find the best candidate.
We can then evaluate the cost function at the new candidate and iterate over candidate generation to converge towards the minimum of the cost function.
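Putting it together, the optimization loop could look like the sketch below (again assuming the hypothetical `cost`, `bounds`, `train_x` and `train_y` from above; the number of restarts, raw samples and iterations are arbitrary choices).

```python
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf

for _ in range(10):
    # Refit the surrogate on all observations collected so far.
    model = SingleTaskGP(train_x, -train_y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

    # Expected improvement over the best (negated) cost observed so far.
    acq = qExpectedImprovement(model, best_f=(-train_y).max())

    # Optimize the acquisition function to propose the next candidate.
    candidate, _ = optimize_acqf(
        acq, bounds=bounds, q=1, num_restarts=10, raw_samples=256
    )

    # Evaluate the cost function at the candidate and append the observation.
    train_x = torch.cat([train_x, candidate])
    train_y = torch.cat([train_y, cost(candidate)])
```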
As we can see, after 10 iterations the minimum has clearly been reached.
Monte Carlo and Quasi-Monte Carlo Acquisition
You may have noticed in the last snippet of code that the acquisition function is prefixed by a q (qExpectedImprovement). This q refers to acquisition functions that evaluate batches of q candidates through (quasi-)Monte Carlo sampling, which is one of the strengths of BoTorch [3].
Without going into too much detail, Monte Carlo sampling makes it possible to estimate the expected improvement (up to the Monte Carlo integration error). Compared to non-Monte-Carlo solutions, this approach has the great benefit of being differentiable: if a gradient can be computed, the optimization of the acquisition function becomes much more efficient (whereas otherwise it relies solely on gradient-free approaches).
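Concretely, the q-prefixed acquisition functions accept a Monte Carlo sampler. Here is a sketch of plugging in BoTorch's Sobol quasi-Monte Carlo sampler, reusing the fitted `model` and `train_y` from above (512 samples is an arbitrary choice).

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.sampling import SobolQMCNormalSampler

# Quasi-Monte Carlo sampler used to approximate the acquisition integral.
# (Older BoTorch versions take num_samples=512 instead of sample_shape.)
sampler = SobolQMCNormalSampler(sample_shape=torch.Size([512]))
acq = qExpectedImprovement(model, best_f=(-train_y).max(), sampler=sampler)
```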
Multi Objective Optimization
In the multi-objective context there is no longer a single optimal cost value to find but rather a compromise between multiple cost functions.
Let’s assume two cost functions, F1 and F2. Both share the same (single) parameter x and both should be minimized. There might not be a single x value that minimizes both functions at once, but optimization can still be performed.
To better understand multi-objective optimization we first have to clarify the notion of dominance. A parameter x_1 dominates another parameter x_2 when the values of both cost functions F1 and F2 are lower at x_1. Since x_2 is dominated, we know it will never be kept as a solution of the optimization.
Finding all the non-dominated points gives us the Pareto front [4], where each point represents a compromise between the two objectives. From there, the context of your optimization will help you decide which solution to pick.
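To make the notion concrete, here is a tiny (hypothetical) dominance check between two observed cost vectors (F1, F2), both to be minimized.

```python
import torch

def dominates(y1, y2):
    # y1 dominates y2 if it is no worse on every objective and strictly better on at least one.
    return bool((y1 <= y2).all() and (y1 < y2).any())

y_x1 = torch.tensor([1.0, 2.0])  # (F1, F2) observed at x_1
y_x2 = torch.tensor([1.5, 2.5])  # (F1, F2) observed at x_2
print(dominates(y_x1, y_x2))     # True: x_2 is dominated and can be discarded
```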
Implementation
Based on BoTorch’s tutorial [1].
The model consists of a list of single-objective models. The overall (marginal log) likelihood is the sum of the individual GPs’ likelihoods.
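A sketch of this model, assuming `train_x` holds the sampled parameters and `train_obj` holds the corresponding observations of F1 and F2 as an (n, 2) tensor, negated since BoTorch maximizes by default.

```python
from botorch.fit import fit_gpytorch_mll
from botorch.models import ModelListGP, SingleTaskGP
from gpytorch.mlls import SumMarginalLogLikelihood

# One single-objective GP per cost function, wrapped in a model list.
models = [
    SingleTaskGP(train_x, train_obj[:, i : i + 1]) for i in range(train_obj.shape[-1])
]
model = ModelListGP(*models)

# The joint marginal log likelihood is the sum of the individual GPs' likelihoods.
mll = SumMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)
```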
The acquisition step must now optimize multiple objectives at once. To do so, this example illustrates an approach called qNParEGO [5]. It starts with a scalarization step that combines all objectives into a single compound function. Here the scalarization approach is the augmented Chebyshev scalarization [6].
The scalarization weights are drawn at random, which might slow the convergence of the optimization compared to other solutions such as qEHVI or qNEHVI [1], [5]. However, the proposed implementation generates as many random objective combinations as output candidates, which partly helps targeting different trade-offs within a single iteration, as sketched below.
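A minimal sketch of this candidate-generation step, following the structure of the BoTorch tutorial [1]. The batch size and optimizer settings are arbitrary, and `model`, `train_x`, `train_obj` and `bounds` are the hypothetical tensors introduced above.

```python
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.acquisition.objective import GenericMCObjective
from botorch.optim.optimize import optimize_acqf_list
from botorch.utils.multi_objective.scalarization import get_chebyshev_scalarization
from botorch.utils.sampling import sample_simplex

BATCH_SIZE = 2  # one random scalarization per output candidate

acq_list = []
for _ in range(BATCH_SIZE):
    # Random weights from the simplex, used in an augmented Chebyshev scalarization.
    weights = sample_simplex(train_obj.shape[-1]).squeeze()
    scalarization = get_chebyshev_scalarization(weights=weights, Y=train_obj)
    acq_list.append(
        qNoisyExpectedImprovement(
            model=model,
            objective=GenericMCObjective(scalarization),
            X_baseline=train_x,
            prune_baseline=True,
        )
    )

# One candidate per acquisition function, i.e. per random scalarization.
candidates, _ = optimize_acqf_list(
    acq_function_list=acq_list, bounds=bounds, num_restarts=10, raw_samples=256
)
```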
As in the single-objective approach, we start by sampling a few random points within the parameter bounds. This time, however, each sampled point is mapped to a value for both cost functions F1 and F2. The x-axis gives the parameter value, and the y-axis plus color combination gives the value observed for each cost function.
After 5 iterations the optimization has mainly focused on exploration. We can see scattered points over the parameter bounds with no clear zone of interest.
After 15 iterations (the end of the optimization in this case), an optimum has been reached around x = -0.2. Looking at the graph, we can see that this is indeed a zone where both functions approach their minimum.
To finalize our search we can compute the Pareto front formed by all our observations. All the points in the Pareto front represent viable solutions, each with a different objective trade-off.
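A sketch of this final step using BoTorch's non-dominance utility, again assuming `train_obj` stores the negated costs (the utility assumes maximization).

```python
from botorch.utils.multi_objective.pareto import is_non_dominated

# Boolean mask of the non-dominated observations.
pareto_mask = is_non_dominated(train_obj)

pareto_x = train_x[pareto_mask]         # Pareto-optimal parameter values
pareto_front = -train_obj[pareto_mask]  # corresponding (F1, F2) cost trade-offs
```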
Conclusion
With the help of BoTorch, a framework for Bayesian Optimization in PyTorch, we were able to solve both single- and multi-objective optimization problems.
There are many more advanced, in-depth examples and articles on this subject on the BoTorch website, which I encourage you to read if you are interested.
Complete Notebook
You can find the notebook used to write this post at this link.
References
[1] “BoTorch · Bayesian Optimization in PyTorch.” https://botorch.org/ (accessed Jul. 06, 2022).
[2] paretos, Coding Bayesian Optimization (Bayes Opt) with BOTORCH — Python example for hyperparameter tuning, (2021). Accessed: Jul. 08, 2022. [Online Video]. Available: https://www.youtube.com/watch?v=BQ4kVn-Rt84
[3] “Acquisition Functions · BoTorch.” https://botorch.org/ (accessed Jul. 13, 2022).
[4] “Pareto front,” Wikipedia. Nov. 26, 2021. Accessed: Jul. 11, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Pareto_front&oldid=1057242200
[5] S. Daulton, M. Balandat, and E. Bakshy, “Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization.” arXiv, Oct. 23, 2020. Accessed: Jul. 13, 2022. [Online]. Available: http://arxiv.org/abs/2006.05078
[6] T. Chugh, “Scalarizing Functions in Bayesian Multiobjective Optimization.” arXiv, Apr. 11, 2019. Accessed: Jul. 12, 2022. [Online]. Available: http://arxiv.org/abs/1904.05760