A Tutorial on Bayesian Optimization
Publication | Open Access
Year: 2018 · Citations: 371 · References: 60
Topics: Artificial Intelligence, Bayesian Statistics, Engineering, Machine Learning, Model Tuning, Bayesian Optimization Software, Bayesian Inference, Hyperparameter Estimation, Bayesian Optimization, Data Science, Uncertainty Quantification, Bayesian Methods, Public Health, Predictive Analytics, Bayesian Network, Computer Science, Probability Theory, Model Optimization, Parameter Tuning, Statistical Inference
Bayesian optimization efficiently optimizes expensive, noisy objective functions over continuous domains of fewer than 20 dimensions. This tutorial explains Bayesian optimization, covering Gaussian process regression, key acquisition functions, software, and future research directions. The method constructs a Gaussian process surrogate of the objective, uses an acquisition function to guide sampling, and extends to parallel, multi‑fidelity, constrained, and multi‑task settings. We present a formally justified generalization of expected improvement for noisy evaluations, improving upon prior ad hoc modifications.
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
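The loop described in the abstract — fit a Gaussian process surrogate to the evaluations so far, then sample where an acquisition function is largest — can be sketched in a few lines. Below is a minimal illustration for a 1-D toy objective using the expected improvement acquisition function in its noise-free closed form; the kernel, its length scale, the toy objective, and the grid-based acquisition maximization are all illustrative choices, not the tutorial's prescriptions.

```python
import numpy as np
from scipy.stats import norm

def objective(x):
    # Toy stand-in for an expensive-to-evaluate function (we maximize it).
    return -np.sin(3 * x) - x**2 + 0.7 * x

def sq_exp_kernel(a, b, length=0.5):
    # Squared-exponential covariance between 1-D point sets a and b.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian-process posterior mean and std. dev. at candidate points Xs.
    K = sq_exp_kernel(X, X) + noise * np.eye(len(X))
    Ks = sq_exp_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(sq_exp_kernel(Xs, Xs)) - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # Closed-form EI for maximization: (mu - f*) Phi(z) + sigma phi(z).
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=3)   # small initial design
y = objective(X)
Xs = np.linspace(-1.0, 2.0, 200)     # candidate grid over the domain

for _ in range(5):
    mu, sigma = gp_posterior(X, y, Xs)
    ei = expected_improvement(mu, sigma, y.max())
    x_next = Xs[np.argmax(ei)]       # evaluate where EI is largest
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print(f"best x = {X[y.argmax()]:.3f}, best value = {y.max():.3f}")
```

In practice the acquisition function is maximized with a continuous optimizer rather than a grid, kernel hyperparameters are estimated rather than fixed, and — as the abstract notes — this noise-free form of expected improvement must be generalized when evaluations are noisy.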