Concepedia

Publication | Open Access

Scalable Bayesian Optimization Using Deep Neural Networks

Citations: 376

References: 35

Year: 2015

TLDR

Bayesian optimization relies on a surrogate model to decide where to evaluate an expensive function next. Gaussian processes, the standard choice, scale cubically with the number of observations, which hinders large-scale or massively parallel use. This work replaces the Gaussian process with a neural network that performs adaptive basis-function regression and scales linearly with the number of observations. The neural-network surrogate performs competitively with state-of-the-art GP-based approaches, enabling highly parallel hyperparameter optimization that rapidly finds competitive models on object-recognition and image-caption-generation benchmarks.

Abstract

Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.
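The abstract does not spell out the model, so the following is only a minimal sketch of one common reading of "basis function regression": Bayesian linear regression on a fixed set of basis functions. Here random tanh features stand in for the last hidden layer of a trained network, which is an assumption made for illustration and not the authors' architecture. The names `make_basis`, `fit_blr`, `predict_blr` and the precisions `alpha` (prior) and `beta` (noise) are hypothetical. The point it illustrates is the scaling claim: fitting costs O(N D^2 + D^3) for N observations and D basis functions, which is linear in N.

```python
import numpy as np


def make_basis(input_dim, num_features, seed=0):
    """Return a fixed feature map phi(x). Random tanh features are an
    assumed stand-in for a trained network's last hidden layer."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(input_dim, num_features))
    b = rng.normal(size=num_features)

    def basis(X):
        return np.tanh(X @ W + b)

    return basis


def fit_blr(Phi, y, alpha=1.0, beta=100.0):
    """Bayesian linear regression on features Phi (N x D).
    alpha: prior precision on weights, beta: noise precision (assumed values).
    Builds a D x D system, so the cost grows linearly with N."""
    D = Phi.shape[1]
    K = beta * Phi.T @ Phi + alpha * np.eye(D)  # posterior precision
    K_inv = np.linalg.inv(K)                    # posterior covariance
    m = beta * K_inv @ Phi.T @ y                # posterior mean of weights
    return m, K_inv


def predict_blr(Phi_star, m, K_inv, beta=100.0):
    """Predictive mean and variance at the query features Phi_star."""
    mean = Phi_star @ m
    # Row-wise quadratic form phi^T K_inv phi, plus observation noise.
    var = np.einsum("nd,de,ne->n", Phi_star, K_inv, Phi_star) + 1.0 / beta
    return mean, var


# Toy usage: fit a 1-D function and query the surrogate.
basis = make_basis(input_dim=1, num_features=50)
X_train = np.linspace(-1.0, 1.0, 200)[:, None]
y_train = np.sin(3.0 * X_train[:, 0])
m, K_inv = fit_blr(basis(X_train), y_train)

X_query = np.array([[-0.5], [0.0], [0.5]])
mean, var = predict_blr(basis(X_query), m, K_inv)
print(mean.shape, var.shape)
```

In a Bayesian optimization loop, the predictive mean and variance would feed an acquisition function that picks the next evaluation point. Because only the D x D matrix depends on the data, adding observations does not enlarge the system that has to be inverted, in contrast to a GP's N x N kernel matrix.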
