Multi-Task Bayesian Optimization

TLDR

Bayesian optimization is a framework for automatically tuning machine learning hyperparameters and has been shown to reach state-of-the-art performance efficiently. The paper asks whether knowledge gained from previous optimizations can be transferred to new tasks to speed up hyperparameter search. The method extends multi-task Gaussian processes to Bayesian optimization. It adds a variant that jointly minimizes the average error across tasks, which is used to speed up k-fold cross-validation. It also adapts the entropy search acquisition function to a cost-sensitive, multi-task setting, in which the algorithm chooses which dataset to query to gain the most information per unit cost. The approach significantly speeds up optimization compared with the single-task baseline. The adapted entropy search also uses a small dataset to explore hyperparameter settings for a large one efficiently.
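To illustrate how a multi-task Gaussian process shares information between tasks, here is a minimal sketch using a common multi-task GP construction: a product kernel K((x, t), (x', t')) = K_t(t, t') * K_x(x, x'). It assumes a single scalar hyperparameter, a squared-exponential input kernel, and a fixed, hand-set task covariance. In practice the task covariance would be learned. The names `rbf`, `kernel`, `posterior`, and `TASK_COV`, along with all numeric values, are hypothetical and not taken from the paper's code.

```python
import math

# Hypothetical, hand-set task covariance: task 0 = previous problem, task 1 = new problem.
TASK_COV = [[1.0, 0.8], [0.8, 1.0]]

def rbf(x, x2, lengthscale=0.3):
    # Squared-exponential kernel over a scalar hyperparameter.
    return math.exp(-0.5 * ((x - x2) / lengthscale) ** 2)

def kernel(a, b):
    # a, b are (x, task) pairs; product kernel K_t(t, t') * K_x(x, x').
    (x, t), (x2, t2) = a, b
    return TASK_COV[t][t2] * rbf(x, x2)

def cholesky(A):
    # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L z = b.
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_t(L, z):
    # Back substitution: solve L^T a = z.
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a

def posterior(X, y, query, noise=1e-4):
    """GP posterior mean and variance of the latent function at query = (x, task),
    given observed inputs X (list of (x, task) pairs) and targets y."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))
    k_star = [kernel(query, a) for a in X]
    mean = sum(k * al for k, al in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = kernel(query, query) - sum(vi * vi for vi in v)
    return mean, var

if __name__ == "__main__":
    # Hypothetical (centered) validation errors observed only on the previous task.
    X = [(0.1, 0), (0.5, 0), (0.9, 0)]
    y = [0.3, -0.5, 0.4]
    print("new-task prediction at x=0.5:", posterior(X, y, (0.5, 1)))
```

Even though every observation comes from task 0, the prediction for task 1 at x = 0.5 has a nonzero mean and a variance below the prior value of 1.0. The task correlation of 0.8 is what carries the information across. This transfer is what lets earlier optimizations warm-start a new one.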

Abstract

Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
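The idea of choosing which dataset to query by information per unit cost can be sketched as follows. The paper adapts entropy search, which measures information about the location of the target task's minimum. That computation is involved, so this sketch swaps in a simpler quantity. It uses the mutual information between one noisy observation on task t at x and the target task's latent value at a fixed reference point `x_star`, computed under the GP prior, and divides it by the cost of the query. The task correlation, costs, and the names `info_per_cost` and `choose_query` are hypothetical.

```python
import math

# Hypothetical task covariance: task 0 = small (cheap) dataset, task 1 = large (target) dataset.
TASK_COV = [[1.0, 0.9], [0.9, 1.0]]

def rbf(x, x2, lengthscale=1.0):
    # Squared-exponential kernel over a scalar hyperparameter.
    return math.exp(-0.5 * ((x - x2) / lengthscale) ** 2)

def k_multi(x, t, x2, t2):
    # Product multi-task kernel: K_t(t, t2) * K_x(x, x2).
    return TASK_COV[t][t2] * rbf(x, x2)

def info_per_cost(x, t, x_star, target, cost, noise=0.01):
    """Mutual information (nats) between a noisy observation y_t(x) and the target
    task's latent value f_target(x_star) under the GP prior, divided by cost.
    A simplified stand-in for cost-sensitive entropy search, not the paper's method."""
    var_f = k_multi(x_star, target, x_star, target)
    var_y = k_multi(x, t, x, t) + noise
    cov = k_multi(x_star, target, x, t)
    var_post = var_f - cov ** 2 / var_y  # Gaussian conditioning on one observation
    mi = 0.5 * math.log(var_f / var_post)
    return mi / cost

def choose_query(candidates_x, costs, x_star, target=1):
    # Pick the (x, task) pair that maximizes information per unit cost.
    pairs = [(x, t) for x in candidates_x for t in range(len(costs))]
    return max(pairs, key=lambda p: info_per_cost(p[0], p[1], x_star, target, costs[p[1]]))

if __name__ == "__main__":
    # Hypothetical costs: the large dataset is 10x more expensive to evaluate.
    print(choose_query([0.0, 0.5, 1.0], costs=[1.0, 10.0], x_star=0.5))
```

With a strong task correlation (0.9) and a tenfold cost ratio, a query on the cheap dataset at x = 0.5 yields more information per unit cost than the same query on the expensive one. This mirrors the behavior the abstract describes, although the information measure here is a deliberate simplification.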
