Managing Diversity in Regression Ensembles

TLDR

Ensemble performance is largely attributed to diversity. For regression ensembles, diversity can be defined exactly through the covariance between estimator outputs, yet in practice it is usually encouraged only heuristically. The paper shows how to control diversity explicitly: by taking the ensemble combination mechanism into account, the authors derive an error function for each estimator that balances ensemble diversity against individual accuracy, relate it to negative correlation learning, and prove a strict upper bound on the penalty coefficient for estimators that fit a generalised linear regression framework. The resulting methods systematically control the bias-variance-covariance trade-off, work with any estimator that can minimise a quadratic error function, and empirically improve significantly on simple ensembles while remaining competitive with boosting, bagging, mixtures of experts, and Gaussian processes.

Abstract

Ensembles are a widely used and effective technique in machine learning; their success is commonly attributed to the degree of disagreement, or 'diversity', within the ensemble. For ensembles where the individual estimators output crisp class labels, this 'diversity' is not well understood and remains an open research issue. For ensembles of regression estimators, the diversity can be exactly formulated in terms of the covariance between individual estimator outputs, and the optimum level is expressed in terms of a bias-variance-covariance trade-off. Despite this, most approaches to learning ensembles use heuristics to encourage the right degree of diversity. In this work we show how to explicitly control diversity through the error function. The first contribution of this paper is to show that by taking the combination mechanism for the ensemble into account we can derive an error function for each individual that balances ensemble diversity with individual accuracy. We show the relationship between this error function and an existing algorithm called negative correlation learning, which uses a heuristic penalty term added to the mean squared error function. It is demonstrated that these methods control the bias-variance-covariance trade-off systematically, and can be utilised with any estimator capable of minimising a quadratic error function, for example MLPs or RBF networks. As a second contribution, we derive a strict upper bound on the coefficient of the penalty term, which holds for any estimator that can be cast in a generalised linear regression framework, with mild assumptions on the basis functions. Finally, we present the results of an empirical study, showing significant improvements over simple ensemble learning, and finding that this technique is competitive with a variety of methods, including boosting, bagging, mixtures of experts, and Gaussian processes, on a number of tasks.
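
The abstract does not give formulas, so the following is only an illustrative sketch and not the paper's own derivation. It assumes a uniformly weighted ensemble (the ensemble output is the plain average of the members) and uses the negative correlation penalty in the form it is usually stated in the literature, p_i = (f_i - f_bar) * sum over j != i of (f_j - f_bar). The function names and the coefficient name lam are hypothetical. The sketch checks numerically that the ensemble squared error equals the average individual error minus the average spread of the members around the ensemble mean, and that the penalty reduces to -(f_i - f_bar)^2, which is what lets it act as a diversity term inside each member's error function.

```python
import numpy as np

def ambiguity_decomposition(preds, target):
    """preds: (M, N) outputs of M estimators on N points; target: (N,)."""
    f_bar = preds.mean(axis=0)                       # uniformly weighted ensemble output
    ens_err = (f_bar - target) ** 2                  # squared error of the ensemble
    avg_err = ((preds - target) ** 2).mean(axis=0)   # average individual squared error
    spread = ((preds - f_bar) ** 2).mean(axis=0)     # average spread around the ensemble mean
    return ens_err, avg_err, spread

def nc_penalty(preds):
    """(f_i - f_bar) * sum_{j != i} (f_j - f_bar), for every member i."""
    dev = preds - preds.mean(axis=0)
    return dev * (dev.sum(axis=0) - dev)

def nc_error(preds, target, lam):
    """Per-member error: 0.5 * (f_i - d)^2 + lam * penalty_i (lam is an assumed name)."""
    return 0.5 * (preds - target) ** 2 + lam * nc_penalty(preds)

# Synthetic data, purely for illustration: 4 estimators, 5 points.
rng = np.random.default_rng(0)
target = rng.normal(size=5)
preds = target + rng.normal(scale=0.5, size=(4, 5))

ens_err, avg_err, spread = ambiguity_decomposition(preds, target)
# Ensemble error = average individual error - spread (holds exactly for a uniform average).
assert np.allclose(ens_err, avg_err - spread)
# Deviations from the mean sum to zero, so the penalty equals -(f_i - f_bar)^2.
assert np.allclose(nc_penalty(preds), -(preds - preds.mean(axis=0)) ** 2)
```

With lam set to 0 each member simply minimises its own squared error, which corresponds to training the members independently. Larger values reward a member for moving away from the ensemble mean; how large the coefficient may safely become is the question the paper's upper bound addresses, and no value for that bound is assumed here.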
