All that glitters is not gold: the case of calibrating hydrological models

Abstract

All that glitters is not gold is one of those universal truths that also applies to hydrology and particularly to the issue of model calibration, where a glittering mathematical optimum is too often mistaken for a hydrological optimum. This commentary aims at underlining the fact that calibration difficulties have not disappeared with the advent of the latest search algorithms. Although it is true that progress on the numerical front has allowed us to quasi-eradicate miscalibration issues, we still too often underestimate the remaining hydrological task: screening mathematical optima to identify those parameter sets that will also work sufficiently outside the calibration period.

The calibration process can be looked at as a task of sorting potential parameter sets, just as gold mining can be looked at as one of sorting minerals. For the implementation of this sorting, calibration requires a method (often a search algorithm) and a specific objective function. Similarly, gold miners search river sediments for gold flakes: they use a shovel to dig sand, a classifier to separate stones from gold-bearing sand, and a pan to separate heavy minerals from the sand. In the search for gold, the miner may be lured by fool's gold, i.e. pyrite crystals that glitter like gold but are not by any means of the same worth. Similarly, the hydrologist may be lured by parameter sets that shine over a short calibration period but prove dull when judged either over a longer calibration period or a different validation period. These parameter sets can be considered analogous to fool's gold.

A hydrological optimum is what we, as hydrologists, wish to identify through calibration. It is not simply a parameter set that allows maximizing one or more objective functions over the calibration period: it is one that ideally would permit representing the catchment under all possible calibration periods encompassing climate forcings of interest, i.e. one allowing extrapolation.
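The contrast between a parameter set that shines over a short calibration period and one that holds up over a different validation period can be illustrated with a small numerical experiment. The sketch below is only an illustration under stated assumptions: the one-parameter linear reservoir, the synthetic rainfall, the multiplicative noise, the window bounds and all function names are hypothetical and do not come from any of the studies cited in this commentary. It calibrates the parameter by exhaustive search on a short window and then scores the result on an independent window.

```python
import math
import random


def simulate(rain, k):
    """Hypothetical linear reservoir: each step releases a fraction k of storage."""
    storage, flows = 0.0, []
    for p in rain:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows


def rmse(obs, sim):
    """Root mean square error between two equally long series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


random.seed(1)  # fixed seed so the synthetic experiment is reproducible

# Synthetic forcing and noisy "observations" (assumed true k = 0.30).
rain = [random.expovariate(0.5) if random.random() < 0.3 else 0.0 for _ in range(400)]
obs = [q * random.uniform(0.7, 1.3) for q in simulate(rain, 0.30)]

candidates = [i / 100 for i in range(5, 96)]  # exhaustive search grid for k
SHORT_CAL = (0, 30)      # short calibration window (time-step indices)
VALIDATION = (200, 400)  # independent validation window


def score(k, window):
    """RMSE of the model run with parameter k, evaluated on one window only."""
    start, stop = window
    sim = simulate(rain, k)
    return rmse(obs[start:stop], sim[start:stop])


# Numerical optimum for each window taken separately.
k_cal = min(candidates, key=lambda k: score(k, SHORT_CAL))
k_val = min(candidates, key=lambda k: score(k, VALIDATION))

print("optimum on calibration window:", k_cal)
print("optimum on validation window: ", k_val)
print("validation RMSE of calibrated set:", score(k_cal, VALIDATION))
print("best achievable validation RMSE:  ", score(k_val, VALIDATION))
```

By construction, the value calibrated on the short window can do no better on the validation window than the value tuned to that window. How much worse it does depends on the noise and on the length of the calibration window, which is the point of the fool's gold analogy.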
However, search algorithms only provide numerical optima at best, and their level of optimality is, by definition, only guaranteed for the calibration period.

Classical examples of miscalibration and overcalibration are widespread in the hydrological literature. Already in their famous study, Johnston and Pilgrim (1973) related the numerous disappointments caused by an extensive search for the optimum values of the parameters of Boughton's model. They listed all the problems that have since been recognized as the major impediments to the calibration of hydrological models (discontinuities of the response surface, multiplicity of equifinal solutions, unidentifiability, lack of robustness of calibrated parameter values…). More recently, Berthet et al. (2010) have shown how a small number of large events can have a major impact on the criterion value and therefore on the identification of the optimum parameter set.

Because of the presence of noise

During the calibration process, the model may not only digest the time-invariant specificities of catchment behaviour but also some of the time-varying noise existing in the observed time series. As a consequence, the parameter set identified by calibration may also be representative of the characteristics of the noise and thus lack robustness.

Because of lack of information

We never observe the catchment over the whole range of possible climatic situations. Our calibration period is always shorter than we would wish. Some of the functions of the catchment, and hence of the model, may thus not be significantly activated during this period. As an extreme example, consider the parameters of a snow routine that is part of a generic hydrological model. For many catchments in the warmer parts of the world, a significant snow cover will not occur every year.
Thus, if calibrated on a period lacking sufficient snowfall, the parameters of the snow routine will take erratic values and be poorly representative of the long-term behaviour of the catchment. In the systems theory literature, this situation is referred to as an ‘insufficient excitation of the system's modes’, which is known to disturb the model identification process (Ljung, 1998).

Because of structural problems

The structure of the hydrological model has an impact on the above-mentioned problems. A model of a complex nonengineered system is always an imperfect representation, and there is no alternative to the structure containing a certain level of aggregation of physical processes as well as of time and space scales. Thus, it would be unrealistic to expect to escape parameter identification problems entirely. At the limit, however, if provided with a sufficiently long time series that allows the activation of all of its processes, the ideal model should have optimal parameters that are independent of the calibration period. Although none of our models is ‘ideal’ in that sense, we know that some are less ideal than others: the fact that structural problems are widespread does not mean that we cannot avoid them in large part by choosing a sound model structure (Michel et al., 2006). Particular attention should be given to the number of parameters (Perrin et al., 2001). Jakeman and Hornberger (1993) suggested that the maximum number of parameters that can usually be identified is much lower (4–6) than what is found in most hydrological models.

Changing the objective function used in calibration

Some authors have proposed addressing the overcalibration problem by changing the objective (or cost) function on which optimization is conducted. Although from a rigorous statistical point of view model calibration should include an analysis of the structure of model errors, most modellers trust standard criteria based on squared errors (typically the RMSE).
Different objective functions have been proposed in the search for robustness (Sorooshian et al., 1983; Yang et al., 2007; Thyer et al., 2009; Schoups and Vrugt, 2010). Oudin et al. (2006) discussed the merit of the Nash–Sutcliffe criterion computed on a square root transformation of flows, which they compared with a multi-objective calibration scheme, whereas Gupta et al. (2009) discussed a decomposition of the Nash–Sutcliffe criterion, proposing an alternative that should give more robust parameter estimates. Several authors also advocated multi-objective strategies, about which a large literature now exists (see e.g. Efstratiadis and Koutsoyiannis, 2010; Vrugt et al., 2003), or empirical objective functions aimed at reproducing human expertise (Ehret and Zehe, 2011; Ewen, 2011).

Ensemble approaches: replacing the estimation of an optimum value by the estimation of a statistical distribution

As an alternative approach to the difficulties of model parameterisation, several hydrologists have suggested abandoning the concept of optimal value and estimating either a family of parameter sets (Beven, 1993; Bardossy and Singh, 2008) or a statistical distribution of possible parameter values (see e.g. Thiemann et al., 2001; Thyer et al., 1999). This approach considers parameter sets as random variables that can be characterized by a distribution, which makes sense from a statistical point of view. Note however that this approach, often Bayesian, will not solve all problems: equifinal parameter sets will not disappear… they will just transmute into a characterisation of the parameters as multimodal distributions!
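Returning briefly to the objective functions discussed earlier, the criteria named there can be written in a few lines. The sketch below is an illustration rather than anyone's reference implementation: the function names and the short flow series are hypothetical, and the last function follows the formulation of the alternative of Gupta et al. (2009) that is commonly known as the Kling–Gupta efficiency, combining the correlation r, the ratio of standard deviations alpha and the ratio of means beta between simulated and observed flows.

```python
import math


def mean(x):
    return sum(x) / len(x)


def nse(obs, sim):
    """Nash-Sutcliffe criterion: 1 minus squared error over variance of observations."""
    mo = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def nse_sqrt(obs, sim):
    """Nash-Sutcliffe criterion on square-root-transformed (non-negative) flows."""
    return nse([math.sqrt(o) for o in obs], [math.sqrt(s) for s in sim])


def kge(obs, sim):
    """Kling-Gupta efficiency in its 2009 form: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    mo, ms = mean(obs), mean(sim)
    so = math.sqrt(mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(mean([(s - ms) ** 2 for s in sim]))
    cov = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Made-up series: the simulation is perfect except for one underestimated flood peak.
obs = [1.0, 1.0, 2.0, 20.0, 5.0, 2.0, 1.0, 1.0]
sim = [1.0, 1.0, 2.0, 12.0, 5.0, 2.0, 1.0, 1.0]

print("NSE      :", nse(obs, sim))
print("NSE sqrt :", nse_sqrt(obs, sim))
print("KGE      :", kge(obs, sim))
```

On this made-up series, the single peak error penalizes the criterion computed on raw flows more than the one computed on square-root flows, which is one way to see how a few large events can dominate a squared-error criterion.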
Guided calibration approaches: looking outside of the rainfall-runoff time series for complementary constraints

Quite surprisingly, promoters of guided calibration approaches have come from two apparently opposite directions:

Note that guided calibration approaches can naturally be given a Bayesian interpretation, with the prior parameter distribution being seen as the main guide.

Questioning of the model structure

Approaches that consist of questioning the structure of a model are more difficult to find in the literature: this is something mostly performed in the initial stages of model development, and modellers thus rarely write about it. It is, however, sometimes mentioned en passant, for example by Johnston and Pilgrim (1973) who, at the very end of their calibration study, mentioned that one of the solutions to the numerous problems they had listed could be to ‘review the structure of the model’ (p. 135). Jakeman and Hornberger (1993) insisted on our unavoidably limited capacity to identify parameters, suggesting that it could be impossible to identify more than four to six parameters in a rainfall-runoff model. More recently, some hydrologists have raised the question of the responsibility of model structures for the existence of secondary optima (Kavetski and Kuczera, 2007; Kavetski and Clark, 2010), suggesting that before blaming the optimization algorithm, modellers should improve the numerical representation of their model, while others have argued for adapting the model structure to each new catchment to which a model is to be applied (Fenicia et al., 2008). A softer way to question model structure consists in discussing its strengths and weaknesses, and looking for an explicit characterization of the conditions under which a model performs adequately and poorly. Wagener et al.
(2003) attempted this with their ‘dynamic identifiability analysis’ method, which depicts parameter variations through time as an aid to model improvement. Young (2011) has stressed the virtues of recursive time series methods for indicating model parameter variation and hence model structure inadequacy. It seems, however, that one can go a long way by first identifying what a model is good at and what it is not so good at. This would involve not only assessing which parts of the hydrograph are predicted well but also how the model performs under different types of conditions (on this topic, see the package presented by Andrews et al., 2011).

This commentary has attempted to highlight the difference existing between miscalibration and overcalibration in hydrological modelling. It has reviewed some of the major solutions proposed successively over the last few decades. For many years, hydrologists have been focusing on the miscalibration issue, and research has focused mostly on numerical methods. One could say that secondary optima have sometimes been the trees for which many hydrologists have been unable to see the forest. Today, miscalibration has been solved for most models, and the effect of overcalibration is more apparent. Research is still needed on the solutions listed previously, either separately or in combination, to ensure that our mathematically optimal parameter sets are also hydrologically optimal.

We would like to thank Prof. Tony Jakeman (ANU, Canberra) for the useful comments he made on this manuscript.
