Publication | Closed Access
Short term performance forecasting in enterprise systems
Citations: 65
References: 21
Year: 2005
Unknown Venue
Cluster Computing, Forecasting Methodology, Business Forecasting, Engineering, Machine Learning, Machine Learning Tool, Business Analytics, Data Science, Data Mining, Systems Engineering, Quantitative Management, Performance Prediction, Data Modeling, Predictive Analytics, Knowledge Discovery, Computer Science, Forecasting, Poor Performance, Enterprise Systems, Predictive Maintenance, Business, Model Maintenance, Production Forecasting, Industrial Informatics, Big Data
Enterprise systems generate abundant, complex data that resists human characterization, making data mining techniques appropriate. The study seeks to classify whether an enterprise system will meet performance targets in the next hour using data mining and machine learning, and to evaluate key dimensions for deploying such tools. The authors compare time‑series, Bayesian network, and other data‑mining approaches on real Hewlett‑Packard enterprise system data to forecast performance. Multivariate models outperform univariate ones, accuracy varies with feature classes, and combined‑system models generalize well, enabling automated resource allocation and opportunistic scheduling.
We use data mining and machine learning techniques to predict upcoming periods of high utilization or poor performance in enterprise systems. The abundance of available data and the complexity of these systems defy human characterization and static models, which makes the task well suited to data mining techniques. We formulate the problem as one of classification: given current and past information about the system's behavior, can we forecast whether the system will meet its performance targets over the next hour? Using real data gathered from several enterprise systems at Hewlett-Packard, we compare several approaches ranging from time series to Bayesian networks. Besides establishing the predictive power of these approaches, our study analyzes three dimensions that are important for their application as a standalone tool. First, it quantifies the gain in accuracy of multivariate prediction methods over simple statistical univariate methods. Second, it quantifies the variations in accuracy when using different classes of system and workload features. Third, it establishes that models induced using combined data from various systems generalize well and are applicable to new systems, enabling accurate predictions on systems with insufficient historical data. Together, these analyses offer a promising outlook on the development of tools that automate the assignment of resources to stabilize performance (e.g., adding servers to a cluster) and allow opportunistic job scheduling (e.g., backups or virus scans).
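The classification formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `make_examples`, the sliding-window feature layout, and the rule that a period "misses its target" when any future sample exceeds a threshold are all assumptions made for illustration. The sketch turns a series of multivariate metric samples into (features, label) pairs that any classifier could be trained on.

```python
from typing import List, Sequence, Tuple


def make_examples(
    metrics: Sequence[Sequence[float]],
    target: Sequence[float],
    threshold: float,
    history: int,
    horizon: int,
) -> Tuple[List[List[float]], List[int]]:
    """Build (features, label) pairs from a metric time series.

    metrics[t] -- hypothetical system/workload measurements at step t
    target[t]  -- the performance measure at step t (e.g. response time)
    history    -- number of past steps, including the current one, used as features
    horizon    -- number of future steps covered by the forecast
                  (e.g. 12 steps of 5 minutes for a one-hour horizon)

    Assumption: the label is 1 when ANY sample in the horizon exceeds
    `threshold`, i.e. the performance target is missed; otherwise 0.
    """
    if len(metrics) != len(target):
        raise ValueError("metrics and target must have the same length")
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be at least 1")

    features: List[List[float]] = []
    labels: List[int] = []
    # t is the "current" step: it needs `history` samples up to and
    # including t, and `horizon` samples strictly after t.
    for t in range(history - 1, len(target) - horizon):
        window = metrics[t - history + 1 : t + 1]
        # Flatten the window into one feature vector (oldest sample first).
        row = [value for sample in window for value in sample]
        future = target[t + 1 : t + 1 + horizon]
        violated = any(v > threshold for v in future)
        features.append(row)
        labels.append(1 if violated else 0)
    return features, labels
```

Restricting each sample in `metrics` to a single measurement gives the univariate setting, while passing several measurements per step gives the multivariate one, so the same helper could feed both sides of the comparison the abstract describes.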