Publication | Closed Access
Analysis and Lessons from a Publicly Available Google Cluster Trace
Year: 2010 | Venue: Unknown Venue | Citations: 122 | References: 7
Cluster Computing, Engineering, Business Intelligence, Map-reduce, Business Analytics, Cluster Technology, Data Science, Data Mining, Management, Systems Engineering, Data Integration, Workload Characterization, Data Management, Quantitative Management, Performance Prediction, Large Scale Data, Job Scheduler, Distributed Search Engine, Profiling Tool, Knowledge Discovery, Computer Science, Information Management, Cluster Development, System Designers, Industrial Informatics, Available Production Data, Workload Management, Big Data
System designers in industry are often overwhelmed by large-scale data, while researchers in academia often confront a lack of publicly available production data. In this paper, we analyze a large-scale production workload trace recently made publicly available by Google. We offer a statistical profile of the data, with several interesting discoveries regarding job arrival patterns, CPU and memory consumption, task durations, and other workload properties. We further perform k-means clustering to identify common groups of jobs, with several methodological departures from, and different findings than, prior work on similar data. We also perform a correlation analysis between job semantics and job behavior, which yields useful perspectives on capacity planning and system tuning. Our key finding is that while the limited size of the dataset prevents us from generalizing the observed trace behaviors, the analytical methods we describe nonetheless allow us to extract many system design insights.
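The abstract mentions k-means clustering to find common groups of jobs but does not describe the procedure. As a generic illustration only, the sketch below shows standard Lloyd's k-means over per-job feature vectors. The feature set (duration, mean CPU, mean memory), the min-max normalization, the seeded random initialization, and the sample values are all hypothetical assumptions; the paper states that its method departs from prior work in ways not described here, so this should not be read as the authors' approach.

```python
import random


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] so no single unit dominates distances."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's k-means over equal-length numeric vectors; returns (centroids, labels).

    Initial centroids are k distinct points sampled with a fixed seed (an assumption
    chosen for reproducibility, not a recommendation over k-means++).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each job goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member jobs.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical per-job features: (duration_hours, mean_cpu, mean_memory).
jobs = [(0.1, 0.05, 0.02), (0.2, 0.04, 0.03), (12.0, 0.9, 0.8), (11.5, 0.85, 0.75)]
centroids, labels = kmeans(min_max_normalize(jobs), k=2)
print(labels)
```

Normalizing first matters because raw trace features are in different units; without it, a long-duration column would dominate the Euclidean distance. Choosing k itself (for example by inspecting within-cluster variance across several values) is a separate decision this sketch does not address.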