Hadoop Performance Models
Year: 2011 | Citations: 112
Hadoop MapReduce is now a popular choice for large-scale data analytics. This technical report presents a detailed set of mathematical performance models of how a MapReduce job executes on Hadoop. The models capture dataflow and cost information at the fine granularity of the individual phases within a job's map and reduce tasks. They can be used both to estimate the performance of MapReduce jobs and to find the optimal configuration settings for running them.
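To make the idea of a phase-level cost model concrete, here is a minimal Python sketch. It is not the report's model. The phase breakdown (read, map, spill, merge), the linear per-MB costs, the simplified merge-pass count, the sample numbers, and the names `MapTaskProfile`, `map_task_cost`, `merge_passes`, and `best_config` are all hypothetical. The two tuning knobs loosely mirror Hadoop's map-side sort-buffer size and merge factor (`io.sort.mb` and `io.sort.factor` in Hadoop 1.x). The sketch shows both uses named above: estimating a task's cost from per-phase terms, and searching candidate settings for the cheapest one.

```python
# Toy phase-level cost model for one map task (hypothetical; not the report's equations).
import math
from dataclasses import dataclass


@dataclass
class MapTaskProfile:
    """Illustrative per-task statistics (hypothetical values, not from the report)."""
    input_mb: float           # size of the input split
    map_selectivity: float    # map output bytes / map input bytes
    read_cost_per_mb: float   # seconds to read 1 MB of input
    map_cost_per_mb: float    # seconds of map-function CPU per input MB
    spill_cost_per_mb: float  # seconds to sort and write 1 MB of map output
    merge_cost_per_mb: float  # seconds to read and rewrite 1 MB in one merge pass


def merge_passes(num_files: int, factor: int) -> int:
    """Passes to merge num_files sorted files, at most `factor` per merge (simplified)."""
    passes = 0
    while num_files > 1:
        num_files = math.ceil(num_files / factor)
        passes += 1
    return passes


def map_task_cost(p: MapTaskProfile, sort_buffer_mb: float, merge_factor: int):
    """Return (total_seconds, per-phase costs) for one map task under the toy model."""
    output_mb = p.input_mb * p.map_selectivity
    # Assumed: the output is spilled to disk each time the in-memory sort buffer fills.
    num_spills = max(1, math.ceil(output_mb / sort_buffer_mb))
    phases = {
        "read": p.input_mb * p.read_cost_per_mb,
        "map": p.input_mb * p.map_cost_per_mb,
        "spill": output_mb * p.spill_cost_per_mb,
        # Assumed: each merge pass reads and rewrites the whole map output.
        "merge": merge_passes(num_spills, merge_factor) * output_mb * p.merge_cost_per_mb,
    }
    return sum(phases.values()), phases


def best_config(p: MapTaskProfile, buffer_options, factor_options):
    """Exhaustively search candidate settings; returns (cost, buffer_mb, factor)."""
    return min(
        (map_task_cost(p, b, f)[0], b, f)
        for b in buffer_options
        for f in factor_options
    )


if __name__ == "__main__":
    profile = MapTaskProfile(128, 1.0, 0.01, 0.02, 0.03, 0.02)
    total, phases = map_task_cost(profile, sort_buffer_mb=100, merge_factor=10)
    print(round(total, 2), {k: round(v, 2) for k, v in phases.items()})
    print(best_config(profile, [50, 100, 200], [2, 10]))
```

Because this toy model charges nothing for memory, the search always favors the largest sort buffer. A realistic model would also account for memory limits, the reduce-side phases, and cluster-level effects, which is the kind of detail the report's models are described as covering.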