Publication | Closed Access
SparkBench
Citations: 182
References: 14
Year: 2015
Unknown Venue
Cluster Computing, Engineering, Machine Learning, Spark Specify Benchmark, Map-reduce, Data Science, Data-intensive Platform, Data Integration, Parallel Computing, Data Management, High-performance Data Analytics, Computer Engineering, Computer Science, Data-intensive Computing, Memory Abstraction, Cloud Computing, Parallel Programming, Massive Data Processing, Big Data
Spark has been increasingly adopted by industry in recent years for big data analysis because it provides a fault-tolerant, scalable and easy-to-use in-memory abstraction. Moreover, the community has been actively developing a rich ecosystem around Spark, making it even more attractive. However, the literature does not yet offer a Spark-specific benchmark to guide the development and cluster deployment of Spark so that it better fits the resource demands of user applications. In this paper, we present SparkBench, a Spark-specific benchmarking suite that includes a comprehensive set of applications. SparkBench covers four main categories of applications: machine learning, graph computation, SQL queries and streaming applications. We also characterize the resource consumption, data flow and timing information of each application and evaluate the performance impact of a key configuration parameter to guide the design and optimization of Spark data analytics platforms.