Publication | Open Access
Big Data in HEP: A comprehensive use case study
Citations: 12
References: 3
Year: 2017
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems, collectively called Big Data technologies, have emerged to support the analysis of petabyte and exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics through increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. We will discuss advantages and disadvantages of each approach and give an outlook on further studies needed.