Publication | Closed Access
The Hadoop distributed filesystem: Balancing portability and performance
Citations: 295
References: 17
Year: 2010
Unknown Venue
Distributed File System, Cluster Computing, Storage Performance, Engineering, Architectural Bottlenecks, Computer Architecture, Parallel Storage, Balancing Portability, Map-reduce, Performance Bottlenecks, Data Science, Portability Limitations, Parallel Computing, Parallel File System, Data Management, File Systems, Distributed Systems, Cloud Computing, System Software, Big Data
Hadoop, an open-source MapReduce framework, uses the Java-based HDFS to provide a distributed filesystem that is portable across heterogeneous hardware and software platforms. This study analyzes HDFS performance, identifies bottlenecks, and evaluates the trade-offs between portability and performance. The authors find that architectural bottlenecks in the Hadoop implementation delay the scheduling of new MapReduce tasks, resulting in inefficient HDFS usage; that portability limitations prevent the Java implementation from exploiting native platform features; and that HDFS makes implicit assumptions about how the native platform manages storage, even though native filesystems and I/O schedulers vary widely in design and behavior.
Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed for portability across heterogeneous hardware and software platforms. This paper analyzes the performance of HDFS and uncovers several performance issues. First, architectural bottlenecks exist in the Hadoop implementation that result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Second, portability limitations prevent the Java implementation from exploiting features of the native platform. Third, HDFS implicitly makes portability assumptions about how the native platform manages storage resources, even though native filesystems and I/O schedulers vary widely in design and behavior. This paper investigates the root causes of these performance bottlenecks in order to evaluate tradeoffs between portability and performance in the Hadoop distributed filesystem.