Publication | Closed Access
Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS
Citations: 179
References: 19
Year: 2009
Unknown Venue
Distributed File System, Cluster Computing, Engineering, Parallel Storage, Map-reduce, Small Files, Data Science, Data Integration, Parallel Computing, Parallel File System, Data Management, File Systems, Computer Science, Hadoop Framework, Cloud Computing, Case Study, Parallel Programming, File System, Big Data
The Hadoop framework has been widely used on clusters to build large-scale, high-performance systems. However, the Hadoop Distributed File System (HDFS) is designed to manage large files and suffers a performance penalty when managing a large number of small files. As a consequence, many web applications, such as WebGIS, may not benefit from Hadoop. In this paper, we propose an approach to optimize the I/O performance of small files on HDFS. The basic idea is to combine small files into large ones to reduce the number of files, and to build an index for each file. Furthermore, some novel features, such as grouping neighboring files and retaining the several latest versions of data, are considered to match the characteristics of WebGIS access patterns. Preliminary experimental results show that our approach achieves better performance.
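The merge-and-index idea in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation and does not touch HDFS: the function names `pack` and `read`, the index layout (file name mapped to byte offset and length), and the tile file names are all assumptions chosen for illustration.

```python
import io
from typing import Dict, Iterable, Tuple

# Hypothetical index layout: file name -> (byte offset, length) in the merged blob.
Index = Dict[str, Tuple[int, int]]


def pack(files: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, Index]:
    """Concatenate small files into one large blob and record where each one starts."""
    buf = io.BytesIO()
    index: Index = {}
    for name, data in files:
        index[name] = (buf.tell(), len(data))  # offset before writing, then length
        buf.write(data)
    return buf.getvalue(), index


def read(blob: bytes, index: Index, name: str) -> bytes:
    """Fetch one small file from the merged blob using its index entry."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Usage: two neighboring map tiles (made-up names) stored in a single blob.
blob, index = pack([("tile_0_0.png", b"aaa"), ("tile_0_1.png", b"bbbbb")])
tile = read(blob, index, "tile_0_1.png")
```

Storing neighboring tiles next to each other in the same blob, as in the usage lines, loosely mirrors the grouping of neighboring files that the abstract mentions; how the paper actually lays out and indexes the merged files is not stated here.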