Publication | Closed Access
Understanding query performance in Accumulo
Citations: 17
References: 6
Year: 2013
Unknown Venue
Cluster Computing, Engineering, Storage Schema, Aggregate Function, Information Retrieval, Data Science, Database Support, Management, Keyvalue Database, Data Integration, Data Management, Practical Queries, Computer Science, Distributed Query Processing, Value Indexing, Query Optimization, Query Performance, Edge Computing, Cloud Computing, Approximate Query Answering, Distributed Data Store, Big Data
Open-source, BigTable-like distributed databases provide a scalable storage solution for data-intensive applications. The simple key-value storage schema provides fast record ingest and retrieval, nearly independent of the quantity of data stored. However, real applications must support non-trivial queries that require careful key design and value indexing. We study an Apache Accumulo-based big data system designed for a network situational awareness application. The application's storage schema and data retrieval requirements are analyzed. We then characterize the corresponding Accumulo performance bottlenecks. Queries are shown to be communication-bound and server-bound in different situations. Inefficiencies in the open-source communication stack and filesystem limit network and I/O performance, respectively. Additionally, in some situations, parallel clients can contend for server-side resources. Maximizing data retrieval rates for practical queries requires effective key design, indexing, and client parallelization.
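The three levers named at the end of the abstract (key design, indexing, and client parallelization) can be illustrated with a minimal sketch. It uses a toy in-memory sorted key-value store, not the Accumulo API, and every name in it (`SortedKV`, `row_key`, `NUM_SHARDS`, the `shard|timestamp|id` key layout, the source-IP index) is a hypothetical choice for illustration rather than the schema studied in the paper.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # assumed shard count, chosen only for this sketch


class SortedKV:
    """Toy in-memory sorted key-value table (a stand-in, not Accumulo)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._vals = {}   # key -> value

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals[key]

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]


def row_key(shard, timestamp, event_id):
    # Key design (hypothetical): a shard prefix spreads rows across ranges,
    # and a zero-padded timestamp keeps each shard sorted by time.
    return f"{shard:02d}|{timestamp:010d}|{event_id}"


events = SortedKV()    # primary table
ip_index = SortedKV()  # secondary index: source IP -> primary row key


def ingest(event_id, timestamp, src_ip, payload):
    shard = event_id % NUM_SHARDS
    rk = row_key(shard, timestamp, event_id)
    events.put(rk, payload)
    ip_index.put(f"{src_ip}|{rk}", "")  # index entry carries no value


def query_time_range(t0, t1):
    """Client parallelization: scan each shard's [t0, t1) range concurrently."""
    def scan_shard(shard):
        return events.scan(f"{shard:02d}|{t0:010d}|", f"{shard:02d}|{t1:010d}|")

    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        parts = list(pool.map(scan_shard, range(NUM_SHARDS)))
    return [kv for part in parts for kv in part]


def query_by_ip(src_ip):
    """Indexing: prefix-scan the index, then fetch the matching primary rows."""
    hits = ip_index.scan(f"{src_ip}|", f"{src_ip}|\uffff")
    row_keys = [k[len(src_ip) + 1:] for k, _ in hits]
    return [(rk, events.get(rk)) for rk in row_keys]


ingest(1, 100, "10.0.0.1", "a")
ingest(2, 150, "10.0.0.2", "b")
ingest(3, 250, "10.0.0.1", "c")
in_window = query_time_range(100, 200)
from_host = query_by_ip("10.0.0.1")
```

In this sketch a time-range query costs one short range scan per shard instead of a full-table scan, and a query on a non-key attribute goes through the index table first. Whether adding parallel scanners helps in a real deployment depends on the server-side contention the abstract describes.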
| Year | Citations |
|---|---|
| 2008 | 3.4K |
| 2009 | 1.1K |
| 2011 | 176 |
| 2012 | 105 |
| 2010 | 50 |
| 2012 | 42 |