Publication | Closed Access
HAWQ
Citations: 48
References: 28
Year: 2014
Unknown Venue
Distributed File System, Cluster Computing, File System, Transaction Management, Engineering, Database Support, Distributed Database, Cloud Computing, Data Integration, Parallel Storage, Parallel Computing, Distributed Data Store, Data Management, Data Replication, Big Data
HAWQ, developed at Pivotal, is a massively parallel processing (MPP) SQL engine that sits on top of HDFS. As a hybrid of an MPP database and Hadoop, it inherits the merits of both. It adopts a layered architecture and relies on the distributed file system for data replication and fault tolerance. In addition, it is compliant with standard SQL and, unlike other SQL engines on Hadoop, fully transactional. This paper presents the novel design of HAWQ, including query processing, the scalable software interconnect based on the UDP protocol, transaction management, fault tolerance, read-optimized storage, the extensible framework for supporting various popular Hadoop-based data stores and formats, and various optimization choices we considered to enhance query performance. The extensive performance study shows that HAWQ is about 40x faster than Stinger, which is reported to be 35x-45x faster than the original Hive.