Publication | Closed Access
Ad-hoc data processing in the cloud
Year: 2008
Citations: 70
References: 8
Keywords: Cluster Computing, Engineering, Stream Processor, Big Data Indexing, Cloud Communication, Map-reduce, Data Science, Data-intensive Platform, Management, Data Integration, Internet Of Things, Cloud Data Management, Parallel Computing, Data Management, Ad-hoc Data Processing, Cloud-based Integration, Distributed Data Management, Mobile Computing, Computer Science, Incremental Indexing, Data Processing, Cloud Computing, Parallel Programming, Massive Data Processing, Big Data
Ad-hoc data processing has proven to be a critical paradigm for Internet companies processing large volumes of unstructured data. However, the emergence of cloud-based computing, where storage and CPU are outsourced to multiple third parties across the globe, implies large collections of highly distributed and continuously evolving data. Our demonstration combines the power and simplicity of the MapReduce abstraction with a wide-scale distributed stream processor, Mortar. Our incremental MapReduce operators avoid re-processing data, while the stream processor manages the placement and physical data flow of the operators across the wide area. We demonstrate a distributed web indexing engine against which users can submit and deploy continuous MapReduce jobs. A visualization component illustrates both the incremental indexing and index searches in real time.
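
The idea of an incremental MapReduce operator that avoids re-processing can be illustrated with a minimal single-process sketch. This is not Mortar's API or the authors' implementation; the names `map_doc`, `IncrementalIndex`, `upsert`, and `search` are hypothetical, and the sketch assumes a simple inverted index in which each document maps to (term, document id) pairs. When a document changes, only the difference between its old and new map output is applied to the reduced state.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map step (hypothetical): emit one (term, doc_id) pair per distinct term."""
    return {(term, doc_id) for term in text.lower().split()}


class IncrementalIndex:
    """Inverted index that applies only the delta of each document update."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids (reduced state)
        self.emitted = {}                 # doc id -> pairs emitted by its last map

    def upsert(self, doc_id, text):
        """Add or update one document without touching any other document."""
        new_pairs = map_doc(doc_id, text)
        old_pairs = self.emitted.get(doc_id, set())

        # Retract pairs the document no longer produces.
        for term, d in old_pairs - new_pairs:
            self.postings[term].discard(d)
            if not self.postings[term]:
                del self.postings[term]

        # Apply pairs that are new in this version of the document.
        for term, d in new_pairs - old_pairs:
            self.postings[term].add(d)

        self.emitted[doc_id] = new_pairs

    def search(self, term):
        """Return the sorted ids of documents currently containing the term."""
        return sorted(self.postings.get(term.lower(), set()))


if __name__ == "__main__":
    index = IncrementalIndex()
    index.upsert("a", "cloud data processing")
    index.upsert("b", "stream data")
    index.upsert("a", "cloud stream")  # only document "a" is re-mapped
    print(index.search("data"))
    print(index.search("stream"))
```

In a wide-area deployment such as the one the abstract describes, the map and reduce state would be spread across many nodes; the sketch only shows the delta-based update on one machine.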