Publication | Closed Access
A Data Placement Strategy for Data-Intensive Scientific Workflows in Cloud
Citations: 27
References: 20
Year: 2015
Unknown Venue
Cluster Computing, Engineering, Data Placement Strategy, Data Grid, Data Science, Data-intensive Platform, Management, Data Integration, Parallel Computing, High-throughput Computing, Data Management, Runtime Stage, Data Modeling, Data Center System, Computer Engineering, Workflow Management System, Computer Science, Data-intensive Computing, Workflow Execution, Scientific Workflow System, Edge Computing, Cloud Computing, Parallel Programming, Big Data
With the arrival of cloud computing and Big Data, many scientific applications with large amounts of data can be abstracted as scientific workflows and run in a cloud environment. Distributing these datasets intelligently can effectively reduce data transfers during workflow execution. In this paper, we propose a two-stage data placement strategy. In the initial stage, we cluster the datasets based on their correlation and allocate these clusters to data centers. Compared with existing work, we incorporate data size into the correlation calculation and propose a new type of data correlation for intermediate data, named "the first order conduction correlation". Hence the data transmission cost can be measured more reasonably. In the runtime stage, the re-distribution algorithm adjusts the data layout according to changed factors, and the overhead of the re-layout itself is also measured. Compared with previous work, simulation results show that our proposed strategy can effectively reduce the time spent on data movement during workflow execution.
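The initial-stage idea (cluster datasets by a size-aware correlation, then keep each cluster within one data center) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the correlation formula, so the weighting used here (number of shared tasks times the size of the smaller dataset), the greedy merge rule, and the names `size_weighted_correlation`, `cluster_by_correlation`, and `capacity` are all assumptions made for illustration. The "first order conduction correlation" for intermediate data is not modeled.

```python
from itertools import combinations


def size_weighted_correlation(datasets, tasks):
    """Assumed correlation: (#tasks using both datasets) * size of the smaller one.

    datasets: dict mapping dataset name -> size
    tasks:    dict mapping task name -> set of dataset names it reads
    Returns a dict mapping frozenset({a, b}) -> correlation (non-zero pairs only).
    """
    corr = {}
    for a, b in combinations(sorted(datasets), 2):
        shared = sum(1 for used in tasks.values() if a in used and b in used)
        if shared:
            # Hypothetical weighting: separating the pair costs roughly one
            # transfer of the smaller dataset per shared task.
            corr[frozenset((a, b))] = shared * min(datasets[a], datasets[b])
    return corr


def cluster_by_correlation(datasets, corr, capacity):
    """Greedily merge the most correlated datasets while a cluster fits `capacity`.

    `capacity` stands in for the storage limit of one data center (assumed).
    Returns a list of clusters, each a sorted list of dataset names.
    """
    # Every dataset starts in its own cluster; merged members share one set object.
    cluster_of = {name: {name} for name in datasets}
    ordered = sorted(corr.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _ in ordered:
        a, b = sorted(pair)
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is cb:
            continue  # already co-located
        merged = ca | cb
        if sum(datasets[d] for d in merged) <= capacity:
            for d in merged:
                cluster_of[d] = merged
    clusters = []
    for c in cluster_of.values():
        if not any(c is seen for seen in clusters):
            clusters.append(c)
    return [sorted(c) for c in clusters]
```

Weighting by size means that two large datasets used together are pulled into the same cluster before two small ones with the same number of shared tasks, which is the effect the abstract attributes to including data size in the correlation.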