Publication | Closed Access
Large-scale data collection: a coordinated approach
Citations: 26
References: 20
Year: 2004
Unknown Venue
Cluster Computing, Engineering, Network Routing, Network Analysis, Data Collection Problem, Data Science, Management, Scalable Routing, Data Integration, Information-centric Networking, Parallel Computing, Combinatorial Optimization, Data Management, Coordinated Approach, High-performance Data Analytics, Data Modeling, Data Collection Schedules, Knowledge Discovery, Distributed Data Management, Computer Science, Network Routing Algorithm, Network Science, Edge Computing, Network Traffic Control, Cloud Computing, Parallel Programming, Bistro Platform, Massive Data Processing, Big Data
We consider the problem of collecting a large amount of data from several different hosts to a single destination in a wide-area network. Often, due to congestion, the paths chosen by the network may have poor throughput. By choosing an alternate route at the application level, we may be able to obtain a substantially faster completion time. This data collection problem is nontrivial because the goal is not only to avoid congested links, but also to devise a coordinated transfer schedule that makes the maximum possible use of available network resources. We present an approach for computing coordinated data collection schedules, which can result in significant performance improvements. We make no assumptions about knowledge of the network topology or the capacity available on individual links; that is, we use only end-to-end information. Finally, we study the shortcomings of this approach in terms of the gap between the theoretical formulation and the resulting data transfers in wide-area networks. In general, our approach can be used to solve arbitrary data movement problems over the Internet. We use the Bistro platform to illustrate one application of our techniques.
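The idea of choosing an alternate route at the application level from end-to-end information alone can be illustrated with a minimal sketch. This is not the paper's scheduling algorithm: it only compares a direct transfer against one-relay alternatives for a single source, and it ignores the coordination among concurrent transfers that the abstract emphasizes. The function name `best_route`, the host labels, and the throughput figures are all hypothetical.

```python
def best_route(throughput, source, destination):
    """Pick the direct path or the best one-relay path by bottleneck rate.

    `throughput` maps (sender, receiver) pairs to an end-to-end throughput
    estimate; no topology or per-link capacity is assumed. Missing pairs
    are treated as unusable (rate 0.0).
    """
    best_path = [source, destination]
    best_rate = throughput.get((source, destination), 0.0)

    # Every host that appears in any measurement is a candidate relay.
    hosts = {host for pair in throughput for host in pair}
    for relay in sorted(hosts - {source, destination}):
        # A relayed transfer is limited by the slower of its two hops.
        rate = min(
            throughput.get((source, relay), 0.0),
            throughput.get((relay, destination), 0.0),
        )
        if rate > best_rate:
            best_path, best_rate = [source, relay, destination], rate
    return best_path, best_rate


# Hypothetical end-to-end estimates (for example in Mbit/s): the direct
# path from A to D is congested, while relaying through B is faster.
measured = {
    ("A", "D"): 2.0,
    ("A", "B"): 10.0,
    ("B", "A"): 10.0,
    ("B", "D"): 8.0,
}

path, rate = best_route(measured, "A", "D")
print(path, rate)
```

A coordinated schedule would go further than this per-source choice, since relays and the destination are shared by all senders and independent greedy decisions can overload the same hosts.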