Publication | Open Access
TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism
Citations: 38
References: 38
Year: 2021
Keywords: Artificial Intelligence, Engineering, Machine Learning, Distributed DNN, Education, Distributed AI System, Parallel Algorithms, Data Science, Multi-task Learning, Parallel Computing, Performance Improvement, Network Flows, Computer Engineering, Large Scale Optimization, Computer Science, Distributed Learning, Deep Learning, Parallel Learning, Parallel Programming, Execution Time, Effective Parallelization Strategies, Parallelization Strategies, Resource Optimization
Effective parallelization strategies are crucial for the performance of distributed deep neural network (DNN) training. Recently, several methods have been proposed to search for parallelization strategies, but they all optimize a single objective (e.g., execution time or memory consumption) and produce only one strategy. We propose Frontier Tracking (FT), an efficient algorithm that finds a set of Pareto-optimal parallelization strategies to explore the best trade-off among different objectives. FT can minimize memory consumption when the number of devices is limited and fully utilize additional resources to reduce execution time. Based on FT, we develop a user-friendly system, called TensorOpt, which allows users to run their distributed DNN training jobs without worrying about the details of searching for and coding parallelization strategies. Experimental results show that TensorOpt adapts to resource availability more flexibly than existing frameworks.
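To make the notion of a Pareto-optimal set concrete, the following is a minimal sketch of filtering candidate strategies by two objectives, execution time and memory consumption. It is not the FT algorithm itself, which the abstract does not detail; the `Strategy` type, the helper names, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    """Hypothetical record for one candidate parallelization strategy."""
    name: str
    time_s: float      # estimated execution time per iteration (illustrative)
    memory_gb: float   # estimated peak memory per device (illustrative)


def pareto_frontier(strategies: list[Strategy]) -> list[Strategy]:
    """Keep only strategies that no other strategy beats on both objectives."""
    frontier: list[Strategy] = []
    best_memory = float("inf")
    # Sweep in order of increasing time; a strategy survives only if it uses
    # strictly less memory than every faster (or equally fast) one seen so far.
    for s in sorted(strategies, key=lambda s: (s.time_s, s.memory_gb)):
        if s.memory_gb < best_memory:
            frontier.append(s)
            best_memory = s.memory_gb
    return frontier


def fastest_within_budget(
    frontier: list[Strategy], memory_budget_gb: float
) -> Optional[Strategy]:
    """Pick the fastest frontier strategy that fits the memory budget."""
    feasible = [s for s in frontier if s.memory_gb <= memory_budget_gb]
    return min(feasible, key=lambda s: s.time_s) if feasible else None


# Made-up candidates, not measurements from the paper.
candidates = [
    Strategy("A", 1.0, 16.0),
    Strategy("B", 1.5, 10.0),
    Strategy("C", 1.6, 12.0),  # dominated by B (slower and more memory)
    Strategy("D", 2.5, 6.0),
    Strategy("E", 3.0, 6.0),   # dominated by D (slower, same memory)
]
frontier = pareto_frontier(candidates)
```

Given such a frontier, a caller with a tight memory budget would choose a slower, lower-memory strategy, while a caller with more memory available could choose a faster one. This is the kind of trade-off the abstract describes, shown here over a small explicit candidate list rather than a searched strategy space.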