Publication | Closed Access
PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation
Citations: 479
References: 15
Year: 2024
Venue: ASPLOS 2024
Keywords: Engineering, Machine Learning, Advanced Computing, Compiler Technology, Machine Learning Tool, Computer Architecture, FX Graph, Graph Compilation, Hardware Systems, Software Analysis, Graph Processing, GPU Computing, Data Science, Computing Systems, Eager Mode Frameworks, Parallel Computing, Compilers, Code Generation, Computer Engineering, Computer Science, Code Representation, PyTorch 2, torch.compile Feature, Hardware Acceleration, Graph Theory, Program Analysis, Parallel Programming, Graph Neural Network
This paper introduces two extensions to the popular PyTorch machine learning framework, TorchDynamo and TorchInductor, which implement the torch.compile feature released in PyTorch 2. TorchDynamo is a Python-level just-in-time (JIT) compiler that enables graph compilation in PyTorch programs without sacrificing the flexibility of Python. It achieves this by dynamically modifying Python bytecode before execution and extracting sequences of PyTorch operations into an FX graph, which is then JIT compiled using one of many extensible backends. TorchInductor is the default compiler backend for TorchDynamo, which translates PyTorch programs into OpenAI's Triton for GPUs and C++ for CPUs. Results show that TorchDynamo is able to capture graphs more robustly than prior approaches while adding minimal overhead, and TorchInductor is able to provide a 2.27× inference and 1.41× training geometric mean speedup on an NVIDIA A100 GPU across 180+ real-world models, which outperforms six other compilers. These extensions provide a new way to apply optimizations through compilers in eager mode frameworks like PyTorch.
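The torch.compile entry point described in the abstract can be made concrete with a short usage sketch. This is a minimal illustration, not taken from the paper: the toy model and input shapes are assumptions, though `torch.compile` and the `"inductor"` default backend are the actual PyTorch 2 API.

```python
import torch
import torch.nn as nn

# An illustrative toy model; the paper benchmarks 180+ real-world models instead.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# torch.compile is the feature the paper describes: TorchDynamo rewrites Python
# bytecode at runtime to capture sequences of PyTorch ops into an FX graph, and
# TorchInductor (the default backend) lowers that graph to Triton kernels on
# GPUs or C++ on CPUs.
compiled_model = torch.compile(model, backend="inductor")  # "inductor" is the default

x = torch.randn(8, 64)
out = compiled_model(x)  # first call triggers JIT compilation; later calls reuse it
```

Because the backend interface is extensible, the same call accepts other registered backends (for example `backend="eager"` for debugging), which is the mechanism by which alternative compilers plug into TorchDynamo's captured graphs.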