Publication | Open Access
Improved spill code generation for software pipelined loops
Citations: 28 | References: 26 | Year: 2000 | Venue: unknown
Topics: Software Maintenance, Engineering, Compiler Technology, Computer Architecture, Software Engineering, Source Code Analysis, Software Analysis, Software Pipelining, Parallel Software, Fuzzing, Parallel Computing, Compilers, Loop Scheduling Technique, Instruction-level Parallelism, Parallelizing Compiler, Computer Engineering, Software Pipelining Technique, Computer Science, Spill Code Generation, Optimizing Compiler, Static Program Analysis, Software Design, Program Analysis, Software Testing, Parallel Programming
Software pipelining is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations. Because iterations overlap, the resulting schedules have high register requirements. A schedule is valid only if it needs no more registers than the target architecture provides. Otherwise, its register requirements have to be reduced, either by decreasing the iteration overlap or by spilling registers to memory. In this paper we describe a set of heuristics to increase the quality of register-constrained modulo schedules. The heuristics choose between these two alternatives and define criteria for effectively selecting spilling candidates. The heuristics proposed for reducing register pressure can be applied to any software pipelining technique. The proposals are evaluated using a register-conscious software pipeliner on a workbench composed of a large set of loops from the Perfect Club benchmark and a set of processor configurations, and are compared against a proposal previously described in the literature. For one of these processor configurations and the set of loops that do not fit in the available registers (32), a speed-up of 1.68 and a reduction of the memory traffic by a factor of 0.57 are achieved with an affordable increase in compilation time. For all the loops, this represents a speed-up of 1.38 and a reduction of the memory traffic by a factor of 0.7.
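To make the link between iteration overlap and register requirements concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each loop value is described by a hypothetical `(def_cycle, end_cycle)` lifetime within one iteration, and that a new iteration starts every `ii` cycles (the initiation interval). The names `max_live` and `fits` are illustrative. For simplicity the lifetimes are held fixed while `ii` changes, whereas a real pipeliner would reschedule the loop.

```python
def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in the steady-state kernel.

    lifetimes: list of (def_cycle, end_cycle) pairs for one iteration; a value
               is assumed live on cycles def_cycle .. end_cycle - 1.
    ii:        initiation interval, i.e. cycles between consecutive iterations.
    """
    pressure = [0] * ii
    for start, end in lifetimes:
        # Each live cycle of one iteration folds onto kernel slot (cycle mod ii),
        # where it coexists with the same value from overlapped iterations.
        for cycle in range(start, end):
            pressure[cycle % ii] += 1
    return max(pressure)


def fits(lifetimes, ii, num_registers):
    """True if the schedule needs at most num_registers registers."""
    return max_live(lifetimes, ii) <= num_registers


# Hypothetical lifetimes for three values of a single loop iteration.
example = [(0, 6), (1, 4), (2, 9)]
for ii in (2, 4):
    print(ii, max_live(example, ii), fits(example, ii, num_registers=4))
```

In this toy example the smaller `ii` (more overlap) needs more registers than the larger one. When a schedule does not fit, the two options described in the abstract are to increase `ii` or to spill some values to memory, which shortens their register lifetimes at the cost of extra memory traffic.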