Publication | Closed Access
Series approximation methods for divide and square root in the Power3™ processor
Citations: 30
References: 12
Year: 2003
Unknown Venue
Numerical Analysis, Pade Approximant, Engineering, Hardware Algorithm, Computer Architecture, Supercomputer Architecture, Processor Architecture, Hardware Security, Numerical Computation, High-performance Architecture, Approximate Computing, Parallel Computing, Approximation Theory, Series Approximation Methods, Electrical Engineering, Square Root Latency, Computer Engineering, Power3™ Processor, Square Root, Computer Science, Approximation Algorithms, Pipeline Latency, Pade Approximation, Hardware Acceleration, Approximation Method, Parallel Programming
The Power3 processor is a 64-bit implementation of the PowerPC™ architecture and is the successor to the Power2™ processor for workstations and servers that require high-performance floating-point capability. The previous processors used Newton-Raphson algorithms to implement divide and square root. The Power3 processor has a longer pipeline latency, which would substantially increase the latency of these instructions under Newton-Raphson. Instead, new algorithms based on power series approximations were developed, and they provide significantly better performance than the Newton-Raphson algorithm on this processor. This paper describes the algorithms and then shows how both the series-based algorithms and the Newton-Raphson algorithms are affected by pipeline length. For the Power3, the power series algorithms reduce the divide latency by over 20% and the square root latency by 35%.
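To make the contrast in the abstract concrete, the following is a minimal Python sketch of the two general approaches applied to a reciprocal 1/b. It is not the Power3 algorithm, which the abstract does not specify. The function names, the starting estimate `x0` (standing in for a hypothetical table lookup), and the iteration and term counts are all assumptions chosen for illustration.

```python
def reciprocal_newton(b, x0, iterations):
    """Textbook Newton-Raphson refinement of an estimate x0 of 1/b.

    Each step needs the result of the previous one, so the steps
    form a serial dependency chain.
    """
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - b * x)
    return x


def reciprocal_series(b, x0, terms):
    """Textbook power series refinement of an estimate x0 of 1/b.

    With e = 1 - b*x0, 1/b = x0 / (1 - e) = x0 * (1 + e + e^2 + ...).
    The powers of e do not depend on the running sum, which leaves
    more room to overlap operations in a pipelined unit.
    """
    e = 1.0 - b * x0
    total = 1.0
    power = 1.0
    for _ in range(terms):
        power *= e
        total += power
    return x0 * total


# Hypothetical inputs: divisor 1.5 and a coarse initial estimate 0.75.
b, x0 = 1.5, 0.75
newton_result = reciprocal_newton(b, x0, iterations=2)
series_result = reciprocal_series(b, x0, terms=3)
print(newton_result, series_result, 1.0 / b)
```

In this sketch, k Newton-Raphson iterations and a series truncated after 2^k - 1 powers of e reach the same relative error, e^(2^k). The two approaches differ in how the work is ordered rather than in the accuracy they can reach, which is why pipeline depth can favor one over the other.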