Publication | Closed Access
Radix-4 FFT implementation using SIMD multimedia instructions
Year: 1999 | Venue: Unknown | Citations: 19 | References: 3
Keywords: Consecutive 4, Engineering, SIMD Multimedia Instructions, Hardware Acceleration, Multimedia Signal Processing, High-performance Architecture, Video Coding Format, Multimedia Processor, Hardware Algorithm, Computer Engineering, Computer Architecture, Symmetrical Rounding, Parallel Implementation, Parallel Programming, V830R Processor, Parallel Computing
A fast radix-4 complex FFT implementation using 4-parallel SIMD instructions is presented. Four radix-4 butterflies are calculated in parallel at every stage by loading four consecutive elements into a register. At the last stage, the elements are packed four to a register and likewise calculated in parallel. This regular data flow enables higher parallelism and reduces the overhead of data format conversion. Implemented on the V830R processor, which has a 4-parallel SIMD multimedia instruction set, the FFT achieves practical performance quite competitive with that of high-end parallel DSPs. Multiply-accumulate instructions with symmetrical rounding, introduced in the V830R processor, are effective in maintaining FFT accuracy.
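The data flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a decimation-in-frequency ordering, emulates a 4-wide SIMD register with a Python list of four complex numbers, and uses floating point instead of the packed data a multimedia instruction set would operate on. The helper names `load`, `gather`, `butterfly4` and `fft_radix4_simd` are hypothetical. In the earlier stages four consecutive elements are loaded so each lane holds one of four independent butterflies; in the last stage the sketch gathers elements at a stride of four to restore that one-butterfly-per-lane layout.

```python
import cmath

# Sketch only: DIF ordering, float data and all helper names are assumptions.
LANES = 4  # width of the emulated SIMD register (four butterflies per operation)


def load(x, start):
    """Emulated register load: four consecutive elements."""
    return x[start:start + LANES]


def gather(x, start, stride):
    """Emulated packing: four elements taken at a fixed stride."""
    return [x[start + i * stride] for i in range(LANES)]


def butterfly4(a0, a1, a2, a3):
    """Radix-4 DIF butterflies on all lanes at once (lane i = butterfly i)."""
    y0, y1, y2, y3 = [], [], [], []
    for u0, u1, u2, u3 in zip(a0, a1, a2, a3):
        t0, t1 = u0 + u2, u0 - u2
        t2, t3 = u1 + u3, (u1 - u3) * -1j  # multiply by W_4 = -j
        y0.append(t0 + t2)
        y1.append(t1 + t3)
        y2.append(t0 - t2)
        y3.append(t1 - t3)
    return y0, y1, y2, y3


def fft_radix4_simd(signal):
    """Forward DFT of length 4**k (k >= 2), computed four butterflies at a time."""
    n = len(signal)
    size = 16
    while size < n:
        size *= 4
    if size != n:
        raise ValueError("length must be a power of 4 and at least 16")
    x = [complex(v) for v in signal]

    # Earlier stages: butterfly legs are q apart with q >= 4, so four
    # consecutive elements belong to four different butterflies.
    q = n // 4
    while q >= LANES:
        m = 4 * q  # block length of this stage
        for base in range(0, n, m):
            for j in range(0, q, LANES):
                legs = [load(x, base + j + k * q) for k in range(4)]
                outs = butterfly4(*legs)
                for r in range(4):
                    for i in range(LANES):
                        w = cmath.exp(-2j * cmath.pi * (j + i) * r / m)
                        x[base + j + r * q + i] = outs[r][i] * w
        q //= 4

    # Last stage (q == 1): each butterfly's inputs are consecutive, so pack
    # every fourth element to put four butterflies side by side. Twiddles are 1.
    for base in range(0, n, 4 * LANES):
        legs = [gather(x, base + k, 4) for k in range(4)]
        outs = butterfly4(*legs)
        for r in range(4):
            for i in range(LANES):
                x[base + 4 * i + r] = outs[r][i]

    # In-place DIF leaves the result in base-4 digit-reversed order.
    digits = 0
    while 4 ** digits < n:
        digits += 1
    result = [0j] * n
    for pos in range(n):
        rev, v = 0, pos
        for _ in range(digits):
            rev, v = rev * 4 + v % 4, v // 4
        result[rev] = x[pos]
    return result


def dft(x):
    """Direct O(n^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    sig = [complex(t % 5, -(t % 3)) for t in range(64)]
    err = max(abs(a - b) for a, b in zip(fft_radix4_simd(sig), dft(sig)))
    print(f"max |error| vs direct DFT: {err:.2e}")
```

Running the script compares the result with a direct DFT. The rounding behavior of fixed-point multiply-accumulate instructions, which the abstract credits with maintaining accuracy, is not modeled in this floating-point sketch.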