Publication | Closed Access
A regular parallel RSA processor
Citations: 27
References: 18
Year: 2004
Unknown Venue
Engineering, VLSI Design, Computer Architecture, Parallel Implementation, Parallel Algorithms, Hardware Security, Clock Signal, High-performance Architecture, Parallel Computing, RSA Algorithm, Computer Engineering, Computer Science, Modular Multipliers, FPGA Design, Hardware Acceleration, VLSI Architecture, Parallel Processing, Parallel Performance Evaluation, Parallel Programming
A high-performance VLSI implementation of the RSA algorithm using a systolic array is presented. High-speed applications of RSA systems require parallel implementations of modular multipliers. In addition to the systolic architecture, which is popular in hardware-based RSA systems, a block-based scheme is used to further eliminate global signals, with a pipelined bus conveying data globally. The control signals and intermediate results used for sequential multiplications are transmitted by shift registers. All signals except the clock signal are confined to one block or pass only between two adjacent blocks. A carry-save adder structure is used to calculate the iterative step of the algorithm, which improves speed and saves area. Long modular multipliers also suffer from the effect of large fanout. Novel architectures are proposed to eliminate the fanout bottleneck, which reduces the minimum achievable clock period of long modular multipliers. Compared to the original modular multiplier architecture with the fanout bottleneck, the proposed architectures achieve an increase of over 7% in throughput with no increase in area. The Chinese remainder theorem (CRT) technique increases the decryption data rate by a factor of four. Two redundant blocks are added to accommodate the on-line partitioning of the multiplier and the variation in the lengths of P and Q in CRT mode.
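The CRT mode mentioned in the abstract can be illustrated in software. The sketch below is not the paper's hardware design. It is a minimal Python model of textbook RSA-CRT decryption with Garner-style recombination, assuming a private exponent `d` and primes `p` and `q`. The function name `rsa_crt_decrypt` and the toy parameters are hypothetical and chosen only for illustration. The speedup comes from replacing one full-length modular exponentiation with two exponentiations on half-length operands.

```python
def rsa_crt_decrypt(c, p, q, d):
    """Decrypt ciphertext c using the Chinese remainder theorem.

    Illustrative software model only (assumption: textbook RSA, no padding).
    Requires Python 3.8+ for pow(x, -1, m) modular inverses.
    """
    # Reduced exponents: each is roughly half the bit length of d.
    dp = d % (p - 1)
    dq = d % (q - 1)
    # Precomputed inverse of q modulo p, used for recombination.
    q_inv = pow(q, -1, p)

    # Two half-length modular exponentiations replace one full-length one.
    m_p = pow(c, dp, p)
    m_q = pow(c, dq, q)

    # Garner recombination: find m with m = m_p (mod p) and m = m_q (mod q).
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q


# Toy parameters (far too small for real use, hypothetical example values).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

message = 65
ciphertext = pow(message, e, n)
recovered = rsa_crt_decrypt(ciphertext, p, q, d)
```

In this model `recovered` equals both `message` and the result of the direct computation `pow(ciphertext, d, n)`. Real deployments use much larger primes and a padding scheme, both of which are outside the scope of this sketch.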