Publication | Closed Access
LWRpro: An Energy-Efficient Configurable Crypto-Processor for Module-LWR
Citations: 64
References: 22
Year: 2021
Hardware Security, Only Module-learning, Engineering, Hardware Acceleration, Advanced Computing, High-performance Architecture, Encapsulation Operations, Hardware Algorithm, Truncated Multipliers, Computer Engineering, Computer Architecture, Lightweight Cryptography, Energy-efficient Configurable Crypto-processor, Computer Science, Reconfigurable Architecture, Parallel Computing, Cryptography
Saber, the only module-learning-with-rounding-based algorithm in the third round of NIST's post-quantum cryptography (PQC) standardization process, is characterized by simplicity and flexibility. However, energy-efficient implementation of Saber is still under investigation because the commonly used number theoretic transform cannot be applied directly. In this manuscript, an energy-efficient configurable crypto-processor, LWRpro, is proposed to support the multi-security-level key encapsulation mechanism of Saber. First, an 8-level hierarchical Karatsuba framework is used to reduce degree-256 polynomial multiplication to coefficient-wise multiplication. Second, a hardware-efficient Karatsuba scheduling strategy and an optimized pre-/post-processing structure are designed to reduce the area overhead of the scheduling strategy. Third, a task-rescheduling-based pipeline strategy and truncated multipliers are proposed to enable fine-grained processing. Moreover, LWRpro supports multiple parameter sets, making it configurable across security scenarios. With these optimizations, LWRpro requires 1066, 1456, and 1701 clock cycles for key generation, encapsulation, and decapsulation of Saber768, respectively. The post-layout version of LWRpro is implemented in a TSMC 40 nm CMOS process within 0.38 mm². At 400 MHz, the throughput for Saber768 is up to 275k encapsulation operations per second and the energy efficiency is 0.15 µJ/encapsulation, achieving improvements of nearly 50× and 31×, respectively, compared with current PQC hardware solutions.
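To make the Karatsuba reduction in the abstract concrete, here is a minimal software sketch of a recursive Karatsuba multiplication in the Saber ring Z_q[x]/(x^256 + 1) with q = 2^13. Halving 256 coefficients eight times reaches single coefficients, which is one way to read "8-level"; the paper's hardware scheduling, pre-/post-processing structure, pipelining, and truncated multipliers are not modeled. All function names (`schoolbook`, `karatsuba`, `reduce_negacyclic`, `ring_mul`) are hypothetical and chosen for illustration only.

```python
import random

Q = 1 << 13  # Saber modulus q = 2^13
N = 256      # number of coefficients per polynomial


def schoolbook(a, b):
    """Reference O(n^2) product of two coefficient lists, without reduction."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def karatsuba(a, b):
    """Recursive Karatsuba product for two lists of equal power-of-two length.

    Each level replaces one length-n product with three length-n/2 products,
    until only single-coefficient multiplications remain.
    """
    n = len(a)
    if n == 1:
        return [a[0] * b[0]]
    h = n // 2
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]
    lo = karatsuba(a_lo, b_lo)
    hi = karatsuba(a_hi, b_hi)
    mid = karatsuba([x + y for x, y in zip(a_lo, a_hi)],
                    [x + y for x, y in zip(b_lo, b_hi)])
    out = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        out[i] += lo[i]                         # low part
        out[i + h] += mid[i] - lo[i] - hi[i]    # cross terms
        out[i + n] += hi[i]                     # high part
    return out


def reduce_negacyclic(c, n=N, q=Q):
    """Fold a product modulo x^n + 1 (so x^n = -1) and reduce coefficients mod q."""
    out = [0] * n
    for i, ci in enumerate(c):
        if i < n:
            out[i] += ci
        else:
            out[i - n] -= ci
    return [x % q for x in out]


def ring_mul(a, b):
    """Multiply two length-N polynomials in Z_q[x]/(x^N + 1) via Karatsuba."""
    return reduce_negacyclic(karatsuba(a, b))
```

A quick sanity check is to compare `ring_mul(a, b)` against `reduce_negacyclic(schoolbook(a, b))` for random inputs. With full recursion down to single coefficients, the sketch performs 3^8 = 6561 coefficient multiplications instead of the 256^2 = 65536 needed by the schoolbook method.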