Abstract

In this paper, we present ‘<inline-formula><tex-math notation="LaTeX">$\mathsf{HEPCloud}$</tex-math></inline-formula>’, an FPGA-based hardware accelerator for homomorphic evaluation of medium-depth functions, with applications in cloud computing. The <inline-formula><tex-math notation="LaTeX">$\mathsf{HEPCloud}$</tex-math></inline-formula> architecture supports the polynomial-ring-based homomorphic encryption scheme FV for a ring-LWE parameter set with dimension <inline-formula><tex-math notation="LaTeX">$2^{15}$</tex-math></inline-formula>, a 1,228-bit modulus, and standard deviation 50. This parameter set offers a multiplicative depth of 36 and at least 85 bits of security. The processor of <inline-formula><tex-math notation="LaTeX">$\mathsf{HEPCloud}$</tex-math></inline-formula> is composed of multiple parallel cores. To achieve fast computation for such a large parameter set, we apply various optimizations at both the algorithm and architecture levels. For fast polynomial multiplication, we use CRT with NTT and achieve two-dimensional parallelism in <inline-formula><tex-math notation="LaTeX">$\mathsf{HEPCloud}$</tex-math></inline-formula>. We optimize BRAM access, use a fast Barrett-like polynomial reduction method, reduce the cost of CRT, and design a fast divide-and-round unit. Besides parallel processing, we pipeline several of the sequential building blocks to reduce the impact of sequential computations. Finally, we implement <inline-formula><tex-math notation="LaTeX">$\mathsf{HEPCloud}$</tex-math></inline-formula> on a medium-size Xilinx Virtex 6 FPGA on the ML605 board and measure its on-board performance. To store the ciphertexts during a homomorphic function evaluation, we use the large DDR3 memory of the ML605 board.
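As a rough illustration of multiplying polynomials with CRT and NTT, the following Python sketch works in a toy ring. The dimension `N = 8`, the primes 97 and 193, and all function names are assumptions chosen for readability; they are not the parameters or the design of the accelerator. The transform is also written in quadratic time for clarity rather than as a fast butterfly network. The coefficient modulus is split into small primes, each residue polynomial is multiplied pointwise in the NTT domain, and the results are recombined with CRT.

```python
# Toy sketch: negacyclic polynomial multiplication in Z_Q[x]/(x^N + 1)
# via CRT (split Q into small primes) and one NTT per prime.
# N and PRIMES are illustrative assumptions, not the accelerator's parameters.

N = 8                  # toy ring dimension
PRIMES = [97, 193]     # each prime p satisfies p % (2 * N) == 1
Q = PRIMES[0] * PRIMES[1]


def find_psi(p, n):
    """Return a primitive 2n-th root of unity modulo the prime p."""
    for g in range(2, p):
        psi = pow(g, (p - 1) // (2 * n), p)
        if pow(psi, n, p) == p - 1:  # psi^n == -1, so the order is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, p, psi):
    """Evaluate a(x) at the odd powers of psi, the roots of x^n + 1."""
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * k + 1) * j, p) for j in range(n)) % p
            for k in range(n)]


def intt(A, p, psi):
    """Invert ntt(): recover the coefficients from the evaluations."""
    n = len(A)
    n_inv = pow(n, -1, p)
    psi_inv = pow(psi, -1, p)
    omega_inv = pow(psi_inv, 2, p)
    return [n_inv * pow(psi_inv, j, p)
            * sum(A[k] * pow(omega_inv, j * k, p) for k in range(n)) % p
            for j in range(n)]


def mul_mod_prime(a, b, p):
    """Multiply a and b modulo (x^n + 1, p) with pointwise NTT products."""
    psi = find_psi(p, len(a))
    A = ntt([x % p for x in a], p, psi)
    B = ntt([x % p for x in b], p, psi)
    return intt([x * y % p for x, y in zip(A, B)], p, psi)


def crt_pair(r1, r2, p1, p2):
    """Combine residues modulo p1 and p2 into one value modulo p1 * p2."""
    return r1 + p1 * (((r2 - r1) * pow(p1, -1, p2)) % p2)


def poly_mul_crt(a, b):
    """Multiply a and b in Z_Q[x]/(x^N + 1) using the two residue rings."""
    res1 = mul_mod_prime(a, b, PRIMES[0])
    res2 = mul_mod_prime(a, b, PRIMES[1])
    return [crt_pair(x, y, PRIMES[0], PRIMES[1]) for x, y in zip(res1, res2)]


def schoolbook(a, b, q):
    """Reference negacyclic multiplication, used only to check the result."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1   # x^n wraps around to -1
            c[(i + j) % n] = (c[(i + j) % n] + sign * a[i] * b[j]) % q
    return c


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [18000, 17, 0, 4242, 9, 1, 12345, 77]
assert poly_mul_crt(a, b) == schoolbook(a, b, Q)
```

In this sketch the per-prime multiplications are independent of each other, which is the property that makes the residue arithmetic a natural candidate for parallel cores.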
Our FPGA-based implementation of <inline-formula><tex-math notation="LaTeX">$\mathsf{HEPCloud}$</tex-math></inline-formula> computes a homomorphic multiplication in 26.67 s, of which the actual computation takes only 3.36 s and the rest is spent on off-chip memory access. Evaluating the SIMON-64/128 block cipher takes about 37,551 s, but the per-block time is only about 18 s because <inline-formula><tex-math notation="LaTeX">$\mathsf{HEPCloud}$</tex-math></inline-formula> processes 2,048 blocks simultaneously. The results show that FPGA-based acceleration of homomorphic function evaluations is feasible, but that a fast memory interface is crucial for performance.
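The reported timings can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative. It shows that roughly 87% of a homomorphic multiplication is spent on off-chip memory access and that the amortized SIMON-64/128 cost is about 18.3 s per block.

```python
# Figures quoted in the abstract, in seconds.
mult_total_s = 26.67        # one homomorphic multiplication, end to end
mult_compute_s = 3.36       # portion spent on actual computation
simon_total_s = 37_551      # full SIMON-64/128 evaluation
blocks_in_parallel = 2_048  # blocks processed simultaneously

# Time and share of a multiplication spent on off-chip memory access.
memory_s = mult_total_s - mult_compute_s
memory_share = memory_s / mult_total_s

# Amortized cost of one SIMON block.
per_block_s = simon_total_s / blocks_in_parallel

print(f"memory access: {memory_s:.2f} s ({memory_share:.1%} of a multiplication)")
print(f"amortized SIMON cost: {per_block_s:.2f} s per block")
```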
