ANNA: Specialized Architecture for Approximate Nearest Neighbor Search

TLDR

Nearest-neighbor search retrieves the vectors most similar to a query. It is a core operation for modern recommender systems and semantic search engines that use high-dimensional embeddings. Because exhaustive search over billions of vectors is prohibitively expensive, approximate nearest-neighbor search (ANNS) is widely adopted. This work introduces ANNA, a specialized accelerator designed to overcome the performance and energy inefficiencies of server-class CPUs and GPUs on ANNS. ANNA implements a dataflow pipeline that reuses data efficiently and is compatible with leading ANNS algorithms such as Google ScaNN and Facebook Faiss. On million- and billion-scale datasets, ANNA delivers 2.3–61.6× higher throughput, 4.3–82.1× lower latency, and multiple orders of magnitude better energy efficiency than conventional CPU or GPU implementations.

Abstract

Similarity search, or nearest neighbor search, is the task of retrieving the set of vectors in a (vector) database that are most similar to a given query vector. It has long been a key kernel for many applications. It has become especially important recently, as modern neural networks and machine learning models represent the semantics of images, videos, and documents as high-dimensional vectors called embeddings. Finding the set of embeddings most similar to a given query embedding is now a critical operation for modern recommender systems and semantic search engines. Since exhaustively searching for the most similar vectors among billions of vectors is prohibitively expensive, approximate nearest neighbor search (ANNS) is used in many real-world use cases. Unfortunately, we find that using server-class CPUs and GPUs for the ANNS task leads to suboptimal performance and energy efficiency. To address these limitations, we propose a specialized architecture named ANNA (Approximate Nearest Neighbor search Accelerator), which is compatible with state-of-the-art ANNS algorithms such as Google ScaNN and Facebook Faiss. By combining the benefits of a specialized dataflow pipeline and efficient data reuse, ANNA achieves multiple orders of magnitude higher energy efficiency, 2.3-61.6× higher throughput, and 4.3-82.1× lower latency than conventional CPUs or GPUs on both million- and billion-scale datasets.
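
To make the contrast between exhaustive and approximate search concrete, the following is a minimal pure-Python sketch. It is not ANNA's pipeline and does not use the ScaNN or Faiss APIs. It assumes a toy cluster-pruning scheme in which vectors are bucketed by their nearest centroid and only a few buckets are scanned per query. All function names, parameters, and data sizes here are hypothetical, chosen for illustration only.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_search(database, query, k):
    """Exhaustive search: score every vector and return the k closest indices."""
    order = sorted(range(len(database)), key=lambda i: sq_dist(database[i], query))
    return order[:k]

def build_index(database, num_clusters, seed=0):
    """Bucket each vector under the nearest of a few centroids sampled from the data.
    (Assumption: sampled centroids stand in for a trained clustering.)"""
    rng = random.Random(seed)
    centroids = [database[i] for i in rng.sample(range(len(database)), num_clusters)]
    buckets = [[] for _ in centroids]
    for i, v in enumerate(database):
        nearest = min(range(num_clusters), key=lambda c: sq_dist(centroids[c], v))
        buckets[nearest].append(i)
    return centroids, buckets

def approx_search(database, index, query, k, num_probe):
    """Approximate search: scan only the num_probe buckets closest to the query."""
    centroids, buckets = index
    probe = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(centroids[c], query))[:num_probe]
    candidates = [i for c in probe for i in buckets[c]]
    candidates.sort(key=lambda i: sq_dist(database[i], query))
    return candidates[:k]

# Synthetic data: 2000 random 16-dimensional vectors and one random query.
rng = random.Random(42)
database = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2000)]
query = [rng.gauss(0, 1) for _ in range(16)]

index = build_index(database, num_clusters=32)
exact = exact_search(database, query, k=10)
approx = approx_search(database, index, query, k=10, num_probe=4)

# Recall: fraction of the true top-k that the approximate search recovered.
recall = len(set(exact) & set(approx)) / len(exact)
print(f"recall@10 when probing 4 of 32 clusters: {recall:.2f}")
```

Probing fewer buckets reduces the number of distance computations at the cost of possibly missing some true neighbors. Probing every bucket scans the whole database and so recovers the exhaustive result.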
