iDEC: Indexable Distance Estimating Codes for Approximate Nearest Neighbor Search

Very Large Databases Endowment logo

iDEC: Indexable Distance Estimating Codes for Approximate Nearest Neighbor…

Approximate Nearest Neighbor (ANN) search is a fundamental algorithmic problem, with numerous applications in many areas of computer science. In this work, we propose indexable distance estimating codes (iDEC), a new solution framework to ANN that extends and improves the locality sensitive hashing (LSH) framework in a fundamental and systematic way. Empirically, an iDEC-based solution has a low index space complexity of O(n) and can achieve a low average query time complexity of approximately O(log n). We show that our iDEC-based solutions for ANN in Hamming and edit distances outperform the respective state-of-the-art LSH-based solutions for both in-memory and external-memory processing. We also show that our iDEC-based in-memory ANN-H solution is more scalable than all existing solutions. We also discover deep connections between Error-Estimating Codes (EEC), LSH, and iDEC.

Long Gong, Huayi Wang, Mitsunori Ogihara, and Jun Xu. 2020. IDEC: indexable distance estimating codes for approximate nearest neighbor search. Proc. VLDB Endow. 13, 9 (May 2020), 1483–1497. DOI:https://doi.org/10.14778/3397230.3397243