I am looking for feedback on the accuracy of the SIFT dataset used in ann-benchmarks here. Since it is widely used for benchmarking, I assumed its ground truth was accurate, but I found some discrepancies and would like your input.
Some brief background: we are using ann-benchmarks to test the implementation of our new vector database, with SIFT as the test dataset. However, we are noticing discrepancies between the ground truth provided with the SIFT dataset and our results. Note that we are using the squared Euclidean distance, as specified by the dataset.
Here’s an example from the SIFT 10K sample dataset. The ground truth for the test vector at index 0, according to the SIFT dataset, is (a vector of approximate nearest neighbors, sorted from closest to farthest):
As per our implementation:
As you can see, some elements in our results do not follow the same order. For example, SIFT indicates that vector 370 is closer than vector 2776, but our implementation suggests otherwise.
To rule out possible errors in our implementation, we cross-verified the results using hnswlib. Here’s the output from hnswlib, which aligns with our implementation:
We also printed the distances, which confirm that vector 370 is farther than vector 2776 in our calculations. However, the SIFT ground truth suggests the opposite:
E1812-145923-905 (2488793344): index 46 id 598 distance: 0.432297
E1812-145923-905 (2488793344): index 47 id 2776 distance: 0.432492
E1812-145923-905 (2488793344): index 48 id 370 distance: 0.433172
E1812-145923-905 (2488793344): index 49 id 121 distance: 0.434176
E1812-145923-905 (2488793344): index 50 id 7245 distance: 0.435382
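One way to check the ordering independently of any index library is an exact brute-force pass over the base vectors. The following is a minimal sketch, not the ann-benchmarks or hnswlib code: the function names (`squared_l2`, `exact_neighbors`, `out_of_order_pairs`) are hypothetical, and it assumes the base vectors and the query are already loaded as NumPy arrays. Distances are accumulated in float64 because the gap between ids 2776 and 370 in the log above is under 0.001, small enough that the precision of the accumulation could plausibly matter. Squared and plain Euclidean distance give the same ranking, since the square root is monotonic.

```python
import numpy as np


def squared_l2(query, base):
    """Squared Euclidean distance from one query to every row of base."""
    # Accumulate in float64 so near-ties are not decided by float32 rounding.
    diff = base.astype(np.float64) - query.astype(np.float64)
    return np.einsum("ij,ij->i", diff, diff)


def exact_neighbors(query, base, k):
    """Ids and distances of the k closest rows, nearest first."""
    dist = squared_l2(query, base)
    # A stable sort breaks exact ties by the lower id.
    order = np.argsort(dist, kind="stable")[:k]
    return order, dist[order]


def out_of_order_pairs(query, base, ground_truth_ids):
    """Adjacent ground-truth ids whose exact distances decrease."""
    dist = squared_l2(query, base)[np.asarray(ground_truth_ids)]
    return [
        (int(ground_truth_ids[i]), int(ground_truth_ids[i + 1]))
        for i in range(len(dist) - 1)
        if dist[i] > dist[i + 1]
    ]


# Synthetic stand-in data (assumption: replace with the real SIFT arrays).
rng = np.random.default_rng(0)
base = rng.random((1000, 128), dtype=np.float32)
query = rng.random(128, dtype=np.float32)

ids, dists = exact_neighbors(query, base, 100)
# An exactly sorted list has no inversions, so this prints [].
print(out_of_order_pairs(query, base, ids))
```

Passing the dataset's ground-truth row for a query as `ground_truth_ids` would list the adjacent pairs, such as (370, 2776), that the exact distances place in the opposite order.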
I was just wondering if you have noticed similar discrepancies with the SIFT dataset ground truth, or if we are doing something wrong.
We would appreciate your insights.