some datasets are missing in https://ann-benchmarks.com/${dataset_name}.hdf5 #566

Open
zhuwenxing opened this issue Jan 3, 2025 · 0 comments


DATASETS: Dict[str, Callable[[str], None]] = {
    "deep-image-96-angular": deep_image,
    "fashion-mnist-784-euclidean": fashion_mnist,
    "gist-960-euclidean": gist,
    "glove-25-angular": lambda out_fn: glove(out_fn, 25),
    "glove-50-angular": lambda out_fn: glove(out_fn, 50),
    "glove-100-angular": lambda out_fn: glove(out_fn, 100),
    "glove-200-angular": lambda out_fn: glove(out_fn, 200),
    "mnist-784-euclidean": mnist,
    "random-xs-20-euclidean": lambda out_fn: random_float(out_fn, 20, 10000, 100, "euclidean"),
    "random-s-100-euclidean": lambda out_fn: random_float(out_fn, 100, 100000, 1000, "euclidean"),
    "random-xs-20-angular": lambda out_fn: random_float(out_fn, 20, 10000, 100, "angular"),
    "random-s-100-angular": lambda out_fn: random_float(out_fn, 100, 100000, 1000, "angular"),
    "random-xs-16-hamming": lambda out_fn: random_bitstring(out_fn, 16, 10000, 100),
    "random-s-128-hamming": lambda out_fn: random_bitstring(out_fn, 128, 50000, 1000),
    "random-l-256-hamming": lambda out_fn: random_bitstring(out_fn, 256, 100000, 1000),
    "random-s-jaccard": lambda out_fn: random_jaccard(out_fn, n=10000, size=20, universe=40),
    "random-l-jaccard": lambda out_fn: random_jaccard(out_fn, n=100000, size=70, universe=100),
    "sift-128-euclidean": sift,
    "nytimes-256-angular": lambda out_fn: nytimes(out_fn, 256),
    "nytimes-16-angular": lambda out_fn: nytimes(out_fn, 16),
    "word2bits-800-hamming": lambda out_fn: word2bits(out_fn, "400K", "w2b_bitlevel1_size800_vocab400K"),
    "lastfm-64-dot": lambda out_fn: lastfm(out_fn, 64),
    "sift-256-hamming": lambda out_fn: sift_hamming(out_fn, "sift.hamming.256"),
    "kosarak-jaccard": lambda out_fn: kosarak(out_fn),
    "movielens1m-jaccard": movielens1m,
    "movielens10m-jaccard": movielens10m,
    "movielens20m-jaccard": movielens20m,
}

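Are all the datasets in this list supposed to be available for download?
I checked each name against https://ann-benchmarks.com/${dataset_name}.hdf5 and found that 10 of the 27 are missing (8 of the random-* datasets, plus movielens1m-jaccard and movielens20m-jaccard):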

Checking dataset availability from: ann-benchmarks.com
Total datasets to check: 27
-----------------------------------
Checking: deep-image-96-angular
✅ Available: deep-image-96-angular
Checking: fashion-mnist-784-euclidean
✅ Available: fashion-mnist-784-euclidean
Checking: gist-960-euclidean
✅ Available: gist-960-euclidean
Checking: glove-25-angular
✅ Available: glove-25-angular
Checking: glove-50-angular
✅ Available: glove-50-angular
Checking: glove-100-angular
✅ Available: glove-100-angular
Checking: glove-200-angular
✅ Available: glove-200-angular
Checking: mnist-784-euclidean
✅ Available: mnist-784-euclidean
Checking: random-xs-20-euclidean
❌ Not available: random-xs-20-euclidean
Checking: random-s-100-euclidean
❌ Not available: random-s-100-euclidean
Checking: random-xs-20-angular
✅ Available: random-xs-20-angular
Checking: random-s-100-angular
❌ Not available: random-s-100-angular
Checking: random-xs-16-hamming
❌ Not available: random-xs-16-hamming
Checking: random-s-128-hamming
❌ Not available: random-s-128-hamming
Checking: random-l-256-hamming
❌ Not available: random-l-256-hamming
Checking: random-s-jaccard
❌ Not available: random-s-jaccard
Checking: random-l-jaccard
❌ Not available: random-l-jaccard
Checking: sift-128-euclidean
✅ Available: sift-128-euclidean
Checking: nytimes-256-angular
✅ Available: nytimes-256-angular
Checking: nytimes-16-angular
✅ Available: nytimes-16-angular
Checking: word2bits-800-hamming
✅ Available: word2bits-800-hamming
Checking: lastfm-64-dot
✅ Available: lastfm-64-dot
Checking: sift-256-hamming
✅ Available: sift-256-hamming
Checking: kosarak-jaccard
✅ Available: kosarak-jaccard
Checking: movielens1m-jaccard
❌ Not available: movielens1m-jaccard
Checking: movielens10m-jaccard
✅ Available: movielens10m-jaccard
Checking: movielens20m-jaccard
❌ Not available: movielens20m-jaccard
-----------------------------------
Check completed!
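
For anyone who wants to reproduce the check, here is a minimal sketch of how it could be done. It is not necessarily the script that produced the output above. It assumes the URL pattern from the issue title and that the server answers HEAD requests. The helper names (`dataset_url`, `is_available`, `check_datasets`) are hypothetical and are not part of ann-benchmarks.

```python
import urllib.request
from typing import Callable, Dict, Iterable

# Assumption: URL pattern taken from the issue title.
BASE_URL = "https://ann-benchmarks.com"


def dataset_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the download URL for one dataset name."""
    return f"{base_url}/{name}.hdf5"


def is_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (e.g. 404) and socket timeouts are all
        # subclasses of OSError, so any of them counts as "not available".
        return False


def check_datasets(
    names: Iterable[str],
    probe: Callable[[str], bool] = is_available,
) -> Dict[str, bool]:
    """Map each dataset name to whether its URL could be reached.

    `probe` is injectable so the logic can be exercised without network access.
    """
    return {name: probe(dataset_url(name)) for name in names}


# Example usage (performs real network requests):
#   results = check_datasets(DATASETS.keys())
#   for name, ok in results.items():
#       print(("✅ Available: " if ok else "❌ Not available: ") + name)
```

Passing `DATASETS.keys()` from the dictionary above checks all 27 names. A HEAD request avoids downloading the files, some of which are large.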