Description
I’m using the Usearch library to perform semantic search with int8 quantized embeddings and the inner product ("ip") metric. While the results seem correctly ranked by similarity, I noticed that the distances returned by the search are negative. I would like to better understand this behavior and confirm whether it is expected.
Steps to Reproduce
1. Compute embeddings using the following model:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    'mixedbread-ai/deepset-mxbai-embed-de-large-v1',
    model_kwargs={"torch_dtype": "float16"}
)

2. Quantize the embeddings with the precomputed ranges.
3. Add the quantized embeddings to a Usearch index.
4. Perform a query with a normalized embedding.
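A minimal sketch of the quantization step (step 2), using NumPy. The range values below are hypothetical placeholders, not the actual precomputed global min/max from my dataset, and the helper name is mine:

```python
import numpy as np

# Hypothetical global range values -- placeholders standing in for the
# real precomputed dataset statistics.
GLOBAL_MIN, GLOBAL_MAX = -0.18, 0.21

def quantize_int8(embeddings: np.ndarray,
                  lo: float = GLOBAL_MIN,
                  hi: float = GLOBAL_MAX) -> np.ndarray:
    """Linearly map float embeddings from [lo, hi] into int8 [-127, 127]."""
    clipped = np.clip(embeddings, lo, hi)
    scaled = (clipped - lo) / (hi - lo)                  # -> [0, 1]
    return np.round(scaled * 254 - 127).astype(np.int8)  # -> [-127, 127]

# Example: quantize a random float16 embedding of the model's dimensionality.
emb = np.random.default_rng(0).normal(0, 0.05, size=1024).astype(np.float16)
q = quantize_int8(emb)
```

If I recall correctly, sentence-transformers also ships a `quantize_embeddings` helper that accepts precomputed ranges, which may be preferable to a hand-rolled mapping.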
Observed Behavior
The distances returned by the index are negative, even for highly similar embeddings.
For example:
Top result distance: -426537.0
Less similar result distance: -436325.0
Expected Behavior
I expected the distances to be positive or to align more clearly with cosine similarity values in a normalized embedding space.
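For context on the magnitudes: if, as I understand it, USearch computes the "ip" distance as d = 1 - ⟨a, b⟩ on the raw stored vectors, then unnormalized int8 vectors would naturally produce distances in the negative hundreds of thousands, since the raw inner product of two 1024-dimensional int8 vectors is that large. A quick sanity check of that arithmetic (the distance convention is my assumption, not confirmed from the USearch source):

```python
import numpy as np

rng = np.random.default_rng(42)
# Two similar vectors with int8-range values at the index's dimensionality.
a = rng.integers(-40, 40, size=1024).astype(np.int64)
b = a + rng.integers(-3, 3, size=1024)  # small perturbation -> high similarity

dot = int(np.dot(a, b))   # raw inner product: order of hundreds of thousands
distance = 1 - dot        # assumed "ip" convention: d = 1 - <a, b>
```

Under this convention a less negative distance would indeed mean higher similarity, which matches the ranking I observe.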
Questions
Is it expected for the inner product ("ip") metric to produce negative distances when used with int8 quantized embeddings?
Should I interpret less negative distances as higher similarity? Is there a recommended approach to convert these distances into similarity scores (e.g., by inverting the sign)?
Could this behavior be related to the quantization ranges or to normalization of the embeddings before adding them to the index?

Additional Information
Embedding Model: mixedbread-ai/deepset-mxbai-embed-de-large-v1
Usearch Index Configuration:
from usearch.index import Index

index = Index(ndim=1024, metric="ip", dtype="i8")
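If the "ip" distance really is d = 1 - ⟨a, b⟩ over the stored int8 vectors (my assumption), then a cosine-like score in [-1, 1] can be recovered by undoing that offset and dividing by the vector norms. A sketch with NumPy; `distance_to_cosine` is a hypothetical helper of mine, not a USearch API:

```python
import numpy as np

def distance_to_cosine(distance: float, q: np.ndarray, v: np.ndarray) -> float:
    """Recover cosine similarity from an assumed d = 1 - <q, v> distance,
    given the stored int8 query and result vectors."""
    inner = 1.0 - distance  # undo the assumed offset to get back <q, v>
    denom = (np.linalg.norm(q.astype(np.float64))
             * np.linalg.norm(v.astype(np.float64)))
    return inner / denom

# Example with two identical int8 vectors (cosine similarity should be 1.0).
rng = np.random.default_rng(7)
q = rng.integers(-127, 128, size=1024).astype(np.int8)
v = q.copy()
# Simulate the distance the index would return under the assumed convention;
# cast to int64 first so the dot product does not overflow int8.
d = 1 - int(np.dot(q.astype(np.int64), v.astype(np.int64)))
```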
Quantization Ranges:
Precomputed global min and max values across the entire dataset: