-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MB-59102: Merging KNN Results #1910
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CascadingRadium would you compress the knn_dataset and knn_queries you've added here.
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved comments, looks good to me.
searchers := make(OrderedSearcherList, len(qsearchers)) | ||
sortedSearchers := &OrderedSearcherList{ | ||
searchers: make([]search.Searcher, len(qsearchers)), | ||
index: make([]int, len(qsearchers)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the purpose for this index
field, but I think you'll need to do this for conjunction also once we support the and
operator over knn?
And why not disjunction heap searcher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i had put the conjunction stuff in a different branch thinking i will put it out once the operator code was merged in, otherwise it will be unused code. But yea i will just put it here now itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disj heap searcher does not sort the searchers and hence original positions are retained in matchingIdx - an existing variable so i didnt need to do this operation there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See in-line comments, also make sure you do a git pull
before you push up commits here.
6b903a4
to
0ef2e33
Compare
search/scorer/scorer_knn.go
Outdated
@@ -60,6 +60,9 @@ func NewKNNQueryScorer(queryVector []float32, queryField string, queryBoost floa | |||
} | |||
} | |||
|
|||
// TODO: Better value needed here? | |||
const maxEuclideanDistance = 10000.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotta update this based on our conversation. It seems you'll need the adjust the test expectations need to be adjusted as well based on a new value here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea for some reason the score seems to be 0.0 for dot product also so what to do then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, since i'll need this to update my UT for the build going out today, this is what we'll do for score = 0:
- Euclidean dist: set it to max score and don't invert the distance.
- Dot product: don't change the score, let it remain as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me other than the comments Abhinav had.
// tf-idf scoring | ||
score = 1.0 / score | ||
} | ||
|
||
// if the query weight isn't 1, multiply | ||
if sqs.queryWeight != 1.0 { | ||
if sqs.queryWeight != 1.0 && score != maxKNNScore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than avoiding the score overflowing, any other reason to avoid boosting here?
… hits (#1936) - Reverts commit #1910, which was an earlier attempt to address this issue. - Implements the PreSearch Construct in Bleve alias search, enabling a preliminary query to collect metadata from all alias indexes before executing the main search query in MultiSearch. PreSearch gathers KNN results from all alias indexes, selecting the top K results. This facilitates the main Bleve Query to operate within the context of documents that matched the KNN query, ensuring seamless functionality of existing Bleve constructs such as Faceting, Sorting, Pagination, SearchAfter, and SearchBefore. - Introduces the KNN Collector construct to merge and obtain accurate Top K results from multiple Zap Segments' KNN results. - Enhances KNN Unit Tests for greater generality. - Addresses an issue where errors generated within the Top N Document handler were being discarded. - Resolves an issue where document matches failing to meet the SearchAfter clause weren't being returned to the pool.
Jira
MB-59102
Description
When performing a MultiSearch across an alias representing a partitioned index, we need special logic to merge the KNN results together into the final search result.