
MB-59102: Merging KNN Results #1910

Merged
merged 11 commits into from
Nov 21, 2023


CascadingRadium (Member)

Jira

MB-59102

Description

When performing a MultiSearch across an alias representing a partitioned index, we need special logic to merge the KNN results together into the final search result.
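To illustrate what merging involves, here is a minimal Go sketch, not the PR's implementation. Each partition returns its own local top-k hits, and the alias keeps the k best across all of them. The `knnHit` type and `mergeKNNHits` helper are hypothetical. The sketch also assumes that partitions hold disjoint documents, so no de-duplication is needed, and that a higher score is better.

```go
package main

import (
	"fmt"
	"sort"
)

// knnHit is a hypothetical, simplified stand-in for a document match
// returned by one partition for a KNN query.
type knnHit struct {
	ID    string
	Score float64 // higher is better
}

// mergeKNNHits combines per-partition top-k lists into one global top-k
// list. Each partition only knows its local nearest neighbours, so the
// global answer is the k best hits across all partitions.
func mergeKNNHits(partitions [][]knnHit, k int) []knnHit {
	if k <= 0 {
		return nil
	}
	var all []knnHit
	for _, hits := range partitions {
		all = append(all, hits...)
	}
	// Sort by descending score, breaking ties on ID for a deterministic order.
	sort.Slice(all, func(i, j int) bool {
		if all[i].Score != all[j].Score {
			return all[i].Score > all[j].Score
		}
		return all[i].ID < all[j].ID
	})
	if len(all) > k {
		all = all[:k]
	}
	return all
}

func main() {
	p1 := []knnHit{{"a", 0.9}, {"b", 0.4}}
	p2 := []knnHit{{"c", 0.7}, {"d", 0.6}}
	for _, h := range mergeKNNHits([][]knnHit{p1, p2}, 3) {
		fmt.Println(h.ID, h.Score)
	}
}
```

Note that the second partition contributes two hits to the global top 3 even though neither is its partition's only "best" result. This is why each partition has to return k candidates rather than one.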

@CascadingRadium changed the title from MB-59102 to MB-59102: Merging KNN Results Nov 16, 2023
@CascadingRadium self-assigned this Nov 16, 2023
@CascadingRadium added this to the v2.4.0 milestone Nov 16, 2023
@abhinavdangeti removed this from the v2.4.0 milestone Nov 17, 2023
@abhinavdangeti (Member) left a comment:

@CascadingRadium, would you compress the knn_dataset and knn_queries you've added here?

@CascadingRadium (Member, Author) replied:

Done.

search_knn.go (resolved)
index_alias_impl.go (outdated, resolved)
search_knn.go (outdated, resolved)
search_no_knn.go (outdated, resolved)
search/searcher/search_knn_util.go (outdated, resolved)
search/searcher/search_no_knn_util.go (outdated, resolved)
search/util.go (outdated, resolved)
@abhinavdangeti (Member) left a comment:

Resolved comments; looks good to me.

```diff
-	searchers := make(OrderedSearcherList, len(qsearchers))
+	sortedSearchers := &OrderedSearcherList{
+		searchers: make([]search.Searcher, len(qsearchers)),
+		index:     make([]int, len(qsearchers)),
```
@abhinavdangeti (Member) commented Nov 20, 2023:

I understand the purpose of this index field, but I think you'll need to do this for the conjunction searcher as well, once we support the AND operator over KNN?

And why not for the disjunction heap searcher?

@CascadingRadium (Member, Author) replied:

I had put the conjunction changes in a different branch, planning to push them once the operator code was merged in; otherwise it would be unused code. But yes, I'll just add them here now.

@CascadingRadium (Member, Author) replied:

The disjunction heap searcher does not sort the searchers, so their original positions are already retained in matchingIdx, an existing variable. That's why I didn't need to do this there.

@abhinavdangeti (Member) left a comment:

See in-line comments. Also, make sure you do a git pull before you push up commits here.

```diff
@@ -60,6 +60,9 @@ func NewKNNQueryScorer(queryVector []float32, queryField string, queryBoost floa
	}
}

// TODO: Better value needed here?
const maxEuclideanDistance = 10000.0
```
@abhinavdangeti (Member) commented Nov 20, 2023:

Gotta update this based on our conversation. It seems the test expectations will need to be adjusted as well, based on the new value here.

@CascadingRadium (Member, Author) replied Nov 21, 2023:

Yeah, for some reason the score seems to be 0.0 for dot product too, so what should we do then?

A member replied:

Just to confirm, since I'll need this to update my unit tests for the build going out today, this is what we'll do for score = 0:

  1. Euclidean distance: set the score to the max score and don't invert the distance.
  2. Dot product: don't change the score; let it remain as is.
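The two rules above can be sketched as a small Go function. This is a hypothetical helper, not the scorer from the PR. `knnScore`, the metric names `"l2_norm"` and `"dot_product"`, and the value of `maxKNNScore` are all assumptions. The Euclidean branch follows the `score = 1.0 / score` inversion seen in the diff, but short-circuits a distance of 0 to the max score so that it never divides by zero.

```go
package main

import (
	"fmt"
	"math"
)

// maxKNNScore is a hypothetical placeholder for the "max score"; the
// real value used in the PR is not shown in this thread.
const maxKNNScore = math.MaxFloat64

// knnScore turns a raw vector comparison into a relevance score
// (hypothetical helper following the two rules listed above).
func knnScore(metric string, raw float64) float64 {
	switch metric {
	case "l2_norm":
		// Euclidean: raw is a distance, so smaller is better, and it is
		// inverted. A distance of 0 means identical vectors, so return
		// the max score instead of dividing by zero.
		if raw == 0 {
			return maxKNNScore
		}
		return 1.0 / raw
	case "dot_product":
		// Dot product: raw is already a similarity; keep it as is, even 0.
		return raw
	default:
		panic("unknown metric: " + metric)
	}
}

func main() {
	fmt.Println(knnScore("l2_norm", 0) == maxKNNScore) // true
	fmt.Println(knnScore("l2_norm", 4))                // 0.25
	fmt.Println(knnScore("dot_product", 0))            // 0
}
```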

@Thejas-bhat previously approved these changes Nov 21, 2023

@Thejas-bhat (Member) left a comment:

Looks good to me, other than the comments Abhinav had.

@abhinavdangeti merged commit f919433 into unstable Nov 21, 2023
0 of 9 checks passed
```diff
 		// tf-idf scoring
 		score = 1.0 / score
 	}
 
 	// if the query weight isn't 1, multiply
-	if sqs.queryWeight != 1.0 {
+	if sqs.queryWeight != 1.0 && score != maxKNNScore {
```
A member commented:

Other than avoiding score overflow, is there any other reason to avoid boosting here?

@abhinavdangeti deleted the MB-59102 branch December 4, 2023 22:19
abhinavdangeti pushed a commit that referenced this pull request Dec 15, 2023
… hits (#1936)

- Reverts commit #1910, which was an earlier attempt to address this issue.
- Implements the PreSearch construct in Bleve alias search, enabling a preliminary query to collect metadata from all alias indexes before the main search query is executed in MultiSearch. PreSearch gathers KNN results from all alias indexes and selects the top K results. This lets the main Bleve query operate within the context of the documents that matched the KNN query, so existing Bleve constructs such as Faceting, Sorting, Pagination, SearchAfter, and SearchBefore continue to work seamlessly.
- Introduces the KNN Collector construct to merge and obtain accurate Top K results from multiple Zap Segments' KNN results.
- Enhances KNN Unit Tests for greater generality.
- Addresses an issue where errors generated within the Top N Document handler were being discarded.
- Resolves an issue where document matches failing to meet the SearchAfter clause weren't being returned to the pool.
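The KNN Collector itself isn't shown here. As a hedged illustration of the general technique, this Go sketch keeps the global top K across several segments' KNN results using a bounded min-heap from `container/heap`. Whenever a better hit arrives, the weakest kept hit is evicted. The `hit` type and `knnCollector` are hypothetical and are not the collector introduced in #1936.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// hit is a hypothetical, simplified KNN result from one segment.
type hit struct {
	ID    string
	Score float64
}

// minHeap orders hits by ascending score, so the root is the weakest of
// the current top K and is the first to be evicted.
type minHeap []hit

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(hit)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// knnCollector keeps the best k hits seen across all segments.
type knnCollector struct {
	k    int
	heap minHeap
}

func (c *knnCollector) Collect(h hit) {
	if c.k <= 0 {
		return
	}
	if c.heap.Len() < c.k {
		heap.Push(&c.heap, h)
		return
	}
	// Replace the weakest kept hit only if the new one beats it.
	if h.Score > c.heap[0].Score {
		c.heap[0] = h
		heap.Fix(&c.heap, 0)
	}
}

// Results returns the kept hits, best first.
func (c *knnCollector) Results() []hit {
	out := append([]hit(nil), c.heap...)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	c := &knnCollector{k: 2}
	segments := [][]hit{
		{{"a", 0.2}, {"b", 0.9}},
		{{"c", 0.5}, {"d", 0.1}},
	}
	for _, seg := range segments {
		for _, h := range seg {
			c.Collect(h)
		}
	}
	fmt.Println(c.Results()) // [{b 0.9} {c 0.5}]
}
```

Compared with concatenating and sorting every segment's results, the heap holds at most K hits at a time. Each new candidate costs O(log K).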

4 participants