Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MB-59102: Merging kNN hits from a distributed index with exact search hits #1936

Merged
merged 5 commits into from
Dec 15, 2023

Conversation

CascadingRadium
Copy link
Member

  • Reverts commit MB-59102: Merging KNN Results #1910, which was an earlier attempt to address this issue.
  • Implements the PreSearch Construct in Bleve alias search, enabling a preliminary query to collect metadata from all alias indexes before executing the main search query in MultiSearch. PreSearch gathers KNN results from all alias indexes, selecting the top K results. This facilitates the main Bleve Query to operate within the context of documents that matched the KNN query, ensuring seamless functionality of existing Bleve constructs such as Faceting, Sorting, Pagination, SearchAfter, and SearchBefore.
  • Introduces the KNN Collector construct to merge and obtain accurate Top K results from multiple Zap Segments' KNN results.
  • Enhances KNN Unit Tests for greater generality.
  • Addresses an issue where errors generated within the Top N Document handler were being discarded.
  • Resolves an issue where document matches failing to meet the SearchAfter clause weren't being returned to the pool.

@abhinavdangeti abhinavdangeti changed the title Integrate Vector Search into Bleve Merging kNN hits from a distributed index with exact search hits Dec 12, 2023
@abhinavdangeti
Copy link
Member

Thank you @CascadingRadium . I'll share my review by tomorrow.
Meanwhile - @Thejas-bhat , @metonymic-smokey , @moshaad7 , @Likith101 would you please start reviewing this.

@abhinavdangeti
Copy link
Member

@CascadingRadium I tried this change E2E with couchbase. Things don't work on 2 accounts -

  • one, if I don't specify knn_operator, I see unknown knn operator: - this is not right, it needs to default to or when unspecified
  • two, I go ahead and put in knn_operator to "or", a kNN search over that SIFT dataset in combination with match_none, now returns nothing (it seems the response is just empty)

Non-vector indexes are affected as well, so this needs a bit more testing from your end.
Can you look into what's going on? Should be straightforward to reproduce.

@metonymic-smokey
Copy link
Member

@abhinavdangeti Re point 1, I believe this PR is the cause(#1935) since this doesn't allow an empty string as the KNN operator. If it is, it should be reverted.

@abhinavdangeti
Copy link
Member

Thanks @metonymic-smokey , here's the PR adjusting that - #1937

@CascadingRadium
Copy link
Member Author

there's a separate PR in cbft that I will raise today. the marshal unmatshal in logic there needs to be updated

index_alias_impl.go Outdated Show resolved Hide resolved
index_alias_impl.go Outdated Show resolved Hide resolved
index_alias_impl.go Outdated Show resolved Hide resolved
index_alias_impl.go Outdated Show resolved Hide resolved
index_alias_impl.go Outdated Show resolved Hide resolved
index_alias_impl.go Outdated Show resolved Hide resolved
search/util.go Outdated Show resolved Hide resolved
search_knn.go Outdated Show resolved Hide resolved
search_no_knn.go Outdated Show resolved Hide resolved
@abhinavdangeti
Copy link
Member

there's a separate PR in cbft that I will raise today. the marshal unmatshal in logic there needs to be updated

I don't see this still. Are any gRPC changes required in cbft to support this?

Copy link
Member

@Thejas-bhat Thejas-bhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still trying to understand the details of the implementation, although i get the core logic behind it. will keep at it periodically and let you know any concerns i have

index_impl.go Show resolved Hide resolved
search/collector/topn.go Show resolved Hide resolved
@blevesearch blevesearch deleted a comment from CascadingRadium Dec 14, 2023
index_alias_impl.go Outdated Show resolved Hide resolved
search_knn.go Show resolved Hide resolved
search_knn.go Show resolved Hide resolved
search_knn.go Outdated Show resolved Hide resolved
@abhinavdangeti abhinavdangeti changed the title Merging kNN hits from a distributed index with exact search hits MB-59102: Merging kNN hits from a distributed index with exact search hits Dec 15, 2023
Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with this. Well done @CascadingRadium .
Keep an eye out for any optimizations we can fish out in this area for the future.

@abhinavdangeti abhinavdangeti merged commit 138fec5 into unstable Dec 15, 2023
9 checks passed
abhinavdangeti added a commit that referenced this pull request Jan 5, 2024
- Cleanup up unused code paths since the merging of
#1936 changed the overall
approach, resulting in a lot of dead code.
- Optimized the KNN collector to not update the allHits map for every
hit and instead just use its internal heaps when the Final() is called.
- Typo Fix
- Optimized the OptimizeVR's Finish() method to concurrently search
across segments.
 
Results of Optimization

Dataset -  SIFT 1 Million dataset 
Index - 1 partition index 

Before this commit - Query Execution time - 2.25 seconds
With this commit - Query Execution time - 1.70 seconds
Percentage Reduction - **24.4%**

---------

Co-authored-by: Abhinav Dangeti <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants