Add Thesaurus API and Synonym Index Handling in Search #268

Merged: 30 commits, Dec 19, 2024

@CascadingRadium (Member) commented Oct 10, 2024

  • Add Thesaurus API to find equivalent terms for a given term.
  • Enable Synonym Document objects with Synonym Field objects to provide Synonym Definitions
    for creating the thesaurus in the search index.
  • Add a synonym section to handle synonym document processing; persist the synonym index in
    segments (separating it from the inverted and vector indexes), and manage the synonym index
    merging during segment merges.
  • Add command line tooling to access the thesaurus by parsing the segment file.
  • Update zap.md to reflect the index file format for thesaurus support.

@CascadingRadium added the enhancement (New feature or request) label Oct 10, 2024
@CascadingRadium self-assigned this Oct 10, 2024
@CascadingRadium changed the title from "add a thesaurus datatype with its own section" to "Add a thesaurus datatype with its own section" Oct 16, 2024
@CascadingRadium marked this pull request as ready for review October 16, 2024 09:52
@CascadingRadium removed the request for review from moshaad7 November 5, 2024 12:33
CascadingRadium and others added 5 commits December 10, 2024 18:36
@CascadingRadium changed the title from "Add a thesaurus datatype with its own section" to "Add Thesaurus API and Synonym Index Handling in Search" Dec 12, 2024
section_inverted_text_index.go (outdated, resolved)
cmd/zap/cmd/synonym.go (outdated, resolved)
section_inverted_text_index.go (outdated, resolved)
@abhinavdangeti (Member) left a comment

A few more comments.

section_synonym.go (outdated, resolved)
section_synonym.go (outdated, resolved)
segment.go (resolved)
segment.go (outdated, resolved)
fix
@abhinavdangeti (Member) left a comment

Looks mostly ok to me. One comment around naming/commentary.

section_synonym_index.go (resolved)
section_synonym_index.go (resolved)
section_synonym_index.go (resolved)
section_synonym_index.go (outdated, resolved)
segment.go (outdated, resolved)
segment.go (outdated, resolved)
@abhinavdangeti (Member) left a comment

Thanks @CascadingRadium. Let's get this in for now and incrementally improve the area as and when needed.

@abhinavdangeti merged commit 82553cd into master Dec 19, 2024
6 checks passed
@abhinavdangeti deleted the synonyms branch December 19, 2024 16:04
abhinavdangeti added a commit to blevesearch/bleve that referenced this pull request Dec 19, 2024
- Allow setting up `synonym_sources` in the index mapping, which will
follow its own ingest pipeline, ingesting special synonym definitions
using the `IndexSynonym()` API.
- A `synonym_source` can be set like an analyzer to a field mapping and
can be set as a default option at the document mapping or the index
mapping level.
- Each `synonym_source` can have its own analyzer, making it flexible to
allow for compatibility with the language analyzer specified for its
corresponding mapping.
- Compatibility with every term-based query where the term gets expanded
to include its synonyms at query time.
- Dependencies:
  - blevesearch/[email protected] - blevesearch/bleve_index_api#57
  - blevesearch/[email protected] - blevesearch/scorch_segment_api#46
  - blevesearch/[email protected] - blevesearch/vellum#22
  - blevesearch/zapx@v16@latest - blevesearch/zapx#268

---------

Co-authored-by: Abhinav Dangeti <[email protected]>
@CascadingRadium (Member, Author) commented

Thanks for merging @abhinavdangeti
