[Bug]: auto index not working on text match search, results in Assertion error #38642

Open
1 task done
pycui opened this issue Dec 22, 2024 · 5 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@pycui

pycui commented Dec 22, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5.0-beta
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 64/250T
- GPU: none
- Others:

Current Behavior

Searching a text field with TEXT_MATCH returns a QueryNode error:

MilvusException: <MilvusException: (code=65535, message=fail to Query on QueryNode 1467: worker(1467) query failed: Operator::GetOutput failed for [Operator:PhyFilterBitsNode, plan node id: 180] : Assert "iter != text_indexes_.end()"  => failed to get text index, text index not found at /workspace/source/internal/core/src/segcore/SegmentInterface.cpp:399
)>
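
For readers triaging similar reports, the message above can be split into its parts: the failing QueryNode, the operator, the failed assertion, and the source location. This is a minimal sketch using only the Python standard library. The regular expression and the name parse_segcore_error are hypothetical and inferred from this single message, not from any documented Milvus error format.

```python
import re

# Error text copied from the report above.
MESSAGE = (
    'fail to Query on QueryNode 1467: worker(1467) query failed: '
    'Operator::GetOutput failed for [Operator:PhyFilterBitsNode, plan node id: 180] : '
    'Assert "iter != text_indexes_.end()"  => failed to get text index, '
    'text index not found at '
    '/workspace/source/internal/core/src/segcore/SegmentInterface.cpp:399'
)

# Hypothetical pattern, inferred from this one message only.
PATTERN = re.compile(
    r'QueryNode (?P<node>\d+):.*?'
    r'\[Operator:(?P<operator>\w+), plan node id: (?P<plan_node>\d+)\].*?'
    r'Assert "(?P<assertion>[^"]+)"\s*=>\s*(?P<reason>.+?) at '
    r'(?P<file>\S+):(?P<line>\d+)'
)


def parse_segcore_error(message):
    """Return the named parts of the error as a dict, or None if no match."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


parsed = parse_segcore_error(MESSAGE)
```

Here the assertion field is the useful one: it shows the segment looked up a text index for the field and found none, which is why later comments focus on whether enable_match was actually in effect.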

schema

from pymilvus import CollectionSchema, DataType, FieldSchema

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1024)
]
collection_schema = CollectionSchema(fields, description="A collection with text and embedding vector")

collection_schema.add_field(
    field_name='text', 
    datatype=DataType.VARCHAR, 
    max_length=1000, 
    enable_analyzer=True, # Whether to enable text analysis for this field
    enable_match=True # Whether to enable text match
)

relevant index

index_params.add_index(
    field_name="text",
    index_type="", 
    index_name="text_index"
)

search code

filter = "TEXT_MATCH(text, 'sample text 1')"

result = client.query(
    collection_name="text_embedding_collection",
    filter=filter, 
    output_fields=["id", "text"]
)

A hybrid search combining the embedding with text match hits the same error:

import numpy as np

filter = "TEXT_MATCH(text, 'sample text 1')"
query_vector = np.random.random(1024).tolist()

result = client.search(
    collection_name="text_embedding_collection", 
    anns_field="embedding", 
    data=[query_vector], 
    filter=filter,
    search_params={"params": {"nprobe": 10}},
    limit=10, # Max. number of results to return
    output_fields=["id", "text"] # Fields to return
)

Expected Behavior

Returns retrieved rows

Steps To Reproduce

This can be reproduced by creating a new collection like above.

Milvus Log

No response

Anything else?

No response

@pycui pycui added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 22, 2024
@pycui
Author

pycui commented Dec 22, 2024

I think the issue is we have to explicitly define index type, even though https://milvus.io/docs/scalar_index.md#Scalar-Index says we have Auto Indexing
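
To make the workaround described here concrete, the following is a rough sketch of building the index with an explicit type instead of the empty string. The choice of "INVERTED" is an assumption, since the thread never says which type was used, and a later comment questions whether index type is related to the failure at all. The helper name create_explicit_text_index is hypothetical, and client is expected to be a pymilvus MilvusClient.

```python
def create_explicit_text_index(client, collection_name, index_type="INVERTED"):
    """Create the 'text' field index with an explicit index type.

    Assumption: `client` is a pymilvus MilvusClient. "INVERTED" is a guess
    at a suitable scalar index type; the thread does not name one.
    """
    index_params = client.prepare_index_params()
    index_params.add_index(
        field_name="text",
        index_type=index_type,   # explicit, instead of "" as in the report
        index_name="text_index",
    )
    client.create_index(collection_name=collection_name, index_params=index_params)
    return index_params
```

It would be called as create_explicit_text_index(client, "text_embedding_collection") against a running Milvus instance.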

@pycui pycui changed the title [Bug]: text match search results in Assertion error [Bug]: atuo index not working on text match search, results in Assertion error Dec 22, 2024
@pycui pycui changed the title [Bug]: atuo index not working on text match search, results in Assertion error [Bug]: auto index not working on text match search, results in Assertion error Dec 22, 2024
@yanliang567
Contributor

/assign @zhengbuqian
/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 24, 2024
@SpadeA-Tang
Contributor

Did you insert any values, and if so, what were your insert queries? I used the relevant version in a standalone cluster and could not reproduce it. @pycui

@SpadeA-Tang
Contributor

collection_schema.add_field(
    field_name='text',
    datatype=DataType.VARCHAR,
    max_length=1000,
    enable_analyzer=True, # Whether to enable text analysis for this field
    enable_match=False # Whether to enable text match
)

When I set enable_match to False, I can reproduce the panic. Could you confirm that you have it set to True?
@pycui
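
One way to check this on the live collection is to inspect what the server reports for the field, rather than what the schema code intended. The sketch below assumes the result of client.describe_collection(...) is a dict with a "fields" list whose entries carry "name" and "params"; that shape, the helper name text_match_enabled, and the sample dict are assumptions for illustration, not output captured from this issue.

```python
def text_match_enabled(description, field_name="text"):
    """Return True if the named field reports enable_match as on.

    `description` is assumed to look like the dict returned by
    MilvusClient.describe_collection (an assumption, not verified here).
    """
    for field in description.get("fields", []):
        if field.get("name") == field_name:
            value = field.get("params", {}).get("enable_match", False)
            # Accept both booleans and string forms such as "true".
            return str(value).lower() == "true"
    raise KeyError(f"field {field_name!r} not found")


# Hypothetical description, hand-written for illustration only.
sample_description = {
    "fields": [
        {"name": "id", "params": {}},
        {"name": "text", "params": {"max_length": 1000, "enable_analyzer": True}},
    ]
}
```

With the sample above, the text field has no enable_match entry, which is the situation in which the panic was reproduced in this comment.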

@xiaofan-luan
Collaborator

I think the issue is we have to explicitly define index type, even though https://milvus.io/docs/scalar_index.md#Scalar-Index says we have Auto Indexing

Match seems to have nothing to do with index type? @pycui, can you verify on 2.5.2, which we released today?

@czs007 czs007 assigned SpadeA-Tang and unassigned zhengbuqian Jan 2, 2025