[Bug]: TEXT_MATCH returning no results sometimes. #38644

Open

danielelongo14 opened this issue Dec 22, 2024 · 6 comments · May be fixed by #39070
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments


danielelongo14 commented Dec 22, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5.0-gpu
- Deployment mode(standalone or cluster): Standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.5.0
- OS(Ubuntu or CentOS): MacOS and linux
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

After uploading the documents to Milvus, I query with client.search and filter = f"TEXT_MATCH(content, '{important_words}')".
The first query works, but afterwards even the same query returns no results.
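For reference, a minimal sketch of how such a filter string is assembled. The helper name `text_match_filter` is hypothetical (not part of the report's code); with several space-separated terms, Milvus's TEXT_MATCH matches rows containing any of them.

```python
# Hypothetical helper showing how the TEXT_MATCH filter expression in the
# report is built from a field name and a string of search terms.
def text_match_filter(field: str, terms: str) -> str:
    return f"TEXT_MATCH({field}, '{terms}')"

print(text_match_filter("content", "revenue forecast"))
# TEXT_MATCH(content, 'revenue forecast')
```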

Expected Behavior

No response

Steps To Reproduce

[UPDATED]

Start Milvus on Docker

Create a collection with

    from pymilvus import (
        Collection,
        CollectionSchema,
        DataType,
        FieldSchema,
        connections,
    )
    import logging

    logger = logging.getLogger(__name__)

    # Connect before creating the collection (host/port as appropriate).
    connections.connect("default", host="localhost", port="19530")

    COLLECTION_NAME = "Documents"
    EMBEDDING_DIM = 768
    analyzer_params = {
        "type": "standard",
        "filter": ["lowercase"],
    }

    fields = [
        # max_length only applies to VARCHAR fields, so it is dropped for the INT64 primary key.
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
        FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=10000,
                    enable_analyzer=True, analyzer_params=analyzer_params, enable_match=True),
        FieldSchema(name="file_id", dtype=DataType.VARCHAR, max_length=100),
        FieldSchema(name="file_name", dtype=DataType.VARCHAR, max_length=255),
        FieldSchema(name="page_num", dtype=DataType.INT64),
        FieldSchema(name="para_num", dtype=DataType.INT64),
        FieldSchema(name="data_type", dtype=DataType.VARCHAR, max_length=50),
        FieldSchema(name="company_name", dtype=DataType.VARCHAR, max_length=255),
        FieldSchema(name="file_path", dtype=DataType.VARCHAR, max_length=100),
        FieldSchema(name="folder_name", dtype=DataType.VARCHAR, max_length=255),
        FieldSchema(name="created_date", dtype=DataType.VARCHAR, max_length=50),
        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
        FieldSchema(name="sparse_embeddings", dtype=DataType.SPARSE_FLOAT_VECTOR),
    ]

    try:
        schema = CollectionSchema(fields, description="Document collection",
                                  auto_id=True, enable_dynamic_field=True)
        collection = Collection(COLLECTION_NAME, schema)

        try:
            # Create an index on the dense embedding field.
            dense_index_params = {
                "index_type": "IVF_FLAT",   # choose an appropriate index type
                "metric_type": "COSINE",    # cosine similarity
                "params": {"nlist": 128},
            }
            collection.create_index(field_name="embedding", index_params=dense_index_params)
        except Exception as e:
            print(e)

        try:
            # Create an index on the sparse embedding field.
            sparse_index_params = {
                "index_type": "SPARSE_INVERTED_INDEX",
                "metric_type": "IP",  # inner product is supported for sparse vectors
                "params": {"drop_ratio_build": 0.2},
            }
            collection.create_index(field_name="sparse_embeddings", index_params=sparse_index_params)
            logger.info("Indexes created successfully")
        except Exception as e:
            logger.error(f"Error while creating indexes: {e}")
    except Exception as e:
        logger.error(f"Error while creating schema: {e}")

The collection is created only once (in my app the creation code is guarded by a condition, so it does not run on every start).

Load documents into the collection and try the TEXT_MATCH filter. At this stage, it works normally.

Remove the container and rebuild it. Now, the collection and the documents are still in the volume, so it doesn't create a new one.

Try the TEXT_MATCH filter again: it finds no results, while normal semantic search still works.

The only solution I've found is to delete all the volumes, recreate the collection, and reinsert the documents.

Milvus Log

No response

Anything else?

No response

@danielelongo14 danielelongo14 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 22, 2024
Contributor

yanliang567 commented Dec 24, 2024

There is a limitation: TEXT_MATCH does not search the growing segment immediately, only after it has been synced to disk. I guess you hit this issue, so please try manually calling flush() on the collection after insertion. BTW, manual flushing is generally not recommended, but in this case you can try it.
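As a sketch of that workaround: the helper below and the stand-in class are illustrative only (not pymilvus API); against a real deployment you would call insert() and flush() on an actual pymilvus Collection.

```python
# Sketch of the suggested workaround: call flush() right after insert() so
# data leaves the growing segment and becomes visible to TEXT_MATCH.
def insert_then_flush(collection, rows):
    collection.insert(rows)
    collection.flush()  # force the growing segment to be sealed and synced

# Stand-in object recording the call order, for illustration only;
# a real pymilvus Collection would talk to the Milvus server here.
class FakeCollection:
    def __init__(self):
        self.calls = []

    def insert(self, rows):
        self.calls.append("insert")

    def flush(self):
        self.calls.append("flush")

c = FakeCollection()
insert_then_flush(c, [{"content": "quarterly revenue"}])
print(c.calls)  # ['insert', 'flush']
```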

/assign @danielelongo14
/unassign

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 24, 2024
@xiaofan-luan
Collaborator

TEXT_MATCH

I guess this is not the case.
Text match is guaranteed to be searchable within 200 ms; if not, it's an issue.

@xiaofan-luan
Collaborator

Remove the container and rebuild it. Now, the collection and the documents are still in the volume, so it doesn't create a new one.

Did you mean you stopped the container and brought up another one?

@yanliang567 my guess is the match is not being done in the same collection

@danielelongo14
Author

Yes, I usually do

docker compose up --build --no-cache and then docker compose down

Without deleting the volumes, I run docker compose up --build --no-cache again

Now text match won't work, while semantic search will


yiwen92 commented Jan 8, 2025

Thanks for your report @danielelongo14. Our dev team just found an internal bug that might cause this issue. A fix is in progress; we'll let you know which version it lands in.

@SpadeA-Tang SpadeA-Tang linked a pull request Jan 8, 2025 that will close this issue
@danielelongo14
Author

Thank you!

I also sometimes get LoadSegment: Error in GetObjectSize, [errcode:404, exception:, errmessage:No response body.,
Do you happen to know what can cause it while loading the collection?
