[Bug]: [json-inverted] Incorrect filter results are returned for queries on indexed json data #38879

ThreadDao · 2024-12-31T06:39:37Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: JsDove-optimization_json-0b74598-20241230
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

test

create a collection with 3 fields: a pk int64 field + a vector field + a json field
create HNSW index for vector field
insert 10m entities; the json column of each batch is generated by the following rule:

values = [{"id": i, "values": {"float": float(i), "varchar": str(i)}} for i in pks]
# example:
# row_0: {"id": 0, "values": {"float": 0.0, "varchar": "0"}}
# row_1: {"id": 1, "values": {"float": 1.0, "varchar": "1"}}

flush, create index again, then load
query with Strong consistency level -> wrong query results:

c.query('json_1["values"]["float"] < 20.0', limit=5, output_fields=["json_1"], consistency_level="Strong")
data: ["{'json_1': {'id': 8192, 'values': {'float': 8192.0, 'varchar': '8192'}}, 'id': 8192}", "{'json_1': {'id': 8193, 'values': {'float': 8193.0, 'varchar': '8193'}}, 'id': 8193}", "{'json_1': {'id': 8194, 'values': {'float': 8194.0, 'varchar': '8194'}}, 'id': 8194}", "{'json_1': {'id': 8195, 'values': {'float': 8195.0, 'varchar': '8195'}}, 'id': 8195}", "{'json_1': {'id': 8196, 'values': {'float': 8196.0, 'varchar': '8196'}}, 'id': 8196}"] 

c.query('json_1["values"]["float"] < 20', output_fields=["count(*)"])
data: ["{'count(*)': 565653}"]  # expected: 20
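
For reference, the expected count can be derived from the generation rule alone, without Milvus. A minimal sketch in plain Python; `make_rows` and `matches` are hypothetical helper names, and the 10,000-row sample is an assumption made to keep it fast (any pk range covering 0..19 gives the same answer):

```python
def make_rows(pks):
    """Build JSON values using the generation rule from this report."""
    return [{"id": i, "values": {"float": float(i), "varchar": str(i)}} for i in pks]


def matches(row, bound):
    """Local equivalent of the filter json_1["values"]["float"] < bound."""
    return row["values"]["float"] < bound


# Assumption: a small sample stands in for the full 10m-row dataset.
rows = make_rows(range(10_000))
hits = [r for r in rows if matches(r, 20.0)]
expected_count = len(hits)

print(expected_count)               # 20
print([r["id"] for r in hits[:5]])  # [0, 1, 2, 3, 4]
```

Only pks 0 through 19 satisfy the predicate, so any count other than 20 (and any returned id of 20 or above) indicates a wrong filter result.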

You can refer to the instance master-20241225-c7313575-amd64 for the correct results of the same query with the same data.

c.query('json_1["values"]["float"] < 20', output_fields=["count(*)"])
data: ["{'count(*)': 20}"] , extra_info: {'cost': 0}
c.query('json_1["values"]["float"] < 20.0', limit=5, output_fields=["json_1"], consistency_level="Strong")
data: ["{'json_1': {'id': 0, 'values': {'float': 0.0, 'varchar': '0'}}, 'id': 0}", "{'json_1': {'id': 1, 'values': {'float': 1.0, 'varchar': '1'}}, 'id': 1}", "{'json_1': {'id': 2, 'values': {'float': 2.0, 'varchar': '2'}}, 'id': 2}", "{'json_1': {'id': 3, 'values': {'float': 3.0, 'varchar': '3'}}, 'id': 3}", "{'json_1': {'id': 4, 'values': {'float': 4.0, 'varchar': '4'}}, 'id': 4}"] , extra_info: {'cost': 0}
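
The comparison between the two instances could be automated roughly as below. This is only a sketch under assumptions: `check_float_filter` and `expected_count` are hypothetical helpers, `collection` is assumed to behave like a pymilvus `Collection` whose `query()` returns a list with `{'count(*)': <int>}` as its first item, and `StubCollection` is an offline stand-in that replays the counts reported in this issue so the snippet runs without a cluster.

```python
import math


def expected_count(bound, total_rows):
    """Rows with float(i) < bound for i in 0..total_rows-1 (rule from this report)."""
    return max(0, min(total_rows, math.ceil(bound)))


def check_float_filter(collection, bound, total_rows):
    """Compare the server-side count(*) with the locally derived count.

    Assumption: `collection` behaves like a pymilvus Collection, i.e.
    query() returns a list whose first item is {'count(*)': <int>}.
    """
    expr = f'json_1["values"]["float"] < {bound}'
    res = collection.query(expr, output_fields=["count(*)"], consistency_level="Strong")
    actual = res[0]["count(*)"]
    expected = expected_count(bound, total_rows)
    return actual, expected, actual == expected


class StubCollection:
    """Hypothetical offline stand-in that replays a fixed count."""

    def __init__(self, count):
        self._count = count

    def query(self, expr, output_fields=None, consistency_level=None):
        return [{"count(*)": self._count}]


# Counts reported in this issue: 565653 on the json-inverted build, 20 on master.
print(check_float_filter(StubCollection(565653), 20, 10_000_000))  # (565653, 20, False)
print(check_float_filter(StubCollection(20), 20, 10_000_000))      # (20, 20, True)
```

Against a live deployment, the stub would be replaced by the loaded collection object (`c` in the queries above).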

Expected Behavior

No response

Steps To Reproduce

argo workflow name: zong-json-index-6-10m
'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_custom_parameters',
            'test_case_params': {'dataset_params': {'metric_type': 'L2', 'dim': 128, 'dataset_name': 'sift', 'dataset_size': '10m', 'ni_per': 5000},
                                 'collection_params': {'other_fields': ['json_1'], 'shards_num': 1, 'collection_name': 'json_10m_coll'},
                                 'release_params': {'release_of_reload': True},
                                 'query_params': {},
                                 'search_params': {'output_fields': ['json_1'], 'timeout': 1200},
                                 'index_params': {'index_type': 'HNSW', 'index_param': {'M': 30, 'efConstruction': 200}},
                                 'concurrent_params': {'concurrent_number': 10, 'during_time': '10m', 'interval': 30, 'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 25,
                                                       'params': {'nq': 10,
                                                                  'top_k': 10,
                                                                  'output_fields': ['json_1'],
                                                                  'random_data': True,
                                                                  'search_param': {'ef': 96},
                                                                  'timeout': 60}}]},

Milvus Log

pods:

json-inverted-op-97-1369-milvus-datanode-55654cc544-wwf9d         1/1     Running                  0                20m     10.104.30.173   4am-node38   <none>           <none>
json-inverted-op-97-1369-milvus-indexnode-76d99f574b-b2np2        1/1     Running                  0                25m     10.104.23.12    4am-node27   <none>           <none>
json-inverted-op-97-1369-milvus-mixcoord-6677945b4d-9b66x         1/1     Running                  0                24m     10.104.30.169   4am-node38   <none>           <none>
json-inverted-op-97-1369-milvus-proxy-9d9558c86-r2t7t             1/1     Running                  0                20m     10.104.30.175   4am-node38   <none>           <none>
json-inverted-op-97-1369-milvus-querynode-0-6745868d58-5xthg      1/1     Running                  0                22m     10.104.21.149   4am-node24   <none>           <none>
json-inverted-op-97-1369-milvus-querynode-0-6745868d58-cwph7      1/1     Running                  0                23m     10.104.23.13    4am-node27   <none>           <none>

Anything else?

No response


ThreadDao · 2024-12-31T08:58:20Z

@JsDove There is another strange problem: there are too many small segments. We need to check why compaction is not being performed.

show segment --collection 454964804311692572
--- Growing: 0, Sealed: 0, Flushed: 39, Dropped: 0
--- Small Segments: 27, row count: 3320000	 Other Segments: 12, row count: 6680000
--- Total Segments: 39, row count: 10000000
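
To put the segment numbers in perspective, here is a quick back-of-the-envelope calculation using only the figures from the output above (the variable names are mine):

```python
# Figures copied from the "show segment" output above.
small_segments, small_rows = 27, 3_320_000
other_segments, other_rows = 12, 6_680_000

total_segments = small_segments + other_segments
total_rows = small_rows + other_rows

avg_small = small_rows / small_segments
avg_other = other_rows / other_segments

print(f"segments: {total_segments}, rows: {total_rows}")
print(f"small segments: {small_segments / total_segments:.1%} of segments, "
      f"{small_rows / total_rows:.1%} of rows")
print(f"avg rows per small segment: {avg_small:.0f}")
print(f"avg rows per other segment: {avg_other:.0f}")
```

Roughly two thirds of the flushed segments hold only about one third of the rows, which is why the missing compaction looks suspicious.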

ThreadDao added the kind/bug and needs-triage labels Dec 31, 2024

ThreadDao assigned yanliang567 and JsDove Dec 31, 2024

ThreadDao added the severity/critical label Dec 31, 2024

ThreadDao added this to the 2.5.2 milestone Dec 31, 2024

yanliang567 added the triage/accepted label and removed the needs-triage label Jan 3, 2025

yanliang567 removed their assignment Jan 3, 2025
