[Bug]: can't build index GPU_CAGRA #38650

Open
1 task done
Dong148 opened this issue Dec 23, 2024 · 6 comments
Labels: kind/bug, triage/accepted


Dong148 commented Dec 23, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5.0-beta gpu
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    default
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.5.0
- OS(Ubuntu or CentOS): CentOS Stream 9
- CPU/Memory:  Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz
- GPU: RTX A4500*4
- Others:

Current Behavior

When creating a GPU_CAGRA index on a 4096-dimension float vector field, the Python SDK call keeps blocking, and the Milvus log shows this failure:
failed to build index, raft inner error ...... VectorMemIndex.cpp:276: segcore error[segcoreCode=2004]

Expected Behavior

Index creation should succeed. GPU_IVF_FLAT works fine.

Steps To Reproduce

No response

Milvus Log

[2024/12/23 03:36:06.395 +00:00] [INFO] [indexnode/task_index.go:307] ["debug create index"] [clusterID=by-dev] [buildID=454799776879359244] [collection=454799776879358794] [segmentID=454799776879358832] [currentIndexVersion=6] [buildIndexParams="clusterID:"by-dev" buildID:454799776879359244 collectionID:454799776879358794 partitionID:454799776879358795 segmentID:454799776879358832 index_version:136 current_index_version:6 num_rows:1024 dim:4096 index_file_prefix:"files/index_files" insert_files:"files/insert_log/454799776879358794/454799776879358795/454799776879358832/101/454799776879358836" field_schema:{fieldID:101 name:"vector" data_type:FloatVector type_params:{key:"dim" value:"4096"}} storage_config:{address:"minio:9000" access_keyID:"minioadmin" secret_access_key:"minioadmin" bucket_name:"a-bucket" root_path:"files" storage_type:"remote" cloud_provider:"aws" request_timeout_ms:10000 sslCACert:"/path/to/public.crt"} index_params:{key:"dim" value:"4096"} index_params:{key:"cache_dataset_on_device" value:"true"} index_params:{key:"index_type" value:"GPU_CAGRA"} index_params:{key:"build_dram_budget_gb" value:"124.059021"} index_params:{key:"num_build_thread" value:"80"} index_params:{key:"metric_type" value:"IP"} index_params:{key:"intermediate_graph_degree" value:"64"} index_params:{key:"graph_degree" value:"32"} index_params:{key:"vec_field_size_gb" value:"0.000000"} type_params:{key:"dim" value:"4096"}"]
[2024/12/23 03:36:06.396 +00:00] [INFO] [datacoord/task_index.go:316] ["query task index info successfully"] [taskID=454799776879359244] ["result state"=InProgress] [failReason=]
[2024/12/23 03:36:06.493 +00:00] [WARN] [indexcgowrapper/helper.go:71] ["failed to create index, C Runtime Exception: => failed to build index, raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:276\n\n"]
[2024/12/23 03:36:06.494 +00:00] [WARN] [indexnode/task_index.go:314] ["failed to build index"] [clusterID=by-dev] [buildID=454799776879359244] [collection=454799776879358794] [segmentID=454799776879358832] [currentIndexVersion=6] [error="failed to create index, C Runtime Exception: => failed to build index, raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:276\n\n: segcore error[segcoreCode=2004]"] [errorVerbose="failed to create index, C Runtime Exception: => failed to build index, raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:276: segcore error[segcoreCode=2004]\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrSegcore\n | \t/workspace/source/pkg/util/merr/utils.go:1006\n | github.com/milvus-io/milvus/internal/util/indexcgowrapper.HandleCStatus\n | \t/workspace/source/internal/util/indexcgowrapper/helper.go:78\n | github.com/milvus-io/milvus/internal/util/indexcgowrapper.CreateIndex\n | \t/workspace/source/internal/util/indexcgowrapper/index.go:111\n | github.com/milvus-io/milvus/internal/indexnode.(*indexBuildTask).Execute\n | \t/workspace/source/internal/indexnode/task_index.go:309\n | github.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask.func1\n | \t/workspace/source/internal/indexnode/task_scheduler.go:222\n | github.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask\n | \t/workspace/source/internal/indexnode/task_scheduler.go:235\n | github.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).indexBuildLoop.func1\n | \t/workspace/source/internal/indexnode/task_scheduler.go:262\n | runtime.goexit\n | \t/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695\nWraps: (2) failed to create index, C Runtime Exception: => failed to build index, raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:276\nWraps: (3) segcore error[segcoreCode=2004]\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]
[2024/12/23 03:36:06.494 +00:00] [WARN] [indexnode/task_scheduler.go:236] ["process task failed"] [error="failed to create index, C Runtime Exception: => failed to build index, raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:276\n\n: segcore error[segcoreCode=2004]"] [errorVerbose="failed to create index, C Runtime Exception: => failed to build index, raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:276: segcore error[segcoreCode=2004]\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrSegcore\n | \t/workspace/source/pkg/util/merr/utils.go:1006\n | github.com/milvus-io/milvus/internal/util/indexcgowrapper.HandleCStatus\n | \t/workspace/source/internal/util/indexcgowrapper/helper.go:78\n | github.com/milvus-io/milvus/internal/util/indexcgowrapper.CreateIndex\n | \t/workspace/source/internal/util/indexcgowrapper/index.go:111\n | github.com/milvus-io/milvus/internal/indexnode.(*indexBuildTask).Execute\n | \t/workspace/source/internal/indexnode/task_index.go:309\n | github.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask.func1\n | \t/workspace/source/internal/indexnode/task_scheduler.go:222\n | github.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask\n | \t/workspace/source/internal/indexnode/task_scheduler.go:235\n | github.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).indexBuildLoop.func1\n | \t/workspace/source/internal/indexnode/task_scheduler.go:262\n | runtime.goexit\n | \t/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695\nWraps: (2) failed to create index, C Runtime Exception: => failed to build index, raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:276\nWraps: (3) segcore error[segcoreCode=2004]\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]
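
The effective build parameters are easier to read once pulled out of the `buildIndexParams` field of the "debug create index" line. A minimal sketch, assuming the log keeps the `index_params:{key:"..." value:"..."}` shape seen above; the function name `extract_index_params` is hypothetical and is not part of Milvus or pymilvus:

```python
import re

# Matches one `index_params:{key:"..." value:"..."}` pair as printed in the log above.
_PAIR = re.compile(r'index_params:\{key:"([^"]+)" value:"([^"]*)"\}')


def extract_index_params(log_line: str) -> dict:
    """Return the index_params key/value pairs found in one indexnode log line."""
    return dict(_PAIR.findall(log_line))


# Excerpt of the "debug create index" line from this issue.
sample = (
    'index_params:{key:"dim" value:"4096"} '
    'index_params:{key:"cache_dataset_on_device" value:"true"} '
    'index_params:{key:"index_type" value:"GPU_CAGRA"} '
    'index_params:{key:"metric_type" value:"IP"} '
    'index_params:{key:"intermediate_graph_degree" value:"64"} '
    'index_params:{key:"graph_degree" value:"32"} '
    'type_params:{key:"dim" value:"4096"}'
)

params = extract_index_params(sample)
for key, value in params.items():
    print(f"{key} = {value}")
```

The pattern is anchored on `index_params:`, so the trailing `type_params` pair is ignored. All values come back as strings, exactly as they appear in the log.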

Anything else?

{
    "index_type": "GPU_CAGRA",
    "metric_type": "IP",
    "params": {
        "intermediate_graph_degree": 64,
        "graph_degree": 32,
        "cache_dataset_on_device": "true"
    }
}

Or

{
    "index_type": "GPU_CAGRA",
    "metric_type": "IP",
}

Both hit the same problem.

Dong148 added the kind/bug and needs-triage labels Dec 23, 2024

Dong148 commented Dec 23, 2024

GPU_CAGRA with the L2 metric works fine too.


Dong148 commented Dec 23, 2024

Release 2.5.0 fails too.

yanliang567 (Contributor) commented

/assign @Presburger
/unassign

yanliang567 added the triage/accepted label and removed the needs-triage label Dec 23, 2024
yanliang567 modified the milestones: 2.5.0, 2.5.1 Dec 23, 2024
Presburger (Member) commented

@Dong148 hi, CAGRA supports IP. Could you tell me the total amount of data? You can set cache_dataset_on_device to false, which will increase the memory usage.
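
The suggestion above changes a single build parameter. A minimal sketch of the reported and suggested parameter sets as plain Python dicts, assuming the field is named `vector` as in the log; the helper `cagra_index_params` is hypothetical, and the pymilvus call is left as a comment because it needs a running Milvus instance:

```python
def cagra_index_params(cache_dataset_on_device: bool, metric_type: str = "IP") -> dict:
    """Build a GPU_CAGRA index definition with the graph degrees used in this issue."""
    return {
        "index_type": "GPU_CAGRA",
        "metric_type": metric_type,
        "params": {
            "intermediate_graph_degree": 64,
            "graph_degree": 32,
            # Passed as a string, matching the params posted in the issue.
            "cache_dataset_on_device": "true" if cache_dataset_on_device else "false",
        },
    }


reported = cagra_index_params(cache_dataset_on_device=True)
suggested = cagra_index_params(cache_dataset_on_device=False)

# With a live server and an existing pymilvus Collection object, roughly:
#   collection.create_index(field_name="vector", index_params=suggested)
print(suggested["params"]["cache_dataset_on_device"])
```

Keeping both variants as data makes it easy to rerun the build with only `cache_dataset_on_device` or `metric_type` changed and compare the outcomes.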


Dong148 commented Dec 30, 2024

> @Dong148 hi, CAGRA supports IP. Could you tell me the total amount of data? You can set cache_dataset_on_device to false, which will increase the memory usage.

Only 1M entries of 4096-dimension vectors.

```
{
    "index_type": "GPU_CAGRA",
    "metric_type": "IP",
}
```
I've also tried the default settings.

yanliang567 modified the milestones: 2.5.1, 2.5.2 Dec 30, 2024
Presburger (Member) commented

@Dong148 Have you changed any of the GPU-related configuration in milvus.yml? For 1 million vectors with 4096 dimensions, your 4 GPUs with approximately 80GB of memory in total should generally be sufficient.
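
For reference, the size of the raw vector data can be estimated with simple arithmetic. A rough sketch, assuming 4-byte float32 components (the log reports the field as `FloatVector`) and counting only the vectors themselves, not the CAGRA graph or any build-time working memory; the function name `raw_vector_bytes` is hypothetical:

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Bytes needed to hold the raw vectors, excluding index structures."""
    return num_vectors * dim * bytes_per_component


# 1M vectors of 4096 dimensions, as reported in this thread.
total = raw_vector_bytes(1_000_000, 4096)

print(f"{total} bytes")
print(f"{total / 1e9:.2f} GB (decimal)")
print(f"{total / 2**30:.2f} GiB (binary)")
```

This gives 16,384,000,000 bytes for the dataset described here. It is only a lower bound on what an index build needs, so it does not by itself explain or rule out a memory problem.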

yanliang567 modified the milestones: 2.5.2, 2.5.3 Jan 6, 2025
yanliang567 modified the milestones: 2.5.3, 2.5.4 Jan 16, 2025