what is the strategy of triton for running models in parallel, multi-thread or multi-process? #6253
Answered by dyastremsky, Sep 1, 2023
This differs based on the backend and model configuration. For example, the Python backend runs models in their own processes, while TensorRT uses CUDA streams. It also depends on your model configuration: if you specify multiple model instances, they will run on the same device, so many backends use multi-threading to enable parallel inference across those instances.
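
For illustration, here is a minimal `config.pbtxt` sketch requesting two instances of a model on one GPU (the model name, backend, and batch size are placeholders). With a setting like this, most backends serve the two instances with multiple threads inside the Triton process, while the Python backend instead launches a separate stub process per instance:

```protobuf
# config.pbtxt (illustrative; "my_model" and the backend are placeholders)
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8

# Ask Triton for two copies of the model on GPU 0. Requests can then be
# dispatched to either instance in parallel; how the parallelism is realized
# (threads, processes, CUDA streams) is up to the backend.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```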