Ensemble/BLS models where individual steps are hosted on different machines (or clusters)? #7166
Replies: 1 comment 1 reply
Hi @vadimkantorov, thanks for reaching out! This is not natively supported through standard Triton ensemble/BLS conventions today. You could theoretically do many things with BLS, since a BLS model is arbitrary Python code. Similarly, another example is that you can use Triton's in-process Python API to embed Triton within a Ray Serve deployment and let Ray manage the multi-node logic: https://docs.ray.io/en/latest/serve/tutorials/triton-server-integration.html#start-the-triton-server-inside-a-ray-serve-application. Can you elaborate on your goal or use case? Is this a max-throughput scenario where you don't care as much about minimum latency? Do you have any constraints on when you could afford the cost of a round trip to another node? CC @nnshah1 for visibility
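To make the BLS idea concrete: since the ensemble/BLS config has no per-step endpoint field, the cross-node hop has to live in your own Python code. Below is a minimal sketch of one way to do it; the step names, hostnames, and routing-table shape are all made up for illustration, and the actual remote call (commented out) would need the `tritonclient` package and a reachable remote Triton server.

```python
# Hypothetical BLS-style routing sketch: pick a remote Triton endpoint per
# pipeline step, then call that endpoint over HTTP from inside the BLS
# model's Python code. Triton itself does not read this table; it is plain
# application logic.

# Routing table mapping step (model) names to remote Triton endpoints.
# Hostnames here are invented for the example.
STEP_ENDPOINTS = {
    "preprocess": "cpu-cluster.internal:8000",  # big CPU machine/cluster
    "infer": "gpu-node-0.internal:8000",        # GPU machine
}

def endpoint_for(model_name: str, default: str = "localhost:8000") -> str:
    """Return the remote Triton endpoint for a given pipeline step,
    falling back to the local server for unlisted steps."""
    return STEP_ENDPOINTS.get(model_name, default)

# Inside a BLS model's execute(), one could then do something like:
#
#   import tritonclient.http as httpclient
#   client = httpclient.InferenceServerClient(url=endpoint_for("preprocess"))
#   result = client.infer("preprocess", inputs)
#
# i.e. the cross-node dispatch is ordinary client code, not a Triton
# scheduling feature.
```

The trade-off is that each remote step pays a full network round trip plus serialization, which is why the question about latency tolerance matters.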
Do Ensemble models or BLS models support a scheme where individual pipeline components/models are scheduled on different machines?
E.g. it might make sense to host/scale a certain preprocessing component/instance group on a big CPU machine (or even some cluster) and run another step on a GPU machine.
I guess for this we'd need to specify a URL endpoint for individual steps in the BLS/Ensemble config.
Is something like this supported natively by Triton Server?
Thanks!