Multi-rank MLServer

Requirements

pip install mlserver grpcio grpcio-health-checking grpcio-tools

Follow the xFT gRPC server instructions to start a multi-rank xFT gRPC server with mpirun, using the scripts in grpc_launcher.

Edit the parameters in model-settings.json so that they match your setup:

"token_path": "/data/llama-2-7b-chat-hf",
"xft_grpc_server_ip": "localhost",
"xft_grpc_server_port": "50051"
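
Before starting MLServer, it can help to confirm that something is listening at the address configured above. The following is a minimal sketch, not part of this repository: the helper name endpoint_reachable is hypothetical, and it only checks that a TCP connection can be opened. It does not verify that the listener is actually the xFT gRPC server or that all ranks are up.

```python
import socket


def endpoint_reachable(host: str, port: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration. The port is accepted as a string
    because model-settings.json stores it as one.
    """
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Values copied from the model-settings.json parameters shown above.
    settings = {
        "xft_grpc_server_ip": "localhost",
        "xft_grpc_server_port": "50051",
    }
    ok = endpoint_reachable(
        settings["xft_grpc_server_ip"], settings["xft_grpc_server_port"]
    )
    print("xFT gRPC server reachable" if ok else "xFT gRPC server not reachable")
```

If the check reports that the server is not reachable, make sure the mpirun launch has finished loading the model and that the IP and port match the values in model-settings.json.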

cd mlserver/multi-ranks
mlserver start .