pip install mlserver grpcio grpcio-health-checking grpcio-tools
Follow the xft grpc server instructions to start a multi-rank xFT gRPC server with mpirun, using the scripts in grpc_launcher.
Edit the parameters in model-settings.json:
"token_path": "/data/llama-2-7b-chat-hf",
"xft_grpc_server_ip": "localhost",
"xft_grpc_server_port": "50051"
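Before starting MLServer it can help to confirm that these values point at something real. The following is a minimal sketch, not part of the project: `check_settings` is a hypothetical helper, and the values are passed in as a plain dict because where they sit inside model-settings.json is not shown here. It only checks that `token_path` exists and that a TCP listener answers on the configured address.

```python
import os
import socket

# Values copied from the snippet above (adjust to your own setup).
settings = {
    "token_path": "/data/llama-2-7b-chat-hf",
    "xft_grpc_server_ip": "localhost",
    "xft_grpc_server_port": "50051",
}


def check_settings(params, timeout=2.0):
    """Hypothetical helper: return a list of problems; empty means OK."""
    problems = []

    # The tokenizer path must exist on the machine running MLServer.
    token_path = params.get("token_path", "")
    if not os.path.exists(token_path):
        problems.append(f"token_path does not exist: {token_path!r}")

    # The port is written as a string in the settings; it must parse as an int.
    host = params.get("xft_grpc_server_ip", "")
    try:
        port = int(params.get("xft_grpc_server_port", ""))
    except (TypeError, ValueError):
        problems.append("xft_grpc_server_port is not an integer")
        return problems

    try:
        # A plain TCP connect only shows that something is listening;
        # it does not confirm that the listener is the xFT gRPC server.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot connect to {host}:{port}: {exc}")
    return problems


if __name__ == "__main__":
    for problem in check_settings(settings):
        print("WARNING:", problem)
```

If the script prints nothing, the path exists and the port accepts connections. A connection warning usually means the multi-rank gRPC server from the previous step is not running yet.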
cd mlserver/multi-ranks
mlserver start .
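Once `mlserver start .` is running, a quick readiness probe can confirm that the server came up. This is a minimal sketch that assumes MLServer's default HTTP port (8080) and the V2 inference protocol health endpoint `/v2/health/ready`. `is_ready` is a hypothetical helper name; change the base URL if your settings use a different host or port.

```python
import urllib.request


def is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Hypothetical helper: True if the server reports ready, else False."""
    try:
        # V2 inference protocol readiness endpoint.
        with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print("MLServer ready:", is_ready())
```

A `False` result only means the probe did not get a 200 response. Check the MLServer log output for the actual cause.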