build: Upgrade to 24.07, TRT-LLM 0.11.0, and Triton CLI v0.0.10 (#81)
20 changed files with 1,544 additions and 543 deletions.
--- a/README.md
+++ b/README.md
@@ -22,8 +22,8 @@ and running the CLI from within the latest corresponding `tritonserver`
 container image, which should have all necessary system dependencies installed.
 
 For vLLM and TRT-LLM, you can use their respective images:
-- `nvcr.io/nvidia/tritonserver:24.06-vllm-python-py3`
-- `nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.07-vllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3`
 
 If you decide to run the CLI on the host or in a custom image, please
 see this list of [additional dependencies](#additional-dependencies-for-custom-environments)
@@ -38,6 +38,7 @@ matrix below:
 
 | Triton CLI Version | TRT-LLM Version | Triton Container Tag |
 |:------------------:|:---------------:|:--------------------:|
+| 0.0.10             | v0.11.0         | 24.07                |
 | 0.0.9              | v0.10.0         | 24.06                |
 | 0.0.8              | v0.9.0          | 24.05                |
 | 0.0.7              | v0.9.0          | 24.04                |
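A quick way to confirm a container matches its row in the matrix above is to print the TRT-LLM version from inside the image. This is a minimal sketch, assuming the `tensorrt_llm` Python package exposes a `__version__` attribute:

```bash
# Expect "0.11.0" from the 24.07 container, per the matrix row for CLI 0.0.10.
docker run --rm nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3 \
  python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```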
@@ -56,7 +57,7 @@ It is also possible to install from a specific branch name, a commit hash
 or a tag name. For example to install `triton_cli` with a specific tag:
 
 ```bash
-GIT_REF="0.0.9"
+GIT_REF="0.0.10"
 pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
 ```
 
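The same `${GIT_REF}` pattern covers the other ref types the text mentions; for example, a sketch installing from a branch (assuming `main` is the repository's default branch; a commit hash works the same way):

```bash
# Install from a branch instead of a release tag.
GIT_REF="main"
pip install "git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}"
```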
@@ -91,7 +92,7 @@ triton -h
 triton import -m gpt2
 
 # Start server pointing at the default model repository
-triton start --image nvcr.io/nvidia/tritonserver:24.06-vllm-python-py3
+triton start --image nvcr.io/nvidia/tritonserver:24.07-vllm-python-py3
 
 # Infer with CLI
 triton infer -m gpt2 --prompt "machine learning is"
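For comparison, a hypothetical TRT-LLM variant of the quickstart is sketched below; the `--backend tensorrtllm` flag is an assumption about the CLI's backend selection and is not part of this diff:

```bash
# Assumed flow: import for the TRT-LLM backend, then serve with the matching image.
triton import -m gpt2 --backend tensorrtllm
triton start --image nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
triton infer -m gpt2 --prompt "machine learning is"
```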
@@ -145,10 +146,10 @@ docker run -ti \
     --shm-size=1g --ulimit memlock=-1 \
     -v ${HOME}/models:/root/models \
     -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-    nvcr.io/nvidia/tritonserver:24.06-vllm-python-py3
+    nvcr.io/nvidia/tritonserver:24.07-vllm-python-py3
 
 # Install the Triton CLI
-pip install git+https://github.com/triton-inference-server/triton_cli.git@0.0.9
+pip install git+https://github.com/triton-inference-server/triton_cli.git@0.0.10
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
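Once the CLI is installed inside the vLLM container, the flow plausibly continues with the same verbs shown in the quickstart; a sketch, assuming `triton start` with no `--image` serves locally, since you are already inside the serving container:

```bash
# Assumed follow-on steps inside the 24.07 vLLM container.
triton import -m gpt2                                 # create a vLLM model repository
triton start                                          # serve from the default repository
triton infer -m gpt2 --prompt "machine learning is"   # send a test request
```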
@@ -214,10 +215,10 @@ docker run -ti \
     -v /tmp:/tmp \
     -v ${HOME}/models:/root/models \
     -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-    nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
+    nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
 
 # Install the Triton CLI
-pip install git+https://github.com/triton-inference-server/triton_cli.git@0.0.9
+pip install git+https://github.com/triton-inference-server/triton_cli.git@0.0.10
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
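The TRT-LLM container follows the same shape; importing for this backend builds engines, which is plausibly why `/tmp` is mounted above. A sketch, with `--backend tensorrtllm` again an assumed flag:

```bash
# Assumed follow-on steps inside the 24.07 TRT-LLM container.
triton import -m gpt2 --backend tensorrtllm   # build engines for the TRT-LLM backend
triton start
triton infer -m gpt2 --prompt "machine learning is"
```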
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,9 +1,9 @@
 # TRT-LLM image contains engine building and runtime dependencies
-FROM nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
+FROM nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
 
 # Setup vLLM Triton backend
 RUN mkdir -p /opt/tritonserver/backends/vllm && \
-    wget -P /opt/tritonserver/backends/vllm https://raw.githubusercontent.com/triton-inference-server/vllm_backend/r24.06/src/model.py
+    wget -P /opt/tritonserver/backends/vllm https://raw.githubusercontent.com/triton-inference-server/vllm_backend/r24.07/src/model.py
 
 # vLLM runtime dependencies
-RUN pip install "vllm==0.4.3"
+RUN pip install "vllm==0.5.0.post1"
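Because this Dockerfile layers the vLLM backend and runtime onto the TRT-LLM base image, a single image can serve either backend. A minimal usage sketch; the tag `triton_cli` is an arbitrary local name, and the `--gpus`/`--network` flags are assumptions mirroring typical Triton container invocations:

```bash
# Build the combined TRT-LLM + vLLM image from this Dockerfile.
docker build -t triton_cli .

# Run it with the same mounts used in the README snippets above.
docker run -ti --gpus all --network=host --shm-size=1g --ulimit memlock=-1 \
  -v ${HOME}/models:/root/models \
  -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
  triton_cli
```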