build: Update CLI version references to 0.0.8 and Triton references to 24.05 (#72)
rmccorm4 authored Jun 11, 2024
1 parent db70fcb commit 8f577d3
Showing 5 changed files with 17 additions and 28 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yaml
@@ -36,7 +36,7 @@ jobs:
   build:
     runs-on: ${{ matrix.os }}
     container:
-      image: nvcr.io/nvidia/tritonserver:24.04-py3
+      image: nvcr.io/nvidia/tritonserver:24.05-py3
     strategy:
       fail-fast: false
       matrix:
21 changes: 10 additions & 11 deletions README.md
@@ -22,8 +22,8 @@ and running the CLI from within the latest corresponding `tritonserver`
 container image, which should have all necessary system dependencies installed.
 
 For vLLM and TRT-LLM, you can use their respective images:
-- `nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3`
-- `nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3`
 
 If you decide to run the CLI on the host or in a custom image, please
 see this list of [additional dependencies](#additional-dependencies-for-custom-environments)
@@ -38,6 +38,7 @@ matrix below:
 
 | Triton CLI Version | TRT-LLM Version | Triton Container Tag |
 |:------------------:|:---------------:|:--------------------:|
+| 0.0.8 | v0.9.0 | 24.05 |
 | 0.0.7 | v0.9.0 | 24.04 |
 | 0.0.6 | v0.8.0 | 24.02, 24.03 |
 | 0.0.5 | v0.7.1 | 24.01 |
@@ -51,10 +52,10 @@ pip install git+https://github.com/triton-inference-server/triton_cli.git
 ```
 
 It is also possible to install from a specific branch name, a commit hash
-or a tag name. For example to install `triton_cli` with tag 0.0.7:
+or a tag name. For example to install `triton_cli` with a specific tag:
 
 ```bash
-GIT_REF="0.0.7"
+GIT_REF="0.0.8"
 pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
 ```
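As a quick sanity check after installing, the version bumped by this commit can be read back from the package itself; a minimal sketch, assuming the install exposes the `triton_cli` module shown in `src/triton_cli/__init__.py` later in this diff:

```bash
# Should print 0.0.8 once this tagged release is installed
python -c "import triton_cli; print(triton_cli.__version__)"
```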

@@ -89,7 +90,7 @@ triton -h
 triton import -m gpt2
 
 # Start server pointing at the default model repository
-triton start --image nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3
+triton start --image nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
 
 # Infer with CLI
 triton infer -m gpt2 --prompt "machine learning is"
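Between `triton start` and `triton infer`, it can be useful to confirm the server is up. Triton exposes the standard KServe v2 health endpoints over HTTP (port 8000 by default), so a plain curl works; a sketch assuming default ports:

```bash
# Returns HTTP 200 once the server and its models are ready for requests
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
```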
@@ -143,11 +144,10 @@ docker run -ti
   --shm-size=1g --ulimit memlock=-1 \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3
+  nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
 
 # Install the Triton CLI
-GIT_REF="0.0.7"
-pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
+pip install git+https://github.com/triton-inference-server/[email protected]
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
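For non-interactive environments, `huggingface_hub` also reads an access token from the `HF_TOKEN` environment variable, so the interactive login step can be skipped; a sketch, assuming a token has already been created on huggingface.co (the value below is a placeholder):

```bash
# Alternative to `huggingface-cli login`: export a token for huggingface_hub to pick up
export HF_TOKEN="hf_xxx"  # placeholder; substitute a real access token
```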
@@ -213,11 +213,10 @@ docker run -ti
   -v /tmp:/tmp \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
+  nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
 
 # Install the Triton CLI
-GIT_REF="0.0.7"
-pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
+pip install git+https://github.com/triton-inference-server/[email protected]
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -58,7 +58,8 @@ dependencies = [
     "psutil >= 5.9.5", # may remove later
     "rich == 13.5.2",
     # TODO: Test on cpu-only machine if [cuda] dependency is an issue,
-    "tritonclient[all] >= 2.46",
+    # Use explicit client version matching genai-perf version for tagged release
+    "tritonclient[all] == 2.46",
     "huggingface-hub >= 0.19.4",
     # Testing
     "pytest >= 8.1.1", # may remove later
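The practical difference between the old and new pins: `>= 2.46` lets pip resolve any later `tritonclient` release, while `== 2.46` locks the tagged release to one known-compatible client. The PEP 440 semantics can be checked with the `packaging` library (an assumption that it is installed, e.g. via `pip install packaging`):

```bash
# ">= 2.46" admits future client releases; "== 2.46" pins exactly one
python - <<'EOF'
from packaging.specifiers import SpecifierSet
print("2.47" in SpecifierSet(">=2.46"))  # True: a newer client would be accepted
print("2.47" in SpecifierSet("==2.46"))  # False: the pin excludes it
EOF
```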
2 changes: 1 addition & 1 deletion src/triton_cli/__init__.py
@@ -24,4 +24,4 @@
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
-__version__ = "0.0.8dev"
+__version__ = "0.0.8"
17 changes: 3 additions & 14 deletions src/triton_cli/docker/Dockerfile
@@ -1,20 +1,9 @@
-FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
+# TRT-LLM image contains engine building and runtime dependencies
+FROM nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
 
 # Setup vLLM Triton backend
 RUN mkdir -p /opt/tritonserver/backends/vllm && \
     wget -P /opt/tritonserver/backends/vllm https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/src/model.py
 
-# TRT-LLM engine build dependencies
-# NOTE: torch 2.2.0 has a symbol conflict, so WAR is to install 2.1.2
-RUN pip install \
-    "psutil" \
-    "pynvml>=11.5.0" \
-    --extra-index-url https://pypi.nvidia.com/ "tensorrt-llm==0.9.0"
-
-# vLLM runtime dependencies
-RUN pip install \
-    # Triton 24.04 vLLM containers comes with "vllm==0.4.0.post1", but this has
-    # incompatible dependencies with trtllm==0.9.0 around torch and transformers.
-    "vllm==0.4.1"
 
-# TODO: Install Triton CLI in this image
+RUN pip install "vllm==0.4.3"

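To exercise the updated Dockerfile, a build from the repository root along these lines should work; the image tag and build context are arbitrary choices for illustration:

```bash
# Build the combined TRT-LLM + vLLM image from the updated Dockerfile
docker build -t triton-cli-dev:24.05 -f src/triton_cli/docker/Dockerfile .
```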