This repository has been archived by the owner on Mar 30, 2024. It is now read-only.

Upgrade ctransformer to 0.2.22, add GPUT support for StarCoder, make … (#51)

Signed-off-by: Hung-Han (Henry) Chen <[email protected]>
chenhunghan authored Aug 13, 2023
1 parent 7868f37 commit 06da3db
Showing 7 changed files with 49 additions and 9 deletions.
16 changes: 13 additions & 3 deletions README.md
@@ -86,13 +86,23 @@ To enable GPU/CUDA acceleration, you need to use the container image built for G
- `deployment.image` = `ghcr.io/chenhunghan/ialacol-cuda12:latest`
- `deployment.env.GPU_LAYERS` is the number of layers to offload to the GPU.

For example
Only `llama`, `falcon`, `mpt` and `gpt_bigcode`(StarCoder/StarChat) support CUDA.

#### Llama with CUDA12

```sh
helm install llama2-7b-chat-cuda12 ialacol/ialacol -f examples/values/llama2-7b-chat-cuda12.yaml
```

Deploys the llama2 7b model with 40 layers offloaded to the GPU. The inference is accelerated by CUDA 12.

#### StarCoderPlus with CUDA12

```sh
helm install llama2-7b-chat-cuda11 ialacol/ialacol -f examples/values/llama2-7b-chat-cuda11.yaml
helm install starcoderplus-guanaco-cuda12 ialacol/ialacol -f examples/values/starcoderplus-guanaco-cuda12.yaml
```

Deploys llama2 7b model with 40 layers offloaded to the GPU. The inference is accelerated by CUDA 11.
Deploys the [Starcoderplus-Guanaco-GPT4-15B-V1.0 model](https://huggingface.co/LoupGarou/Starcoderplus-Guanaco-GPT4-15B-V1.0) with 40 layers offloaded to the GPU. The inference is accelerated by CUDA 12.
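Both deployments follow the same values-file shape; a minimal override, sketched from the keys used in the bundled example files (the filename `values.cuda12.yaml` is hypothetical), looks like:

```yaml
# values.cuda12.yaml — hypothetical minimal override for a CUDA 12 deployment
replicas: 1
deployment:
  image: ghcr.io/chenhunghan/ialacol-cuda12:latest
  env:
    DEFAULT_MODEL_HG_REPO_ID: TheBloke/Llama-2-7B-Chat-GGML
    DEFAULT_MODEL_FILE: llama-2-7b-chat.ggmlv3.q4_0.bin
    GPU_LAYERS: 40
```

It would then be applied the same way as the examples above, e.g. `helm install my-release ialacol/ialacol -f values.cuda12.yaml`.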

### CUDA Driver Issues

4 changes: 2 additions & 2 deletions charts/ialacol/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v2
appVersion: 0.9.0
appVersion: 0.10.0
description: A Helm chart for ialacol
name: ialacol
type: application
version: 0.9.0
version: 0.10.0
2 changes: 1 addition & 1 deletion charts/ialacol/values.yaml
@@ -2,7 +2,7 @@ replicas: 1

deployment:
image: quay.io/chenhunghan/ialacol:latest
# or use CUDA11 image `ghcr.io/chenhunghan/ialacol-cuda11:latest`
# or use CUDA image `ghcr.io/chenhunghan/ialacol-cuda12:latest`
# env:
# DEFAULT_MODEL_HG_REPO_ID: TheBloke/Llama-2-7B-Chat-GGML
# DEFAULT_MODEL_FILE: llama-2-7b-chat.ggmlv3.q4_0.bin
@@ -1,6 +1,6 @@
replicas: 1
deployment:
image: ghcr.io/chenhunghan/ialacol-cuda11:latest
image: ghcr.io/chenhunghan/ialacol-cuda12:latest
env:
DEFAULT_MODEL_HG_REPO_ID: TheBloke/Llama-2-7B-Chat-GGML
DEFAULT_MODEL_FILE: llama-2-7b-chat.ggmlv3.q4_0.bin
30 changes: 30 additions & 0 deletions examples/values/starcoderplus-guanaco-cuda12.yaml
@@ -0,0 +1,30 @@
replicas: 1
deployment:
image: quay.io/chenhunghan/ialacol-cuda12:latest
env:
DEFAULT_MODEL_HG_REPO_ID: TheBloke/Starcoderplus-Guanaco-GPT4-15B-V1.0-GGML
DEFAULT_MODEL_FILE: starcoderplus-guanaco-gpt4.ggmlv1.q4_0.bin
GPU_LAYERS: 40
resources:
{}
cache:
persistence:
size: 20Gi
accessModes:
- ReadWriteOnce
storageClassName: ~
cacheMountPath: /app/cache
model:
persistence:
size: 20Gi
accessModes:
- ReadWriteOnce
storageClassName: ~
modelMountPath: /app/models
service:
type: ClusterIP
port: 8000
annotations: {}
nodeSelector: {}
tolerations: []
affinity: {}
2 changes: 1 addition & 1 deletion get_llm.py
@@ -25,7 +25,7 @@ async def get_llm(
or "WizardCoder" in body.model
or "minotaur-15" in body.model
):
ctransformer_model_type = "starcoder"
ctransformer_model_type = "gpt_bigcode"
if "llama" in body.model:
ctransformer_model_type = "llama"
if "mpt" in body.model:
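The one-line change swaps the ctransformers backend name for StarCoder-family models from `starcoder` to `gpt_bigcode`. A standalone sketch of that selection logic, mirroring the string matching visible in this `get_llm.py` hunk (the `"star"` match and the `"gpt2"` fallback are assumptions, since the diff is truncated):

```python
def guess_ctransformer_model_type(model: str) -> str:
    """Map a model file name to a ctransformers model_type, mirroring get_llm.py."""
    model_type = "gpt2"  # assumed fallback; the real default is not shown in this diff
    if (
        "star" in model
        or "WizardCoder" in model
        or "minotaur-15" in model
    ):
        # StarCoder/StarChat models use the gpt_bigcode backend as of this commit
        model_type = "gpt_bigcode"
    if "llama" in model:
        model_type = "llama"
    if "mpt" in model:
        model_type = "mpt"
    return model_type

print(guess_ctransformer_model_type("starcoderplus-guanaco-gpt4.ggmlv1.q4_0.bin"))  # gpt_bigcode
print(guess_ctransformer_model_type("llama-2-7b-chat.ggmlv3.q4_0.bin"))  # llama
```

Renaming the type (rather than keeping `starcoder`) matches the backend name ctransformers uses for this architecture family, which is why the requirements bump below accompanies it.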
2 changes: 1 addition & 1 deletion requirements.txt
@@ -3,7 +3,7 @@ blake3==0.3.3
certifi==2023.7.22
charset-normalizer==3.1.0
click==8.1.3
ctransformers==0.2.21
ctransformers==0.2.22
fastapi==0.95.2
filelock==3.12.0
fsspec==2023.5.0
