This repository has been archived by the owner on Mar 30, 2024. It is now read-only.

Add experimental support for GPTQ models (#50)
Signed-off-by: Hung-Han (Henry) Chen <[email protected]>
chenhunghan authored Aug 9, 2023
1 parent c1fa9ba commit 28d515d
Showing 5 changed files with 107 additions and 2 deletions.
46 changes: 46 additions & 0 deletions .github/workflows/gptq_image.yaml
@@ -0,0 +1,46 @@
name: Build and Push GPTQ Image to Github Container Registry

on:
push:
branches:
- main
paths:
- '**.py'
- 'requirements.txt'
- 'Dockerfile.gptq'
- '.github/workflows/gptq_image.yaml'

env:
REGISTRY: ghcr.io
GPTQ_IMAGE_NAME: ialacol-gptq
jobs:
gptq_image_to_gcr:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
- uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v4
with:
          images: ${{ env.REGISTRY }}/${{ github.repository_owner }}/${{ env.GPTQ_IMAGE_NAME }}
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: .
file: ./Dockerfile.gptq
push: true
tags: |
${{ env.REGISTRY }}/${{ github.repository_owner }}/${{ env.GPTQ_IMAGE_NAME }}:${{ github.sha }}
${{ env.REGISTRY }}/${{ github.repository_owner }}/${{ env.GPTQ_IMAGE_NAME }}:latest
labels: ${{ steps.meta.outputs.labels }}
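Once this workflow has run on `main`, the published image can be pulled by the `latest` tag or by commit SHA; a sketch (assumes access to the GHCR package, e.g. via `docker login ghcr.io`):

```sh
# Tag names follow the build-push step above:
# <registry>/<owner>/<image>:<sha|latest>
docker pull ghcr.io/chenhunghan/ialacol-gptq:latest
```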
11 changes: 11 additions & 0 deletions Dockerfile.gptq
@@ -0,0 +1,11 @@
# syntax=docker/dockerfile:1

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
# https://github.com/marella/ctransformers#gptq
RUN pip3 install ctransformers[gptq]
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
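For local testing, the image can also be built and run without the workflow; a minimal sketch, where the `ialacol-gptq:dev` tag is illustrative and the environment variables mirror the example values file added in this commit:

```sh
# Build the GPTQ variant from the repository root
docker build -f Dockerfile.gptq -t ialacol-gptq:dev .

# Run it, exposing the API on port 8000
docker run --rm -p 8000:8000 \
  -e MODEL_TYPE=gptq \
  -e DEFAULT_MODEL_HG_REPO_ID=TheBloke/Llama-2-7b-Chat-GPTQ \
  -e DEFAULT_MODEL_FILE=gptq_model-4bit-128g.safetensors \
  ialacol-gptq:dev
```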
18 changes: 18 additions & 0 deletions README.md
@@ -103,6 +103,24 @@ For example
helm install llama2-7b-chat-metal ialacol/ialacol -f examples/values/llama2-7b-chat-metal.yaml
```

### GPTQ

To use GPTQ, you must set

- `deployment.image` = `ghcr.io/chenhunghan/ialacol-gptq:latest`
- `deployment.env.MODEL_TYPE` = `gptq`

For example:

```sh
helm install llama2-7b-chat-gptq ialacol/ialacol -f examples/values/llama2-7b-chat-gptq.yaml
```

```sh
kubectl port-forward svc/llama2-7b-chat-gptq 8000:8000
openai -k "sk-fake" -b http://localhost:8000/v1 -vvvvv api chat_completions.create -m gptq_model-4bit-128g.safetensors -g user "Hello world!"
```
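The same request can also be sent without the `openai` CLI; a plain `curl` sketch against the port-forwarded service (the endpoint path follows the OpenAI chat-completions API the server mimics):

```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-fake" \
  -d '{
    "model": "gptq_model-4bit-128g.safetensors",
    "messages": [{"role": "user", "content": "Hello world!"}]
  }'
```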

## Tips

### Creative vs. Conservative
4 changes: 2 additions & 2 deletions charts/ialacol/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v2
appVersion: 0.8.0
appVersion: 0.9.0
description: A Helm chart for ialacol
name: ialacol
type: application
version: 0.8.0
version: 0.9.0
30 changes: 30 additions & 0 deletions examples/values/llama2-7b-chat-gptq.yaml
@@ -0,0 +1,30 @@
replicas: 1
deployment:
image: ghcr.io/chenhunghan/ialacol-gptq:latest
env:
DEFAULT_MODEL_HG_REPO_ID: TheBloke/Llama-2-7b-Chat-GPTQ
DEFAULT_MODEL_FILE: gptq_model-4bit-128g.safetensors
MODEL_TYPE: "gptq"
resources:
{}
cache:
persistence:
size: 5Gi
accessModes:
- ReadWriteOnce
storageClassName: ~
cacheMountPath: /app/cache
model:
persistence:
size: 5Gi
accessModes:
- ReadWriteOnce
storageClassName: ~
modelMountPath: /app/models
service:
type: ClusterIP
port: 8000
annotations: {}
nodeSelector: {}
tolerations: []
affinity: {}
