Add vLLM+HPA support to ChatQnA Helm chart #610

eero-t · 2024-11-25T18:04:21Z

Description

Add vLLM + HPA support for ChatQnA Helm charts.

As is already done in the Agent component, there are tgi.enabled and vllm.enabled flags for selecting which LLM serving backend is used.

Note: ChatQnA does not yet support using vLLM for embedding & reranking [1], so HF TEI is still used for those.

[1] opea-project/GenAIExamples#1237
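As a rough sketch of how the tgi.enabled / vllm.enabled selection described above might be used at install time: the release name and chart path below are assumptions for illustration, and only the two flag names come from this description. The snippet prints the command for review rather than running it, since a real install would likely need further values (model, credentials) not covered here.

```shell
# Assumptions: release name and chart path are hypothetical;
# only tgi.enabled and vllm.enabled are taken from this PR's description.
RELEASE="chatqna"
CHART="helm-charts/chatqna"

# Switch the LLM serving backend from TGI to vLLM.
HELM_CMD="helm install $RELEASE $CHART --set tgi.enabled=false --set vllm.enabled=true"

# Print the command for review instead of executing it.
echo "$HELM_CMD"
```

Setting both flags explicitly avoids depending on whatever defaults the chart ships with.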

Issues

Fixes #608 partially.
Fixes #631.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)

New dependencies

n/a

Tests

Manual testing on top of main HEAD / v1.1 images.

eero-t · 2024-11-25T18:09:28Z

Setting as draft. I have tested that DocSum with Gaudi vLLM works, and that the ChatQnA Helm chart can be installed, but because v1.1 image pulls are currently taking so long on my test node, I haven't been able to test ChatQnA with Gaudi vLLM properly yet.

vLLM CPU version testing would also be needed before merging this (I'm hoping somebody else here could check at least DocSum with CPU vLLM).

eero-t · 2024-11-26T10:39:02Z

CI issues:

- LLM-uservice: openai.NotFoundError: Error code: 404 - {'object': 'error', 'message': 'The model meta-llama/Meta-Llama-3-8B-Instruct does not exist.', 'type': 'NotFoundError', 'param': None, 'code': 404}
  - Pre-existing bug in the CI/repo: the Helm chart refers to a model not present in CI => I can fix it
- DocSum: 100.83.111.229:5000/opea/llm-docsum-vllm:latest: not found
  - Bug in OPEA image creation: that image is missing from DockerHub & CI => somebody else needs to fix that

eero-t · 2024-11-26T12:35:12Z

This overlaps partly with #403.

eero-t · 2024-11-29T20:08:52Z

Added HPA support for ChatQnA / vLLM.

eero-t · 2024-12-02T18:25:44Z

This CI failure for the "llm-uservice, gaudi, ci-vllm-gaudi-values, common" test is weird:

openai.NotFoundError: Error code: 404 - {'object': 'error', 'message': 'The model `meta-llama/Meta-Llama-3-8B-Instruct` does not exist.', 'type': 'NotFoundError', 'param': None, 'code': 404}

Neither that CI values file nor values.yaml specifies that model for llm-uservice:

$ git grep -l '^[^#]*Meta-Llama-3-8B-Instruct'
helm-charts/common/llm-uservice/ci-faqgen-values.yaml
helm-charts/common/llm-uservice/variant_faqgen-values.yaml
helm-charts/faqgen/README.md
helm-charts/faqgen/values.yaml

So CI either invents that model out of thin air, or is using the wrong CI values file (the faqgen one instead of the vllm-gaudi one)!
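To illustrate why the pattern in the search above only reports files with an active (uncommented) reference to the model, here is a small self-contained sketch. It uses plain grep on two scratch files instead of git grep on the repo, and the file names and the LLM_MODEL_ID key are hypothetical.

```shell
# Scratch directory with two hypothetical values files (not from the repo).
tmp=$(mktemp -d)
printf 'LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct\n' > "$tmp/active.yaml"
printf '# LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct\n' > "$tmp/commented.yaml"

# '^[^#]*' allows only non-'#' characters between the start of the line
# and the model name, so a mention after a comment marker does not match.
matches=$(cd "$tmp" && grep -l '^[^#]*Meta-Llama-3-8B-Instruct' *.yaml)
echo "$matches"

rm -rf "$tmp"
```

Only active.yaml is listed; the commented-out mention is skipped, which is why the four files in the listing above are the only places where the model is actually set.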

eero-t · 2024-12-10T18:47:38Z

Changes:

- Cherry-picked changes from "Adapt to latest vllm changes" (#632)
- Dropped vLLM support for DocSum, as the required image is missing
- Rebased to latest main

eero-t · 2024-12-11T17:00:39Z

Rebased to main, and dropped fix for model ID [1].

[1] It's specific to the DocSum & FaqGen llm-uservice variants. The text generation llm-uservice variant uses a different env var name from those (which IMHO is a bug, but in code in another repo). I'll add a workaround for it in a separate PR for DocSum vLLM support.

eero-t requested review from yongfengdu and lianhao as code owners November 25, 2024 18:04

eero-t marked this pull request as draft November 25, 2024 18:04

eero-t force-pushed the helm-vllm branch from e4e807e to 8a4cc7d Compare November 25, 2024 18:30

yongfengdu mentioned this pull request Nov 27, 2024

✨ Helm Chart for OpenVINO vLLM #403

Open

3 tasks

eero-t force-pushed the helm-vllm branch from 8a4cc7d to c5741dc Compare November 29, 2024 20:07

eero-t force-pushed the helm-vllm branch from c5741dc to 1c56d39 Compare November 29, 2024 20:13

This was referenced Dec 2, 2024

[Bug] vLLM images missing from Docker Hub opea-project/GenAIComps#961

Closed

[ChatQnA] Remove enforce-eager to enable HPU graphs for better vLLM perf opea-project/GenAIExamples#1210

Merged

eero-t force-pushed the helm-vllm branch from 1c56d39 to e764a60 Compare December 10, 2024 18:43

eero-t changed the title ~~WIP: Add vLLM support to ChatQnA + DocSum Helm charts~~ Add vLLM+HPA support to ChatQnA Helm chart Dec 10, 2024

eero-t force-pushed the helm-vllm branch from e764a60 to 908a4ee Compare December 11, 2024 16:55

eero-t force-pushed the helm-vllm branch from 908a4ee to 89a496e Compare December 11, 2024 18:45

eero-t mentioned this pull request Dec 16, 2024

[Bug] Regression: "opea/vllm-gaudi:latest" container in crash loop opea-project/GenAIComps#1038

Closed

2 tasks

eero-t and others added 6 commits December 17, 2024 18:38

Add monitoring support for the vLLM component

79d067b

Signed-off-by: Eero Tamminen <[email protected]>

Initial vLLM support for ChatQnA

c585fbe

For now vLLM replaces just TGI, but as it supports also embedding, also TEI-embed/-rerank may be replaceable later on.

Signed-off-by: Eero Tamminen <[email protected]>

Fix HPA comments in tgi/tei/tererank values files

2cb582e

Signed-off-by: Eero Tamminen <[email protected]>

Add HPA scaling support for ChatQnA / vLLM

de2ee0f

Signed-off-by: Eero Tamminen <[email protected]>

Adapt to latest vllm changes

75f8d2a

- Remove --enforce-eager on hpu to improve performance
- Refactor to the upstream docker entrypoint changes

Fixes issue opea-project#631.

Signed-off-by: Lianhao Lu <[email protected]>

Clean up ChatQnA vLLM Gaudi parameters

43ad885

Signed-off-by: Eero Tamminen <[email protected]>

eero-t force-pushed the helm-vllm branch from 89a496e to 43ad885 Compare December 17, 2024 16:38

eero-t marked this pull request as ready for review December 17, 2024 16:39

eero-t mentioned this pull request Dec 17, 2024

Adapt to latest vllm changes #632

Closed

1 task

lianhao approved these changes Dec 18, 2024

yongfengdu approved these changes Dec 18, 2024

yongfengdu merged commit baed0b5 into opea-project:main Dec 18, 2024
31 of 47 checks passed

eero-t deleted the helm-vllm branch December 18, 2024 10:13

eero-t mentioned this pull request Dec 18, 2024

Add vLLM support to DocSum Helm chart #649

Merged

1 task

eero-t mentioned this pull request Jan 16, 2025

[Feature] Helm Charts for Txt2Img and SearchQnA. #596

Closed
