Add vLLM+HPA support to ChatQnA Helm chart #610
Setting as draft. I have tested that DocSum works with Gaudi vLLM and that the ChatQnA Helm chart can be installed. However, v1.1 image pulls are currently very slow on my test node, so I haven't been able to test ChatQnA with Gaudi vLLM properly yet. The vLLM CPU version also needs testing before this is merged (I'm hoping somebody else here could check at least DocSum with CPU vLLM).
CI issues:
This overlaps partly with #403.
Added HPA support for ChatQnA / vLLM.
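For reviewers less familiar with HPA: the Kubernetes Horizontal Pod Autoscaler scales a Deployment roughly as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipping changes while the ratio stays within a tolerance (0.1 by default) and clamping to the configured replica bounds. The sketch below is a simplified model of that rule only. It ignores stabilization windows, scaling policies and unready pods, and the function name, its parameters, and the min/max defaults are hypothetical, not values taken from this chart.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,    # hypothetical default, not from the chart
    max_replicas: int = 10,   # hypothetical default, not from the chart
    tolerance: float = 0.1,   # Kubernetes' default HPA tolerance
) -> int:
    """Simplified model of the HPA scaling rule (no stabilization/policies)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance: HPA leaves the replica count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, with 2 vLLM replicas and a per-pod metric at 150 against a target of 100, the model asks for 3 replicas; at 105 it stays at 2 because the 5% deviation is inside the tolerance.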
This bug for
Because neither that CI file, nor
So CI either invents that model out of thin air, or is using the wrong CI file (
Changes:
Rebased to [1]; it's specific to DocSum & FaqGen.
Signed-off-by: Eero Tamminen <[email protected]>
For now vLLM replaces just TGI, but as it also supports embedding, TEI-embed/-rerank may become replaceable later on.
Signed-off-by: Eero Tamminen <[email protected]>
- Remove --enforce-eager on HPU to improve performance
- Refactor to match the upstream Docker entrypoint changes

Fixes issue opea-project#631.
Signed-off-by: Lianhao Lu <[email protected]>
Description
Add vLLM + HPA support for ChatQnA Helm charts.
Similarly to how it's already done in the Agent component, there are tgi.enabled and vllm.enabled flags for selecting which LLM serving backend will be used.

Note: ChatQnA does not yet support using vLLM for embedding & reranking [1], so HF TEI is still used for those.
[1] opea-project/GenAIExamples#1237
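To illustrate how the two flags are meant to interact, here is a minimal sketch, assuming that exactly one of tgi.enabled and vllm.enabled should be true for a given install. Both helper functions are hypothetical and are not part of the chart: the first validates a parsed values mapping, the second builds the standard `helm install --set key=value` arguments that switch the LLM to the chosen backend.

```python
from typing import Any, List, Mapping

BACKENDS = ("tgi", "vllm")


def select_llm_backend(values: Mapping[str, Any]) -> str:
    """Return "tgi" or "vllm" from a parsed values mapping.

    Hypothetical helper: assumes exactly one <backend>.enabled flag is true.
    """
    enabled = [
        name for name in BACKENDS if values.get(name, {}).get("enabled", False)
    ]
    if len(enabled) != 1:
        raise ValueError(
            f"expected exactly one LLM backend enabled, got {enabled or 'none'}"
        )
    return enabled[0]


def helm_set_args(backend: str) -> List[str]:
    """Build --set arguments that enable only `backend` (hypothetical helper)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    return [
        arg
        for name in BACKENDS
        for arg in ("--set", f"{name}.enabled={'true' if name == backend else 'false'}")
    ]
```

For example, `helm_set_args("vllm")` yields `--set tgi.enabled=false --set vllm.enabled=true`, which can be appended to a `helm install` command for the ChatQnA chart. Setting both flags explicitly avoids depending on whichever default the chart's values.yaml happens to ship.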
Issues
Fixes #608 partially.
Fixes #631.
Type of change
New dependencies
n/a
Tests
Manual testing on top of main HEAD / v1.1 images.