Commit 06a7204: Update blog and home page

Signed-off-by: Dan Sun <[email protected]>
yuzisun committed Jun 10, 2024
1 parent bd140cb commit 06a7204
Showing 4 changed files with 39 additions and 29 deletions.
10 changes: 5 additions & 5 deletions docs/blog/articles/2024-05-15-KServe-0.13-release.md
@@ -50,15 +50,15 @@ Version 0.13 introduces dedicated runtime support for [vLLM](https://docs.vllm.a
 apiVersion: serving.kserve.io/v1beta1
 kind: InferenceService
 metadata:
-  name: huggingface-llama2
+  name: huggingface-llama3
 spec:
   predictor:
     model:
       modelFormat:
         name: huggingface
       args:
-        - --model_name=llama2
-        - --model_id=meta-llama/Llama-2-7b-chat-hf
+        - --model_name=llama3
+        - --model_id=meta-llama/meta-llama-3-8b-instruct
       resources:
         limits:
           cpu: "6"
@@ -70,7 +70,7 @@ spec:
       nvidia.com/gpu: "1"
 ```
-See more details in our updated docs to [Deploy the Llama2 model with Hugging Face LLM Serving Runtime](https://kserve.github.io/website/master/modelserving/v1beta1/llm/huggingface/).
+See more details in our updated docs to [Deploy the Llama3 model with Hugging Face LLM Serving Runtime](https://kserve.github.io/website/master/modelserving/v1beta1/llm/huggingface/).
Additionally, if the Hugging Face backend is preferred over vLLM, vLLM auto-mapping can be disabled with the `--backend=huggingface` arg.

@@ -108,7 +108,7 @@ This release also includes several enhancements and changes:
* Removed Seldon Alibi dependency [#3380](https://github.com/kserve/kserve/issues/3380).
* Removal of conversion webhook from manifests. [#3344](https://github.com/kserve/kserve/issues/3344).

-For complete details on the new features and updates, visit our [official release notes](https://github.com/kserve/kserve/releases/tag/v0.13.0-rc0).
+For complete details on the new features and updates, visit our [official release notes](https://github.com/kserve/kserve/releases/tag/v0.13.0).


## Join the community
32 changes: 28 additions & 4 deletions docs/modelserving/v1beta1/llm/huggingface/README.md
@@ -43,7 +43,7 @@ KServe Hugging Face runtime by default uses vLLM to serve the LLM models for fas
1. `SAFETENSORS_FAST_GPU` is set by default to improve the model loading performance.
2. `HF_HUB_DISABLE_TELEMETRY` is set by default to disable the telemetry.

-### Perform Model Inference
+#### Perform Model Inference

The first step is to [determine the ingress IP and ports](../../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
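The `SERVICE_HOSTNAME` used in the requests below is extracted from the InferenceService status URL. A minimal sketch of that extraction, using an illustrative URL (the real value comes from `kubectl get inferenceservice ... -o jsonpath='{.status.url}'`):

```shell
# Illustrative status URL; in a real cluster this is returned by kubectl
STATUS_URL="http://huggingface-llama3.default.example.com"

# `cut -d "/" -f 3` keeps the third slash-delimited field of the URL,
# i.e. the bare hostname used for the Host header
SERVICE_HOSTNAME=$(echo "${STATUS_URL}" | cut -d "/" -f 3)
echo "${SERVICE_HOSTNAME}"  # prints huggingface-llama3.default.example.com
```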

@@ -89,15 +89,15 @@ supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for i
 apiVersion: serving.kserve.io/v1beta1
 kind: InferenceService
 metadata:
-  name: huggingface-llama3
+  name: huggingface-t5
 spec:
   predictor:
     model:
       modelFormat:
         name: huggingface
       args:
-        - --model_name=llama3
-        - --model_id=meta-llama/meta-llama-3-8b-instruct
+        - --model_name=t5
+        - --model_id=google-t5/t5-small
+        - --backend=huggingface
       resources:
         limits:
@@ -111,6 +111,30 @@ supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for i
 EOF
 ```

+#### Perform Model Inference
+
+The first step is to [determine the ingress IP and ports](../../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
+
+```bash
+MODEL_NAME=t5
+SERVICE_HOSTNAME=$(kubectl get inferenceservice huggingface-t5 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
+```
+
+The KServe Hugging Face runtime supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for inference.
+
+Sample OpenAI Completions request (the payload is double-quoted so that `${MODEL_NAME}` expands):
+
+```bash
+curl -H "content-type: application/json" -H "Host: ${SERVICE_HOSTNAME}" -v "http://${INGRESS_HOST}:${INGRESS_PORT}/openai/v1/completions" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"<prompt>\", \"stream\": false, \"max_tokens\": 30}"
+```
+
+!!! success "Expected Output"
+
+    ```{ .json .no-copy }
+    ```
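The `/v1/chat/completions` endpoint mentioned in this section takes a `messages` array rather than a `prompt` string. A sketch of such a request — host, port, and hostname values are illustrative, and building the payload in a variable first keeps the shell quoting straightforward:

```shell
# Illustrative values; in practice these come from the ingress lookup step
INGRESS_HOST=localhost
INGRESS_PORT=8080
SERVICE_HOSTNAME=huggingface-t5.default.example.com
MODEL_NAME=t5

# Chat-style payload: a messages array instead of a prompt string
PAYLOAD="{\"model\": \"${MODEL_NAME}\", \"messages\": [{\"role\": \"user\", \"content\": \"translate English to German: Hello\"}], \"stream\": false, \"max_tokens\": 30}"

curl -H "content-type: application/json" -H "Host: ${SERVICE_HOSTNAME}" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/openai/v1/chat/completions" -d "${PAYLOAD}"
```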



### Hugging Face Runtime Arguments

Below, you can find an explanation of command line arguments which are supported for Hugging Face runtime. [vLLM backend engine arguments](https://docs.vllm.ai/en/latest/models/engine_args.html) can also be specified on the command line argument which is parsed by the Hugging Face runtime.
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -103,7 +103,7 @@ nav:
- Debugging guide: developer/debug.md
- Blog:
- Releases:
-      - KServe 0.13 Release: blog/articles/2023-10-08-KServe-0.13-release.md
+      - KServe 0.13 Release: blog/articles/2024-05-15-KServe-0.13-release.md
- KServe 0.11 Release: blog/articles/2023-10-08-KServe-0.11-release.md
- KServe 0.10 Release: blog/articles/2023-02-05-KServe-0.10-release.md
- KServe 0.9 Release: blog/articles/2022-07-21-KServe-0.9-release.md
24 changes: 5 additions & 19 deletions overrides/home.html
@@ -134,13 +134,6 @@ <h5 class="org-card--heading" id="org-card--heading">

<div id="" class="org-card--front" aria-labelledby="org-card-heading">

-<div class="org-card--picture">
-  <a href="modelserving/explainer/explainer">
-    <img src="./images/Explaination.png" alt="explainer">
-  </a>
-
-</div>

<div class="org-card--body">
<div class="org-card--body-heading">
<h5 class="org-card--heading" id="org-card--heading">
@@ -149,8 +142,8 @@ <h5 class="org-card--heading" id="org-card--heading">
</div>
<div class="org-card--body-content">
<div class="org-card--body-content-wrapper">
-            Provides ML model inspection and interpretation, KServe integrates <a href="https://www.seldon.io/solutions/open-source-projects/alibi-explain/">Alibi</a>, <a href="https://aix360.mybluemix.net/">AI Explainability 360</a>,
-            <a href="https://captum.ai/">Captum</a> to help explain the predictions and gauge the confidence of those predictions.
+            Provides ML model inspection and interpretation, KServe integrates <a href="https://captum.ai/">Captum</a>
+            to help explain the predictions and gauge the confidence of those predictions.
</div>
</div>
</div>
@@ -160,23 +153,16 @@ <h5 class="org-card--heading" id="org-card--heading">

<div id="" class="org-card--front" aria-labelledby="org-card-heading">

-<div class="org-card--picture">
-  <a href="modelserving/detect/alibi_detect/alibi_detect/">
-    <img src="./images/Monitoring.svg" alt="model monitoring">
-  </a>
-
-</div>

<div class="org-card--body">
<div class="org-card--body-heading">
<h5 class="org-card--heading" id="org-card--heading">
-          <a href="modelserving/detect/alibi_detect/alibi_detect/">Model Monitoring</a>
+          <a href="modelserving/detect/aif/germancredit">Model Monitoring</a>
</h5>
</div>
<div class="org-card--body-content">
<div class="org-card--body-content-wrapper">
-            Enables payload logging, outlier, adversarial and drift detection, KServe integrates <a href="https://docs.seldon.io/projects/alibi-detect/en/stable/index.html">Alibi-detect</a>, <a href="https://aif360.mybluemix.net/">AI
-            Fairness 360</a>, <a href="https://github.com/Trusted-AI/adversarial-robustness-toolbox">Adversarial Robustness Toolbox (ART)</a> to help monitor the ML models on production.
+            Enables payload logging, outlier, adversarial and drift detection, KServe integrates <a href="https://aif360.mybluemix.net/">AI Fairness 360</a>,
+            <a href="https://github.com/Trusted-AI/adversarial-robustness-toolbox">Adversarial Robustness Toolbox (ART)</a> to help monitor the ML models on production.
</div>
</div>
</div>
