Commit 06a7204: Update blog and home page

Signed-off-by: Dan Sun <[email protected]>
yuzisun committed Jun 10, 2024
1 parent bd140cb commit 06a7204
Showing 4 changed files with 39 additions and 29 deletions.
10 changes: 5 additions & 5 deletions docs/blog/articles/2024-05-15-KServe-0.13-release.md
@@ -50,15 +50,15 @@ Version 0.13 introduces dedicated runtime support for [vLLM](https://docs.vllm.a
 apiVersion: serving.kserve.io/v1beta1
 kind: InferenceService
 metadata:
-  name: huggingface-llama2
+  name: huggingface-llama3
 spec:
   predictor:
     model:
       modelFormat:
         name: huggingface
       args:
-        - --model_name=llama2
-        - --model_id=meta-llama/Llama-2-7b-chat-hf
+        - --model_name=llama3
+        - --model_id=meta-llama/meta-llama-3-8b-instruct
       resources:
         limits:
           cpu: "6"
@@ -70,7 +70,7 @@ spec:
       nvidia.com/gpu: "1"
 ```
-See more details in our updated docs to [Deploy the Llama2 model with Hugging Face LLM Serving Runtime](https://kserve.github.io/website/master/modelserving/v1beta1/llm/huggingface/).
+See more details in our updated docs to [Deploy the Llama3 model with Hugging Face LLM Serving Runtime](https://kserve.github.io/website/master/modelserving/v1beta1/llm/huggingface/).
Additionally, if the Hugging Face backend is preferred over vLLM, vLLM auto-mapping can be disabled with the `--backend=huggingface` arg.

@@ -108,7 +108,7 @@ This release also includes several enhancements and changes:
* Removed Seldon Alibi dependency [#3380](https://github.com/kserve/kserve/issues/3380).
* Removal of conversion webhook from manifests. [#3344](https://github.com/kserve/kserve/issues/3344).

-For complete details on the new features and updates, visit our [official release notes](https://github.com/kserve/kserve/releases/tag/v0.13.0-rc0).
+For complete details on the new features and updates, visit our [official release notes](https://github.com/kserve/kserve/releases/tag/v0.13.0).


## Join the community
32 changes: 28 additions & 4 deletions docs/modelserving/v1beta1/llm/huggingface/README.md
@@ -43,7 +43,7 @@ KServe Hugging Face runtime by default uses vLLM to serve the LLM models for fas
1. `SAFETENSORS_FAST_GPU` is set by default to improve the model loading performance.
2. `HF_HUB_DISABLE_TELEMETRY` is set by default to disable the telemetry.

-### Perform Model Inference
+#### Perform Model Inference

The first step is to [determine the ingress IP and ports](../../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
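The `SERVICE_HOSTNAME` used in the requests below is extracted from the InferenceService status URL. A minimal sketch of that extraction, using an illustrative URL (the real value comes from `kubectl get inferenceservice ... -o jsonpath='{.status.url}'`):

```shell
# Illustrative status URL; in a real cluster this is returned by kubectl
STATUS_URL="http://huggingface-llama3.default.example.com"

# `cut -d "/" -f 3` keeps the third slash-delimited field of the URL,
# i.e. the bare hostname used for the Host header
SERVICE_HOSTNAME=$(echo "${STATUS_URL}" | cut -d "/" -f 3)
echo "${SERVICE_HOSTNAME}"  # prints huggingface-llama3.default.example.com
```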

@@ -89,15 +89,15 @@ supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for i
 apiVersion: serving.kserve.io/v1beta1
 kind: InferenceService
 metadata:
-  name: huggingface-llama3
+  name: huggingface-t5
 spec:
   predictor:
     model:
       modelFormat:
         name: huggingface
       args:
-        - --model_name=llama3
-        - --model_id=meta-llama/meta-llama-3-8b-instruct
+        - --model_name=t5
+        - --model_id=google-t5/t5-small
+        - --backend=huggingface
       resources:
         limits:
@@ -111,6 +111,30 @@ supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for i
 EOF
 ```

+#### Perform Model Inference
+
+The first step is to [determine the ingress IP and ports](../../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
+
+```bash
+MODEL_NAME=t5
+SERVICE_HOSTNAME=$(kubectl get inferenceservice huggingface-t5 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
+```
+
+The KServe Hugging Face runtime supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for inference.
+
+Sample OpenAI Completions request (the payload is double-quoted so that `${MODEL_NAME}` expands):
+
+```bash
+curl -H "content-type: application/json" -H "Host: ${SERVICE_HOSTNAME}" -v "http://${INGRESS_HOST}:${INGRESS_PORT}/openai/v1/completions" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"<prompt>\", \"stream\": false, \"max_tokens\": 30}"
+```
+
+!!! success "Expected Output"
+
+    ```{ .json .no-copy }
+    ```
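The `/v1/chat/completions` endpoint mentioned in this section takes a `messages` array rather than a `prompt` string. A sketch of such a request — host, port, and hostname values are illustrative, and building the payload in a variable first keeps the shell quoting straightforward:

```shell
# Illustrative values; in practice these come from the ingress lookup step
INGRESS_HOST=localhost
INGRESS_PORT=8080
SERVICE_HOSTNAME=huggingface-t5.default.example.com
MODEL_NAME=t5

# Chat-style payload: a messages array instead of a prompt string
PAYLOAD="{\"model\": \"${MODEL_NAME}\", \"messages\": [{\"role\": \"user\", \"content\": \"translate English to German: Hello\"}], \"stream\": false, \"max_tokens\": 30}"

curl -H "content-type: application/json" -H "Host: ${SERVICE_HOSTNAME}" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/openai/v1/chat/completions" -d "${PAYLOAD}"
```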



### Hugging Face Runtime Arguments

Below, you can find an explanation of command line arguments which are supported for Hugging Face runtime. [vLLM backend engine arguments](https://docs.vllm.ai/en/latest/models/engine_args.html) can also be specified on the command line argument which is parsed by the Hugging Face runtime.
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -103,7 +103,7 @@ nav:
- Debugging guide: developer/debug.md
- Blog:
- Releases:
-      - KServe 0.13 Release: blog/articles/2023-10-08-KServe-0.13-release.md
+      - KServe 0.13 Release: blog/articles/2024-05-15-KServe-0.13-release.md
- KServe 0.11 Release: blog/articles/2023-10-08-KServe-0.11-release.md
- KServe 0.10 Release: blog/articles/2023-02-05-KServe-0.10-release.md
- KServe 0.9 Release: blog/articles/2022-07-21-KServe-0.9-release.md
24 changes: 5 additions & 19 deletions overrides/home.html
@@ -134,13 +134,6 @@ <h5 class="org-card--heading" id="org-card--heading">

<div id="" class="org-card--front" aria-labelledby="org-card-heading">

-<div class="org-card--picture">
-  <a href="modelserving/explainer/explainer">
-    <img src="./images/Explaination.png" alt="explainer">
-  </a>
-
-</div>

<div class="org-card--body">
<div class="org-card--body-heading">
<h5 class="org-card--heading" id="org-card--heading">
@@ -149,8 +142,8 @@ <h5 class="org-card--heading" id="org-card--heading">
</div>
<div class="org-card--body-content">
<div class="org-card--body-content-wrapper">
-            Provides ML model inspection and interpretation, KServe integrates <a href="https://www.seldon.io/solutions/open-source-projects/alibi-explain/">Alibi</a>, <a href="https://aix360.mybluemix.net/">AI Explainability 360</a>,
-            <a href="https://captum.ai/">Captum</a> to help explain the predictions and gauge the confidence of those predictions.
+            Provides ML model inspection and interpretation, KServe integrates <a href="https://captum.ai/">Captum</a>
+            to help explain the predictions and gauge the confidence of those predictions.
</div>
</div>
</div>
@@ -160,23 +153,16 @@ <h5 class="org-card--heading" id="org-card--heading">

<div id="" class="org-card--front" aria-labelledby="org-card-heading">

-<div class="org-card--picture">
-  <a href="modelserving/detect/alibi_detect/alibi_detect/">
-    <img src="./images/Monitoring.svg" alt="model monitoring">
-  </a>
-
-</div>

<div class="org-card--body">
<div class="org-card--body-heading">
<h5 class="org-card--heading" id="org-card--heading">
-          <a href="modelserving/detect/alibi_detect/alibi_detect/">Model Monitoring</a>
+          <a href="modelserving/detect/aif/germancredit">Model Monitoring</a>
</h5>
</div>
<div class="org-card--body-content">
<div class="org-card--body-content-wrapper">
-            Enables payload logging, outlier, adversarial and drift detection, KServe integrates <a href="https://docs.seldon.io/projects/alibi-detect/en/stable/index.html">Alibi-detect</a>, <a href="https://aif360.mybluemix.net/">AI
-            Fairness 360</a>, <a href="https://github.com/Trusted-AI/adversarial-robustness-toolbox">Adversarial Robustness Toolbox (ART)</a> to help monitor the ML models on production.
+            Enables payload logging, outlier, adversarial and drift detection, KServe integrates <a href="https://aif360.mybluemix.net/">AI Fairness 360</a>,
+            <a href="https://github.com/Trusted-AI/adversarial-robustness-toolbox">Adversarial Robustness Toolbox (ART)</a> to help monitor the ML models on production.
</div>
</div>
</div>
