Commit

Deployed 95c9402 to master with MkDocs 1.6.0 and mike 2.1.1
github-actions[bot] committed Jun 10, 2024
1 parent 2a0948d commit dc3b40e
Showing 2 changed files with 19 additions and 3 deletions.
20 changes: 18 additions & 2 deletions master/modelserving/v1beta1/llm/huggingface/index.html
@@ -1303,12 +1303,28 @@ <h4 id="perform-model-inference_1">Perform Model Inference<a class="headerlink"
</code></pre></div>
<p>The KServe Hugging Face vLLM runtime supports the OpenAI <code>/v1/completions</code> and <code>/v1/chat/completions</code> endpoints for inference.</p>
<p>Sample OpenAI Completions request:</p>
<div class="highlight"><pre><span></span><code>curl<span class="w"> </span>-H<span class="w"> </span><span class="s2">"content-type:application/json"</span><span class="w"> </span>-H<span class="w"> </span><span class="s2">"Host: </span><span class="si">${</span><span class="nv">SERVICE_HOSTNAME</span><span class="si">}</span><span class="s2">"</span><span class="w"> </span>-v<span class="w"> </span>http://<span class="si">${</span><span class="nv">INGRESS_HOST</span><span class="si">}</span>:<span class="si">${</span><span class="nv">INGRESS_PORT</span><span class="si">}</span>/openai/v1/completions<span class="w"> </span>-d<span class="w"> </span><span class="s1">'{"model": "'</span><span class="s2">"</span><span class="si">${</span><span class="nv">MODEL_NAME</span><span class="si">}</span><span class="s2">"</span><span class="s1">'", "prompt": "translate English to German: The house is wonderful.", "stream":false, "max_tokens": 30 }'</span>
</code></pre></div>
<div class="admonition success">
<p class="admonition-title">Expected Output</p>
</div>
<div class="no-copy highlight"><pre><span></span><code><span class="p">{</span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"de53f527-9cb9-47a5-9673-43d180b704f2"</span><span class="p">,</span><span class="nt">"choices"</span><span class="p">:[{</span><span class="nt">"finish_reason"</span><span class="p">:</span><span class="s2">"length"</span><span class="p">,</span><span class="nt">"index"</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span><span class="nt">"logprobs"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"text"</span><span class="p">:</span><span class="s2">"Das Haus ist wunderbar."</span><span class="p">}],</span><span class="nt">"created"</span><span class="p">:</span><span class="mi">1717998661</span><span class="p">,</span><span class="nt">"model"</span><span class="p">:</span><span class="s2">"t5"</span><span class="p">,</span><span class="nt">"system_fingerprint"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"object"</span><span class="p">:</span><span class="s2">"text_completion"</span><span class="p">,</span><span class="nt">"usage"</span><span class="p">:{</span><span class="nt">"completion_tokens"</span><span class="p">:</span><span class="mi">7</span><span class="p">,</span><span class="nt">"prompt_tokens"</span><span class="p">:</span><span class="mi">11</span><span class="p">,</span><span class="nt">"total_tokens"</span><span class="p">:</span><span class="mi">18</span><span class="p">}}</span>
</code></pre></div>
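<p>The same request can be built programmatically. The following is a minimal Python sketch, not part of the KServe documentation: the function names <code>build_completion_request</code> and <code>send</code> are hypothetical, and the fallback values for the environment variables are placeholders. It uses only the standard library and lets <code>json.dumps</code> handle quoting of the model name and prompt.</p>

```python
import json
import os
import urllib.request


def build_completion_request(prompt, stream=False, max_tokens=30):
    """Build (but do not send) a POST request for /openai/v1/completions."""
    # Same environment variables as the curl command; the defaults below
    # are placeholder assumptions, not values from the KServe docs.
    host = os.environ.get("INGRESS_HOST", "localhost")
    port = os.environ.get("INGRESS_PORT", "8080")
    service_hostname = os.environ.get("SERVICE_HOSTNAME", "example.local")
    model_name = os.environ.get("MODEL_NAME", "t5")

    # json.dumps produces valid JSON regardless of quotes in the prompt.
    body = json.dumps({
        "model": model_name,
        "prompt": prompt,
        "stream": stream,
        "max_tokens": max_tokens,
    }).encode("utf-8")

    return urllib.request.Request(
        url=f"http://{host}:{port}/openai/v1/completions",
        data=body,
        headers={"Content-Type": "application/json", "Host": service_hostname},
        method="POST",
    )


def send(request):
    """Send the request and return the first completion text.

    Requires a reachable InferenceService, so it is not called here.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["choices"][0]["text"]
```

<p>Calling <code>send(build_completion_request("translate English to German: The house is wonderful."))</code> against a running service should return the <code>text</code> field of the first choice in the response.</p>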
<p>Sample OpenAI Completions streaming request:</p>
<div class="highlight"><pre><span></span><code>curl<span class="w"> </span>-H<span class="w"> </span><span class="s2">"content-type:application/json"</span><span class="w"> </span>-H<span class="w"> </span><span class="s2">"Host: </span><span class="si">${</span><span class="nv">SERVICE_HOSTNAME</span><span class="si">}</span><span class="s2">"</span><span class="w"> </span>-v<span class="w"> </span>http://<span class="si">${</span><span class="nv">INGRESS_HOST</span><span class="si">}</span>:<span class="si">${</span><span class="nv">INGRESS_PORT</span><span class="si">}</span>/openai/v1/completions<span class="w"> </span>-d<span class="w"> </span><span class="s1">'{"model": "'</span><span class="s2">"</span><span class="si">${</span><span class="nv">MODEL_NAME</span><span class="si">}</span><span class="s2">"</span><span class="s1">'", "prompt": "translate English to German: The house is wonderful.", "stream":true, "max_tokens": 30 }'</span>
</code></pre></div>
<div class="admonition success">
<p class="admonition-title">Expected Output</p>
</div>
<div class="no-copy highlight"><pre><span></span><code><span class="err">da</span><span class="kc">ta</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"70bb8bea-57d5-4b34-aade-da38970c917c"</span><span class="p">,</span><span class="nt">"choices"</span><span class="p">:[{</span><span class="nt">"finish_reason"</span><span class="p">:</span><span class="s2">"length"</span><span class="p">,</span><span class="nt">"index"</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span><span class="nt">"logprobs"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"text"</span><span class="p">:</span><span class="s2">"Das "</span><span class="p">}],</span><span class="nt">"created"</span><span class="p">:</span><span class="mi">1717998767</span><span class="p">,</span><span class="nt">"model"</span><span class="p">:</span><span class="s2">"t5"</span><span class="p">,</span><span class="nt">"system_fingerprint"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"object"</span><span class="p">:</span><span class="s2">"text_completion"</span><span class="p">,</span><span class="nt">"usage"</span><span class="p">:</span><span class="kc">null</span><span class="p">}</span>

<span class="err">da</span><span class="kc">ta</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"70bb8bea-57d5-4b34-aade-da38970c917c"</span><span class="p">,</span><span class="nt">"choices"</span><span class="p">:[{</span><span class="nt">"finish_reason"</span><span class="p">:</span><span class="s2">"length"</span><span class="p">,</span><span class="nt">"index"</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span><span class="nt">"logprobs"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"text"</span><span class="p">:</span><span class="s2">"Haus "</span><span class="p">}],</span><span class="nt">"created"</span><span class="p">:</span><span class="mi">1717998767</span><span class="p">,</span><span class="nt">"model"</span><span class="p">:</span><span class="s2">"t5"</span><span class="p">,</span><span class="nt">"system_fingerprint"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"object"</span><span class="p">:</span><span class="s2">"text_completion"</span><span class="p">,</span><span class="nt">"usage"</span><span class="p">:</span><span class="kc">null</span><span class="p">}</span>

<span class="err">da</span><span class="kc">ta</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"70bb8bea-57d5-4b34-aade-da38970c917c"</span><span class="p">,</span><span class="nt">"choices"</span><span class="p">:[{</span><span class="nt">"finish_reason"</span><span class="p">:</span><span class="s2">"length"</span><span class="p">,</span><span class="nt">"index"</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span><span class="nt">"logprobs"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"text"</span><span class="p">:</span><span class="s2">"ist "</span><span class="p">}],</span><span class="nt">"created"</span><span class="p">:</span><span class="mi">1717998767</span><span class="p">,</span><span class="nt">"model"</span><span class="p">:</span><span class="s2">"t5"</span><span class="p">,</span><span class="nt">"system_fingerprint"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"object"</span><span class="p">:</span><span class="s2">"text_completion"</span><span class="p">,</span><span class="nt">"usage"</span><span class="p">:</span><span class="kc">null</span><span class="p">}</span>

<span class="err">da</span><span class="kc">ta</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"70bb8bea-57d5-4b34-aade-da38970c917c"</span><span class="p">,</span><span class="nt">"choices"</span><span class="p">:[{</span><span class="nt">"finish_reason"</span><span class="p">:</span><span class="s2">"length"</span><span class="p">,</span><span class="nt">"index"</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span><span class="nt">"logprobs"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"text"</span><span class="p">:</span><span class="s2">"wunderbar.&lt;/s&gt;"</span><span class="p">}],</span><span class="nt">"created"</span><span class="p">:</span><span class="mi">1717998767</span><span class="p">,</span><span class="nt">"model"</span><span class="p">:</span><span class="s2">"t5"</span><span class="p">,</span><span class="nt">"system_fingerprint"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nt">"object"</span><span class="p">:</span><span class="s2">"text_completion"</span><span class="p">,</span><span class="nt">"usage"</span><span class="p">:</span><span class="kc">null</span><span class="p">}</span>

<span class="err">da</span><span class="kc">ta</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="err">DONE</span><span class="p">]</span>
</code></pre></div>
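<p>A streaming response arrives as a sequence of <code>data:</code> lines terminated by <code>data: [DONE]</code>. The sketch below shows one way a client might reassemble the completion text from such lines. The helper name <code>collect_stream_text</code> is hypothetical, and the sample chunks are trimmed to the <code>choices</code> field of the output shown above.</p>

```python
import json


def collect_stream_text(lines):
    """Concatenate the "text" fields from a sequence of streamed lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        parts.extend(choice["text"] for choice in chunk["choices"])
    return "".join(parts)


# Chunks abbreviated from the expected output above (assumed shape).
sample = [
    'data: {"choices":[{"index":0,"text":"Das "}]}',
    '',
    'data: {"choices":[{"index":0,"text":"Haus "}]}',
    '',
    'data: {"choices":[{"index":0,"text":"ist "}]}',
    '',
    'data: {"choices":[{"index":0,"text":"wunderbar.</s>"}]}',
    '',
    'data: [DONE]',
]

print(collect_stream_text(sample))  # prints: Das Haus ist wunderbar.</s>
```

<p>The end-of-sequence token <code>&lt;/s&gt;</code> appears in the final chunk of the sample output, so a client may want to strip it before displaying the text.</p>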
<h3 id="hugging-face-runtime-arguments">Hugging Face Runtime Arguments<a class="headerlink" href="#hugging-face-runtime-arguments" title="Permanent link"></a></h3>
<p>The command-line arguments supported by the Hugging Face runtime are explained below. <a href="https://docs.vllm.ai/en/latest/models/engine_args.html">vLLM backend engine arguments</a> can also be passed on the command line and are parsed by the Hugging Face runtime.</p>
2 changes: 1 addition & 1 deletion master/search/search_index.json

Large diffs are not rendered by default.
