Example for checking that vLLM metrics work too
Signed-off-by: Eero Tamminen <[email protected]>
eero-t committed Nov 25, 2024
1 parent 8b2cd79 commit 29a0e29
Showing 1 changed file with 11 additions and 2 deletions.
helm-charts/monitoring.md:

````diff
@@ -75,12 +75,21 @@ $ prom_url=http://$(kubectl -n $prom_ns get -o jsonpath="{.spec.clusterIP}:{.spe
 $ curl --no-progress-meter $prom_url/metrics | grep scrape_pool_targets.*$chart
 ```
 
-Check that Prometheus metrics from TGI inference component are available:
+Then check that Prometheus metrics from a relevant LLM inferencing service are available.
+
+For vLLM:
+
+```console
+$ curl --no-progress-meter $prom_url/api/v1/query? \
+  --data-urlencode 'query=vllm:cache_config_info{service="'$chart'-vllm"}' | jq
+```
+
+Or TGI:
 
 ```console
 $ curl --no-progress-meter $prom_url/api/v1/query? \
   --data-urlencode 'query=tgi_queue_size{service="'$chart'-tgi"}' | jq
 ```
 
 **NOTE**: services provide metrics only after they've processed their first request.
-And reranking service will be used only after context data has been uploaded!
+And ChatQnA uses (TEI) reranking service only after query context data has been uploaded!
````
