
[Bug]AudioQnA : whisper service not coming up #1474

Open · 3 of 8 tasks
ShankarRIntel opened this issue Jan 27, 2025 · 6 comments
Assignee: xiguiw
Labels: bug (Something isn't working), invalid (This doesn't seem right)

@ShankarRIntel

Priority

P1-Stopper

OS type

Ubuntu

Hardware type

Xeon-GNR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source
  • Other

Deploy method

  • Docker
  • Docker Compose
  • Kubernetes Helm Charts
  • Kubernetes GMC
  • Other

Running nodes

Single Node

What's the version?

latest

Description

The whisper service does not come up when running the AudioQnA sample.

Reproduce steps

Run the AudioQnA sample (see the attached logs).

Attachments

audioqna_logs.err.txt
audioqna_ps.err.txt

@ShankarRIntel ShankarRIntel added the bug Something isn't working label Jan 27, 2025
@chensuyue
Collaborator

I can't reproduce this issue.


@ShankarRIntel
Author

ShankarRIntel commented Jan 30, 2025 via email

@xiguiw
Collaborator

xiguiw commented Feb 5, 2025

Simply update your code to the latest, set HUGGINGFACEHUB_API_TOKEN, and bring the stack up:

cd /home/jenkins/xigui/GenAIExamples/
git pull
cd AudioQnA/docker_compose/intel/hpu/gaudi
export HUGGINGFACEHUB_API_TOKEN=your_HUGGINGFACEHUB_API_TOKEN
source set_env.sh
docker compose up -d

That's all.
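
To confirm the deployment came up, a quick check (a minimal sketch; whisper-service is the container name that appears in your attached logs):

# List the compose services and their health state
docker compose ps
# Follow the whisper container log until Uvicorn reports it is running
docker compose logs -f whisper-service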


@ShankarRIntel

Here are the contents of audioqna_ps.err.txt; it shows the server is starting:

CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS                            PORTS                                   NAMES
9269c3bae5dc   ghcr.io/huggingface/tgi-gaudi:2.3.1   "/tgi-entrypoint.sh …"   5 minutes ago   Up 5 minutes (health: starting)   0.0.0.0:3006->80/tcp, :::3006->80/tcp   tgi-gaudi-server

You can check the log of the container by container ID (9269c3bae5dc in your case):

docker logs -t 9269c3bae5dc
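
If you only want the healthcheck state, you can query it directly (a minimal sketch using Docker's standard inspect template):

# Print just the health status (starting / healthy / unhealthy)
docker inspect --format '{{.State.Health.Status}}' 9269c3bae5dc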

Here is what I observed on my side:

The TGI server container is starting:

docker ps -a | grep tgi-gaudi-server
293199c809fe   ghcr.io/huggingface/tgi-gaudi:2.3.1   "/tgi-entrypoint.sh …"   5 minutes ago   Up 5 minutes (health: starting)   0.0.0.0:3006->80/tcp, [::]:3006->80/tcp   tgi-gaudi-server

Check the log; it is downloading the model:

docker logs -t 293199c809fe
...
2025-02-05T02:52:27.846316128Z 2025-02-05T02:52:27.845862Z  INFO hf_hub: Token file not found "/data/token"
2025-02-05T02:52:55.996872211Z 2025-02-05T02:52:55.996540Z  WARN text_generation_launcher::gpu: Cannot determine GPU compute capability: ModuleNotFoundError: No module named 'torch'
2025-02-05T02:52:55.996926754Z 2025-02-05T02:52:55.996561Z  INFO text_generation_launcher: Using attention default - Prefix caching true
2025-02-05T02:52:55.996938110Z 2025-02-05T02:52:55.996567Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 1074
2025-02-05T02:52:55.996947148Z 2025-02-05T02:52:55.996569Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2025-02-05T02:52:55.996955817Z 2025-02-05T02:52:55.996654Z  INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
2025-02-05T02:53:02.132405187Z 2025-02-05T02:53:02.132029Z  INFO text_generation_launcher: Download file: model-00001-of-00002.safetensors
2025-02-05T02:56:58.813755644Z 2025-02-05T02:56:58.813421Z  INFO text_generation_launcher: Downloaded /data/hub/models--Intel--neural-chat-7b-v3-3/snapshots/7506dfc5fb325a8a8e0c4f9a6a001671833e5b8e/model-00001-of-00002.safetensors in 0:03:56.
2025-02-05T02:56:58.813818135Z 2025-02-05T02:56:58.813699Z  INFO text_generation_launcher: Download: [1/2] -- ETA: 0:03:56
2025-02-05T02:56:58.814853538Z 2025-02-05T02:56:58.814787Z  INFO text_generation_launcher: Download file: model-00002-of-00002.safetensors

After a few minutes (depending on the model size and your network bandwidth), the model download completes and everything is OK:

docker ps -a | grep tgi-gaudi-server
293199c809fe   ghcr.io/huggingface/tgi-gaudi:2.3.1                                                         "/tgi-entrypoint.sh …"   7 minutes ago   Up 7 minutes (healthy)   0.0.0.0:3006->80/tcp, [::]:3006->80/tcp                                                                  tgi-gaudi-server

Check the log again:

docker logs -t 293199c809fe
2025-02-05T02:52:27.846011961Z 2025-02-05T02:52:27.845737Z  INFO text_generation_launcher: Args {
2025-02-05T02:52:27.846072310Z     model_id: "Intel/neural-chat-7b-v3-3",
2025-02-05T02:52:27.846078586Z     revision: None,
2025-02-05T02:52:27.846083032Z     validation_workers: 2,
2025-02-05T02:52:27.846087470Z     sharded: None,
2025-02-05T02:52:27.846091785Z     num_shard: None,
2025-02-05T02:52:27.846095847Z     quantize: None,
2025-02-05T02:52:27.846100006Z     speculate: None,
2025-02-05T02:52:27.846104196Z     dtype: Some(
2025-02-05T02:52:27.846108566Z         BFloat16,
2025-02-05T02:52:27.846112845Z     ),
2025-02-05T02:52:27.846116906Z     trust_remote_code: false,
2025-02-05T02:52:27.846121323Z     max_concurrent_requests: 128,
2025-02-05T02:52:27.846125639Z     max_best_of: 2,
2025-02-05T02:52:27.846129807Z     max_stop_sequences: 4,
2025-02-05T02:52:27.846133850Z     max_top_n_tokens: 5,
2025-02-05T02:52:27.846138015Z     max_input_tokens: None,
2025-02-05T02:52:27.846142202Z     max_input_length: Some(
2025-02-05T02:52:27.846146301Z         1024,
2025-02-05T02:52:27.846150442Z     ),
2025-02-05T02:52:27.846154585Z     max_total_tokens: Some(
2025-02-05T02:52:27.846158678Z         2048,
2025-02-05T02:52:27.846162788Z     ),
2025-02-05T02:52:27.846166725Z     waiting_served_ratio: 0.3,
2025-02-05T02:52:27.846170850Z     max_batch_prefill_tokens: None,
2025-02-05T02:52:27.846175438Z     max_batch_total_tokens: None,
2025-02-05T02:52:27.846179490Z     max_waiting_tokens: 20,
2025-02-05T02:52:27.846183533Z     max_batch_size: None,
2025-02-05T02:52:27.846187703Z     cuda_graphs: None,
2025-02-05T02:52:27.846191743Z     hostname: "293199c809fe",
2025-02-05T02:52:27.846195993Z     port: 80,
2025-02-05T02:52:27.846200167Z     shard_uds_path: "/tmp/text-generation-server",
2025-02-05T02:52:27.846204509Z     master_addr: "localhost",
2025-02-05T02:52:27.846208745Z     master_port: 29500,
2025-02-05T02:52:27.846212850Z     huggingface_hub_cache: None,
2025-02-05T02:52:27.846217016Z     weights_cache_override: None,
2025-02-05T02:52:27.846221199Z     disable_custom_kernels: false,
2025-02-05T02:52:27.846225409Z     cuda_memory_fraction: 1.0,
2025-02-05T02:52:27.846229527Z     rope_scaling: None,
2025-02-05T02:52:27.846233557Z     rope_factor: None,
2025-02-05T02:52:27.846237649Z     json_output: false,
2025-02-05T02:52:27.846241638Z     otlp_endpoint: None,
2025-02-05T02:52:27.846252893Z     otlp_service_name: "text-generation-inference.router",
2025-02-05T02:52:27.846257704Z     cors_allow_origin: [],
2025-02-05T02:52:27.846262086Z     api_key: None,
2025-02-05T02:52:27.846266107Z     watermark_gamma: None,
2025-02-05T02:52:27.846270140Z     watermark_delta: None,
2025-02-05T02:52:27.846274365Z     ngrok: false,
2025-02-05T02:52:27.846278375Z     ngrok_authtoken: None,
2025-02-05T02:52:27.846282380Z     ngrok_edge: None,
2025-02-05T02:52:27.846286572Z     tokenizer_config_path: None,
2025-02-05T02:52:27.846290806Z     disable_grammar_support: false,
2025-02-05T02:52:27.846294994Z     env: false,
2025-02-05T02:52:27.846299431Z     max_client_batch_size: 4,
2025-02-05T02:52:27.846303801Z     lora_adapters: None,
2025-02-05T02:52:27.846307938Z     usage_stats: On,
2025-02-05T02:52:27.846312034Z }
2025-02-05T02:52:27.846316128Z 2025-02-05T02:52:27.845862Z  INFO hf_hub: Token file not found "/data/token"
2025-02-05T02:52:55.996872211Z 2025-02-05T02:52:55.996540Z  WARN text_generation_launcher::gpu: Cannot determine GPU compute capability: ModuleNotFoundError: No module named 'torch'
2025-02-05T02:52:55.996926754Z 2025-02-05T02:52:55.996561Z  INFO text_generation_launcher: Using attention default - Prefix caching true
2025-02-05T02:52:55.996938110Z 2025-02-05T02:52:55.996567Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 1074
2025-02-05T02:52:55.996947148Z 2025-02-05T02:52:55.996569Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2025-02-05T02:52:55.996955817Z 2025-02-05T02:52:55.996654Z  INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
2025-02-05T02:53:02.132405187Z 2025-02-05T02:53:02.132029Z  INFO text_generation_launcher: Download file: model-00001-of-00002.safetensors
2025-02-05T02:56:58.813755644Z 2025-02-05T02:56:58.813421Z  INFO text_generation_launcher: Downloaded /data/hub/models--Intel--neural-chat-7b-v3-3/snapshots/7506dfc5fb325a8a8e0c4f9a6a001671833e5b8e/model-00001-of-00002.safetensors in 0:03:56.
2025-02-05T02:56:58.813818135Z 2025-02-05T02:56:58.813699Z  INFO text_generation_launcher: Download: [1/2] -- ETA: 0:03:56
2025-02-05T02:56:58.814853538Z 2025-02-05T02:56:58.814787Z  INFO text_generation_launcher: Download file: model-00002-of-00002.safetensors
2025-02-05T02:58:47.104495512Z 2025-02-05T02:58:47.104309Z  INFO text_generation_launcher: Downloaded /data/hub/models--Intel--neural-chat-7b-v3-3/snapshots/7506dfc5fb325a8a8e0c4f9a6a001671833e5b8e/model-00002-of-00002.safetensors in 0:01:48.
2025-02-05T02:58:47.104561397Z 2025-02-05T02:58:47.104440Z  INFO text_generation_launcher: Download: [2/2] -- ETA: 0
2025-02-05T02:58:48.009382762Z 2025-02-05T02:58:48.009230Z  INFO download: text_generation_launcher: Successfully downloaded weights for Intel/neural-chat-7b-v3-3
2025-02-05T02:58:48.009882789Z 2025-02-05T02:58:48.009790Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2025-02-05T02:58:55.571062861Z 2025-02-05T02:58:55.570703Z  INFO text_generation_launcher: Using prefix caching = False
2025-02-05T02:58:55.571116146Z 2025-02-05T02:58:55.570761Z  INFO text_generation_launcher: Using Attention = default
2025-02-05T02:58:55.576805212Z 2025-02-05T02:58:55.576739Z  WARN text_generation_launcher: FBGEMM fp8 kernels are not installed.
2025-02-05T02:58:55.581570309Z 2025-02-05T02:58:55.581476Z  INFO text_generation_launcher: quantize=None
2025-02-05T02:58:55.581583128Z 2025-02-05T02:58:55.581535Z  INFO text_generation_launcher: CLI SHARDED = False DTYPE = bfloat16
2025-02-05T02:58:56.831160755Z 2025-02-05T02:58:56.830837Z  INFO text_generation_launcher: Server:server_inner: sharded =False
2025-02-05T02:58:56.831195798Z 2025-02-05T02:58:56.831111Z  INFO text_generation_launcher: Server:server_inner: data type = bfloat16, local_url = unix:///tmp/text-generation-server-0
2025-02-05T02:58:58.020699780Z 2025-02-05T02:58:58.020458Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2025-02-05T02:59:08.030531260Z 2025-02-05T02:59:08.030146Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2025-02-05T02:59:08.840568378Z 2025-02-05T02:59:08.840343Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2025-02-05T02:59:08.931505958Z 2025-02-05T02:59:08.931257Z  INFO shard-manager: text_generation_launcher: Shard ready in 20.918653892s rank=0
2025-02-05T02:59:11.027648160Z 2025-02-05T02:59:11.027413Z  INFO text_generation_launcher: Starting Webserver
2025-02-05T02:59:11.074223821Z 2025-02-05T02:59:11.073929Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
2025-02-05T02:59:11.568944815Z 2025-02-05T02:59:11.568595Z  INFO text_generation_launcher:
2025-02-05T02:59:11.568985725Z Following prefill warmup successfully.
2025-02-05T02:59:11.568995353Z Prefill batch size list:[2]
2025-02-05T02:59:11.569003321Z Prefill sequence length list:[256, 512, 768, 1024]
2025-02-05T02:59:11.569011145Z Memory stats: {'memory_allocated (GB)': 14.01, 'max_memory_allocated (GB)': 14.02, 'total_memory_available (GB)': 94.62}
2025-02-05T02:59:18.524141878Z 2025-02-05T02:59:18.523923Z  INFO text_generation_launcher:
2025-02-05T02:59:18.524194498Z Following decode warmup successfully.
2025-02-05T02:59:18.524203858Z Decode batch size list:[8]
2025-02-05T02:59:18.524211492Z Memory stats: {'memory_allocated (GB)': 16.38, 'max_memory_allocated (GB)': 18.1, 'total_memory_available (GB)': 94.62}
2025-02-05T02:59:18.524662228Z 2025-02-05T02:59:18.524479Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:103: Setting max batch total tokens to 16384
2025-02-05T02:59:18.524705175Z 2025-02-05T02:59:18.524535Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:130: Using backend V3
2025-02-05T02:59:18.524716667Z 2025-02-05T02:59:18.524592Z  INFO text_generation_router::server: router/src/server.rs:1523: Using the Hugging Face API
2025-02-05T02:59:18.524733139Z 2025-02-05T02:59:18.524624Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/data/token"
2025-02-05T02:59:19.108404553Z 2025-02-05T02:59:19.108008Z  INFO text_generation_router::server: router/src/server.rs:2247: Serving revision 7506dfc5fb325a8a8e0c4f9a6a001671833e5b8e of model Intel/neural-chat-7b-v3-3
2025-02-05T02:59:19.202978321Z 2025-02-05T02:59:19.202516Z  INFO text_generation_router::server: router/src/server.rs:1669: Using config Some(Mistral)
2025-02-05T02:59:19.203014613Z 2025-02-05T02:59:19.202548Z  WARN text_generation_router::server: router/src/server.rs:1816: Invalid hostname, defaulting to 0.0.0.0
2025-02-05T02:59:19.214263689Z 2025-02-05T02:59:19.213926Z  INFO text_generation_router::server: router/src/server.rs:2209: Connected
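
Once the router logs "Connected", the service is ready. If you want to wait for that state from a script, a minimal polling sketch (container name as shown in the docker ps output above):

# Poll the container healthcheck until Docker reports it healthy
until [ "$(docker inspect --format '{{.State.Health.Status}}' tgi-gaudi-server)" = "healthy" ]; do
  sleep 10
done
echo "tgi-gaudi-server is healthy"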

@xiguiw xiguiw added the invalid This doesn't seem right label Feb 5, 2025
@xiguiw xiguiw self-assigned this Feb 5, 2025
@xiguiw
Collaborator

xiguiw commented Feb 5, 2025

Your file audioqna_logs.err.txt shows that the whisper service started:

whisper-service | INFO: Uvicorn running on http://0.0.0.0:7066 (Press CTRL+C to quit)

Attaching to whisper-service
whisper-service  | /home/user/.local/lib/python3.10/site-packages/pydantic/_internal/_fields.py:160: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
whisper-service  |
whisper-service  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
whisper-service  |   warnings.warn(
whisper-service  | [WARNING|utils.py:212] 2025-01-27 00:52:53,718 >> optimum-habana v1.15.0 has been validated for SynapseAI v1.19.0 but habana-frameworks v1.18.0.524 was found, this could lead to undefined behavior!
whisper-service  | /usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
whisper-service  |   return isinstance(object, types.FunctionType)
whisper-service  | /home/user/.local/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
whisper-service  |   warnings.warn(
whisper-service  | ============================= HABANA PT BRIDGE CONFIGURATION ===========================
whisper-service  |  PT_HPU_LAZY_MODE = 1
whisper-service  |  PT_RECIPE_CACHE_PATH =
whisper-service  |  PT_CACHE_FOLDER_DELETE = 0
whisper-service  |  PT_HPU_RECIPE_CACHE_CONFIG =
whisper-service  |  PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
whisper-service  |  PT_HPU_LAZY_ACC_PAR_MODE = 1
whisper-service  |  PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
whisper-service  |  PT_HPU_EAGER_PIPELINE_ENABLE = 1
whisper-service  |  PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
whisper-service  | ---------------------------: System Configuration :---------------------------
whisper-service  | Num CPU Cores : 160
whisper-service  | CPU RAM       : 1056374420 KB
whisper-service  | ------------------------------------------------------------------------------
whisper-service  | You have passed language=english, but also have set `forced_decoder_ids` to [[1, None], [2, 50359]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of language=english.
whisper-service  | The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
whisper-service  | [WARNING|logging.py:328] 2025-01-27 00:53:16,996 >> Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
whisper-service  | INFO:     Started server process [1]
whisper-service  | INFO:     Waiting for application startup.
whisper-service  | INFO:     Application startup complete.
whisper-service  | INFO:     Uvicorn running on http://0.0.0.0:7066 (Press CTRL+C to quit)
Gracefully stopping... (press Ctrl+C again to force)
[+] Stopping 1/1
 ✔ Container whisper-service  Stopped                                                                                                                                                                                     2.5s
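
Once Uvicorn is listening on port 7066, you can probe the whisper service directly. A minimal sketch, assuming the server exposes a POST /v1/asr route that takes base64-encoded audio (adjust the route and payload if your build differs; sample.wav is any short audio clip you supply):

# Hypothetical probe of the whisper ASR endpoint
curl -s http://localhost:7066/v1/asr \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"audio": "'"$(base64 -w 0 sample.wav)"'"}'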

@xiguiw
Collaborator

xiguiw commented Feb 5, 2025

Hardware type

Xeon-GNR

You listed the hardware as Xeon-GNR, but your log shows Gaudi.

Please note that the docker images require Gaudi driver hl-1.19.0:

+-----------------------------------------------------------------------------+
| HL-SMI Version:                              hl-1.19.0-fw-57.1.0.0          |
| Driver Version:                                     1.19.0-2427ed8          |
|-------------------------------+----------------------+----------------------+
| AIP  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncor-Events|
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | AIP-Util  Compute M. |
|===============================+======================+======================|
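
To confirm the driver version on your host, a quick sketch (hl-smi ships with the Gaudi driver stack):

# Print the installed Habana driver version
hl-smi | grep -i "driver version"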
