When I follow the KleidiAI example step by step from 'https://github.com/ARM-software/ML-examples/tree/main/kleidiai-examples/llama_cpp', I hit this error:

KleidiAI: CPU features:
-- neon: yes
-- dotprod: yes
-- i8mm: yes
-- sme: no
llm_load_tensors: CPU model buffer size = 3647.87 MiB
..................................................................................................
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_ctx_per_seq = 2048
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized
llama_kv_cache_init: CPU KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: CPU compute buffer size = 164.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
Illegal instruction
Same here. I followed the instructions for applying the patch on top of llama.cpp commit b8deef0ec0af5febac1d2cfd9119ff330ed0b762 and built natively. llama-cli fails with "Illegal instruction (core dumped)",...
Hardware: Surface Laptop 7 15", Snapdragon X Elite, 16GB RAM
Software: Windows 11 24H2 (Build 26100.2605), WSL2/Ubuntu 24.04 LTS, patched llama.cpp version: 4034 (b8deef0e) built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for aarch64-linux-gnu
On the same HW and WSL2/Linux/gcc, the current version of llama.cpp (version: 4361 (7585edbd)) runs perfectly fine.
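An "Illegal instruction" crash on a CPU that apparently supports dotprod and i8mm often means the binary executed an instruction the environment does not actually expose (WSL2 in particular may not pass through every host feature). One way to cross-check is to compare the feature flags the kernel reports in /proc/cpuinfo against what KleidiAI printed. The sketch below is a diagnostic of my own, not part of llama.cpp or KleidiAI; the flag names (asimddp = dotprod, i8mm, sme) follow the aarch64 Linux "Features" line:

```python
# Compare kernel-reported Arm CPU features against what KleidiAI detected.
# Hypothetical helper, not part of llama.cpp/KleidiAI.

def parse_features(cpuinfo_text: str) -> set[str]:
    """Return the set of feature flags from a /proc/cpuinfo dump (aarch64)."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features"):
            # Line looks like: "Features\t: fp asimd asimddp i8mm ..."
            return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            feats = parse_features(f.read())
    except OSError:
        feats = set()
    # asimddp is the kernel's name for the dotprod extension.
    for flag in ("asimddp", "i8mm", "sme"):
        print(f"{flag}: {'yes' if flag in feats else 'no'}")
```

If a flag the build assumes (e.g. i8mm) is missing here but KleidiAI reported it as "yes", the mismatch would point at the detection path rather than the kernels themselves.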