
Illegal instruction (core dumped) #15

Open
dbzoo opened this issue Oct 26, 2023 · 3 comments

Comments

dbzoo commented Oct 26, 2023

I presume there is a minimum CPU requirement, such as needing AVX2, AVX-512, F16C or something similar?
Could you document the minimum instruction set and extensions required?

root@1d1c4289f303:/llm-api# python app/main.py
2023-10-26 23:31:19,237 - INFO - llama - found an existing model /models/llama_601507219781/ggml-model-q4_0.bin
2023-10-26 23:31:19,237 - INFO - llama - setup done successfully for /models/llama_601507219781/ggml-model-q4_0.bin
Illegal instruction (core dumped)
root@1d1c4289f303:/llm-api#

--- modulename: llama, funcname: __init__
llama.py(289): self.verbose = verbose
llama.py(291): self.numa = numa
llama.py(292): if not Llama.__backend_initialized:
llama.py(293): if self.verbose:
llama.py(294): llama_cpp.llama_backend_init(self.numa)
--- modulename: llama_cpp, funcname: llama_backend_init
llama_cpp.py(475): return _lib.llama_backend_init(numa)
Illegal instruction (core dumped)

I assume this build has CPU requirements. The Dockerfile sets:
ENV CMAKE_ARGS "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"

OpenBLAS can be built for multiple targets with runtime detection of the target CPU by specifying DYNAMIC_ARCH=1 in Makefile.rule, on the gmake command line, or as -DDYNAMIC_ARCH=TRUE in cmake.

https://github.com/OpenMathLib/OpenBLAS/blob/develop/README.md
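
For reference, OpenBLAS itself can be rebuilt with that runtime dispatch enabled. A minimal sketch (the clone location and install prefix below are only illustrative, not taken from this project's Dockerfile):

$ git clone https://github.com/OpenMathLib/OpenBLAS.git
$ cd OpenBLAS
$ make DYNAMIC_ARCH=1
$ make PREFIX=/opt/openblas install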

1b5d (Owner) commented Nov 12, 2023

Could you please share the configs you are using for this model?

1b5d (Owner) commented Nov 12, 2023

Btw I just built different images for different BLAS backends:

  • OpenBLAS: 1b5d/llm-api:latest-openblas
  • cuBLAS: 1b5d/llm-api:latest-cublas
  • CLBlast: 1b5d/llm-api:latest-clblast
  • hipBLAS: 1b5d/llm-api:latest-hipblas
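
For example, the OpenBLAS variant should work as a drop-in replacement by swapping the image tag; the volume and port mappings below are just an illustration and should be adjusted to your setup:

$ docker run -it -v $PWD/models/:/models:rw -v $PWD/config/config.yaml:/llm-api/config.yaml:ro -p 8084:8000 1b5d/llm-api:latest-openblas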

Could you please let me know if that helps?

dbzoo (Author) commented Nov 14, 2023

Thanks for the extra images to test with. I tried them, and all still resulted in the illegal instruction. If I can find some cycles, I will clone the repo, investigate, and submit a pull request to correct it. I believe the issue is that OpenBLAS is not compiled with runtime CPU detection and instead assumes instructions my CPU doesn't have. With a modern CPU this isn't an issue, but I'm using an old CPU (my fault; it's what I have) with no AVX flag, which I think is the culprit.

$ cat /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm pti ssbd ibrs ibpb stibp tsc_adjust arat flush_l1d arch_capabilities
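
As a quick check, the relevant SIMD flags can be grepped directly from /proc/cpuinfo; on this CPU the command below prints nothing, confirming that AVX/AVX2/FMA/F16C are all missing:

$ grep -oEw 'avx|avx2|avx512f|fma|f16c' /proc/cpuinfo | sort -u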

For the record, the config:

models_dir: /models
model_family: llama
setup_params:
  repo_id: botato/point-alpaca-ggml-model-q4_0
  filename: ggml-model-q4_0.bin
model_params:
  n_ctx: 512
  n_parts: -1
  n_gpu_layers: 0
  seed: -1
  use_mmap: True
  n_threads: 8
  n_batch: 2048
  last_n_tokens_size: 64
  lora_base: null
  lora_path: null
  low_vram: False
  tensor_split: null
  rope_freq_base: 10000.0
  rope_freq_scale: 1.0
  verbose: True

What worked for me

If I start a container from the image:

docker run -it -v $PWD/models/:/models:rw -v $PWD/config/config.yaml:/llm-api/config.yaml:ro -p 8084:8000 --ulimit memlock=16000000000 1b5d/llm-api bash

Then, at the shell prompt inside the container, reinstall llama-cpp-python:

$ FORCE_CMAKE=1 pip install --upgrade --no-deps --no-cache-dir --force-reinstall llama-cpp-python
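
A variation on the same rebuild that explicitly turns off the AVX-family code paths; the LLAMA_* options below were llama.cpp CMake flags around this time, so the exact names may differ in newer releases:

$ CMAKE_ARGS="-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF" \
  FORCE_CMAKE=1 pip install --upgrade --no-deps --no-cache-dir --force-reinstall llama-cpp-python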

Adjust this in the config and use a different model, since the latest llama-cpp-python expects GGUF rather than GGML:

setup_params:
  repo_id: TheBloke/Llama-2-7B-GGUF
  filename: llama-2-7b.Q4_0.gguf

Then it starts without enabling BLAS - it's slow, but it works.

root@4784ec49bac1:/llm-api# python app/main.py
2023-11-15 02:27:54,593 - INFO - llama - found an existing model /models/llama_326323164690/llama-2-7b.Q4_0.gguf
2023-11-15 02:27:54,594 - INFO - llama - setup done successfully for /models/llama_326323164690/llama-2-7b.Q4_0.gguf
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /models/llama_326323164690/llama-2-7b.Q4_0.gguf (version GGUF V2)
....
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 72.06 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
2023-11-15 02:17:35,436 - INFO - server - Started server process [219]
2023-11-15 02:17:35,436 - INFO - on - Waiting for application startup.
2023-11-15 02:27:55,563 - INFO - on - Application startup complete.
2023-11-15 02:27:55,564 - INFO - server - Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
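
Once the server is up, a quick smoke test can be sent from the host; the endpoint path and request body here are hypothetical placeholders (check the llm-api README for the actual route and schema):

$ curl -X POST http://localhost:8084/generate -H "Content-Type: application/json" -d '{"prompt": "Hello"}'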

That might give you some clues.
