Describe the bug
When I run the olive auto-opt command with --num-splits, it fails with an error like FileNotFoundError: [Errno 2] No such file or directory: '/mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models/model.onnx', and no ONNX files are produced. When I run the same command without --num-splits, it works end to end without any errors and the ONNX files are produced correctly.
Expected behavior
olive auto-opt --num-splits works end to end without any errors and ONNX files are produced as intended.
Olive config
Add Olive configurations here.
Olive logs
Loaded previous command output of type hfmodel from outputs/pytorch_awq_dir
[2025-02-06 10:41:31,784] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2025-02-06 10:41:31,814] [INFO] [cache.py:138:__init__] Using cache directory: /mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow
[2025-02-06 10:41:31,822] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2025-02-06 10:41:31,824] [INFO] [engine.py:246:run] Running Olive on accelerator: cpu-cpu
[2025-02-06 10:41:31,824] [INFO] [engine.py:888:_create_system] Creating target system ...
[2025-02-06 10:41:31,824] [INFO] [engine.py:891:_create_system] Target system created in 0.000112 seconds
[2025-02-06 10:41:31,824] [INFO] [engine.py:902:_create_system] Creating host system ...
[2025-02-06 10:41:31,824] [INFO] [engine.py:905:_create_system] Host system created in 0.000087 seconds
[2025-02-06 10:41:33,537] [INFO] [engine.py:709:_run_pass] Running pass capture_split_info:CaptureSplitInfo {}
You have loaded an AWQ model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
[2025-02-06 10:41:35,299] [INFO] [engine.py:781:_run_pass] Pass capture_split_info:CaptureSplitInfo finished in 1.762350 seconds
[2025-02-06 10:41:35,300] [INFO] [engine.py:709:_run_pass] Running pass conversion:OnnxConversion {}
We suggest you to set `torch_dtype=torch.float16` for better efficiency with AWQ.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
We suggest you to set `torch_dtype=torch.float16` for better efficiency with AWQ.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
/home/devuser/.local/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py:96: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
You are not running the flash-attention implementation, expect numerical differences.
/home/devuser/.local/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py:260: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.original_max_position_embeddings:
/home/devuser/.local/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py:263: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
ext_factors = torch.tensor(self.short_factor, dtype=torch.float32, device=x.device)
/home/devuser/.local/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py:466: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
[2025-02-06 10:41:50,374] [INFO] [engine.py:781:_run_pass] Pass conversion:OnnxConversion finished in 15.074021 seconds
[2025-02-06 10:41:50,377] [INFO] [engine.py:709:_run_pass] Running pass genai_config_only:ModelBuilder {}
GroupQueryAttention (GQA) is used in this model.
Saving GenAI config in /mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models
Saving processing files in /mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models for GenAI
[2025-02-06 10:41:50,504] [ERROR] [engine.py:776:_run_pass] Pass run failed.
Traceback (most recent call last):
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 764, in _run_pass
output_model_config = host.run_pass(p, input_model_config, output_model_path, pass_search_point)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/systems/local.py", line 30, in run_pass
output_model = the_pass.run(model, output_model_path, point)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/passes/olive_pass.py", line 245, in run
output_model = self._run_for_config(model, config, output_model_path)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/passes/onnx/model_builder.py", line 205, in _run_for_config
model_proto = onnx.load(output_model_filepath, load_external_data=False)
File "/home/devuser/.local/lib/python3.10/site-packages/onnx/__init__.py", line 212, in load_model
model = _get_serializer(format, f).deserialize_proto(_load_bytes(f), ModelProto())
File "/home/devuser/.local/lib/python3.10/site-packages/onnx/__init__.py", line 149, in _load_bytes
with open(f, "rb") as readable:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models/model.onnx'
[2025-02-06 10:41:50,504] [WARNING] [engine.py:334:run_accelerator] Failed to run Olive on cpu-cpu.
Traceback (most recent call last):
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 330, in run_accelerator
output_footprint = self.run_no_search(input_model_config, input_model_id, accelerator_spec, output_dir)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 400, in run_no_search
should_prune, signal, model_ids = self._run_passes(
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 664, in _run_passes
model_config, model_id = self._run_pass(
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 764, in _run_pass
output_model_config = host.run_pass(p, input_model_config, output_model_path, pass_search_point)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/systems/local.py", line 30, in run_pass
output_model = the_pass.run(model, output_model_path, point)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/passes/olive_pass.py", line 245, in run
output_model = self._run_for_config(model, config, output_model_path)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/passes/onnx/model_builder.py", line 205, in _run_for_config
model_proto = onnx.load(output_model_filepath, load_external_data=False)
File "/home/devuser/.local/lib/python3.10/site-packages/onnx/__init__.py", line 212, in load_model
model = _get_serializer(format, f).deserialize_proto(_load_bytes(f), ModelProto())
File "/home/devuser/.local/lib/python3.10/site-packages/onnx/__init__.py", line 149, in _load_bytes
with open(f, "rb") as readable:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models/model.onnx'
[2025-02-06 10:41:50,505] [INFO] [engine.py:265:run] Run history for cpu-cpu:
[2025-02-06 10:41:50,509] [INFO] [engine.py:517:dump_run_history] run history:
+------------+-------------------+------------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+============+===================+==================+================+===========+
| 5e581aa1 | | | | |
+------------+-------------------+------------------+----------------+-----------+
| 35e1839b | 5e581aa1 | CaptureSplitInfo | 1.76235 | |
+------------+-------------------+------------------+----------------+-----------+
| 237c7d40 | 35e1839b | OnnxConversion | 15.074 | |
+------------+-------------------+------------------+----------------+-----------+
Command failed. Please set the log_level to 1 for more detailed logs.
To Reproduce
Run the olive command below.
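The exact command was not captured in this report. A minimal sketch of what it might look like, assuming the CPU device and hypothetical input/output paths (flag spellings follow recent Olive CLI releases and may differ by version):

# Hypothetical reconstruction: the input path matches the AWQ output
# directory referenced in the logs; the split count and output path
# are placeholders.
olive auto-opt \
    --model_name_or_path outputs/pytorch_awq_dir \
    --device cpu \
    --num-splits 4 \
    --output_path models/phi3_5_split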
For model_name_or_path, I pass a Phi-3.5-mini-instruct model quantized with AWQ, which I generated like below.
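That quantization step is not shown here either. A sketch of how such a model could be produced with Olive's quantize command (the awq algorithm option is per Olive's CLI docs; the model ID and output path are assumptions):

# Hypothetical sketch: AWQ-quantize Phi-3.5-mini-instruct, writing the
# PyTorch model that the auto-opt command above consumes.
olive quantize \
    --model_name_or_path microsoft/Phi-3.5-mini-instruct \
    --algorithm awq \
    --output_path outputs/pytorch_awq_dir

The log line "Loaded previous command output of type hfmodel from outputs/pytorch_awq_dir" is consistent with auto-opt picking up this directory as its input.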
Other information
Additional context
To resolve issue #1595, I was initially testing this with QNN mode, and it didn't work, so I reverted to CPU, where it still didn't work. But the purpose of introducing num_splits is to eventually solve #1595.