Describe the bug
When I run the olive auto-opt command with --num-splits, it fails with an error like FileNotFoundError: [Errno 2] No such file or directory: '/mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models/model.onnx', and no ONNX files are produced. When I run the same command without --num-splits, it works end to end without any errors and the ONNX files are produced correctly.
Expected behavior
olive auto-opt --num-splits works end to end without any errors and ONNX files are produced as intended.
Olive config
Add Olive configurations here.
Olive logs
Loaded previous command output of type hfmodel from outputs/pytorch_awq_dir
[2025-02-06 10:41:31,784] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2025-02-06 10:41:31,814] [INFO] [cache.py:138:__init__] Using cache directory: /mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow
[2025-02-06 10:41:31,822] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2025-02-06 10:41:31,824] [INFO] [engine.py:246:run] Running Olive on accelerator: cpu-cpu
[2025-02-06 10:41:31,824] [INFO] [engine.py:888:_create_system] Creating target system ...
[2025-02-06 10:41:31,824] [INFO] [engine.py:891:_create_system] Target system created in 0.000112 seconds
[2025-02-06 10:41:31,824] [INFO] [engine.py:902:_create_system] Creating host system ...
[2025-02-06 10:41:31,824] [INFO] [engine.py:905:_create_system] Host system created in 0.000087 seconds
[2025-02-06 10:41:33,537] [INFO] [engine.py:709:_run_pass] Running pass capture_split_info:CaptureSplitInfo {}
You have loaded an AWQ model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
[2025-02-06 10:41:35,299] [INFO] [engine.py:781:_run_pass] Pass capture_split_info:CaptureSplitInfo finished in 1.762350 seconds
[2025-02-06 10:41:35,300] [INFO] [engine.py:709:_run_pass] Running pass conversion:OnnxConversion {}
We suggest you to set `torch_dtype=torch.float16` for better efficiency with AWQ.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
We suggest you to set `torch_dtype=torch.float16` for better efficiency with AWQ.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
/home/devuser/.local/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py:96: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
You are not running the flash-attention implementation, expect numerical differences.
/home/devuser/.local/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py:260: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.original_max_position_embeddings:
/home/devuser/.local/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py:263: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
ext_factors = torch.tensor(self.short_factor, dtype=torch.float32, device=x.device)
/home/devuser/.local/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py:466: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
[2025-02-06 10:41:50,374] [INFO] [engine.py:781:_run_pass] Pass conversion:OnnxConversion finished in 15.074021 seconds
[2025-02-06 10:41:50,377] [INFO] [engine.py:709:_run_pass] Running pass genai_config_only:ModelBuilder {}
GroupQueryAttention (GQA) is used in this model.
Saving GenAI config in /mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models
Saving processing files in /mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models for GenAI
[2025-02-06 10:41:50,504] [ERROR] [engine.py:776:_run_pass] Pass run failed.
Traceback (most recent call last):
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 764, in _run_pass
output_model_config = host.run_pass(p, input_model_config, output_model_path, pass_search_point)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/systems/local.py", line 30, in run_pass
output_model = the_pass.run(model, output_model_path, point)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/passes/olive_pass.py", line 245, in run
output_model = self._run_for_config(model, config, output_model_path)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/passes/onnx/model_builder.py", line 205, in _run_for_config
model_proto = onnx.load(output_model_filepath, load_external_data=False)
File "/home/devuser/.local/lib/python3.10/site-packages/onnx/__init__.py", line 212, in load_model
model = _get_serializer(format, f).deserialize_proto(_load_bytes(f), ModelProto())
File "/home/devuser/.local/lib/python3.10/site-packages/onnx/__init__.py", line 149, in _load_bytes
with open(f, "rb") as readable:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models/model.onnx'
[2025-02-06 10:41:50,504] [WARNING] [engine.py:334:run_accelerator] Failed to run Olive on cpu-cpu.
Traceback (most recent call last):
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 330, in run_accelerator
output_footprint = self.run_no_search(input_model_config, input_model_id, accelerator_spec, output_dir)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 400, in run_no_search
should_prune, signal, model_ids = self._run_passes(
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 664, in _run_passes
model_config, model_id = self._run_pass(
File "/home/devuser/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 764, in _run_pass
output_model_config = host.run_pass(p, input_model_config, output_model_path, pass_search_point)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/systems/local.py", line 30, in run_pass
output_model = the_pass.run(model, output_model_path, point)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/passes/olive_pass.py", line 245, in run
output_model = self._run_for_config(model, config, output_model_path)
File "/home/devuser/.local/lib/python3.10/site-packages/olive/passes/onnx/model_builder.py", line 205, in _run_for_config
model_proto = onnx.load(output_model_filepath, load_external_data=False)
File "/home/devuser/.local/lib/python3.10/site-packages/onnx/__init__.py", line 212, in load_model
model = _get_serializer(format, f).deserialize_proto(_load_bytes(f), ModelProto())
File "/home/devuser/.local/lib/python3.10/site-packages/onnx/__init__.py", line 149, in _load_bytes
with open(f, "rb") as readable:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/azureml/cr/j/b921dc0050d344799640a1d09a4bcabd/exe/wd/.olive-cache/default_workflow/runs/8af4295b/models/model.onnx'
[2025-02-06 10:41:50,505] [INFO] [engine.py:265:run] Run history for cpu-cpu:
[2025-02-06 10:41:50,509] [INFO] [engine.py:517:dump_run_history] run history:
+------------+-------------------+------------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+============+===================+==================+================+===========+
| 5e581aa1 | | | | |
+------------+-------------------+------------------+----------------+-----------+
| 35e1839b | 5e581aa1 | CaptureSplitInfo | 1.76235 | |
+------------+-------------------+------------------+----------------+-----------+
| 237c7d40 | 35e1839b | OnnxConversion | 15.074 | |
+------------+-------------------+------------------+----------------+-----------+
Command failed. Please set the log_level to 1 for more detailed logs.
To Reproduce
Run the olive command below.
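The exact command was not captured in this report. A minimal sketch of what it might look like, assuming the CPU device and hypothetical input/output paths (flag spellings follow recent Olive CLI releases and may differ by version):

# Hypothetical reconstruction: the input path matches the AWQ output
# directory referenced in the logs; the split count and output path
# are placeholders.
olive auto-opt \
    --model_name_or_path outputs/pytorch_awq_dir \
    --device cpu \
    --num-splits 4 \
    --output_path models/phi3_5_split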
For model_name_or_path, I pass a Phi-3.5-mini-instruct model quantized with AWQ, which I generated like below.
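That quantization step is not shown here either. A sketch of how such a model could be produced with Olive's quantize command (the awq algorithm option is per Olive's CLI docs; the model ID and output path are assumptions):

# Hypothetical sketch: AWQ-quantize Phi-3.5-mini-instruct, writing the
# PyTorch model that the auto-opt command above consumes.
olive quantize \
    --model_name_or_path microsoft/Phi-3.5-mini-instruct \
    --algorithm awq \
    --output_path outputs/pytorch_awq_dir

The log line "Loaded previous command output of type hfmodel from outputs/pytorch_awq_dir" is consistent with auto-opt picking up this directory as its input.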
Other information
Additional context
To resolve issue #1595, I was initially testing this with QNN mode, and it didn't work, so I reverted to CPU, where it still didn't work. But the purpose of introducing num_splits is to eventually solve #1595.