[DML] Olive generated adapters not working with OrtGenAi #1544
I am using the method of creating adapters depicted here, which I have got working when using the CPU EP. However, when using DML I get the following error when calling `adapters.LoadAdapter`:

```
Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\core\session\lora_adapters.cc:94 onnxruntime::lora::LoraAdapter::InitializeParamsValues Data transfer is not available for the specified device allocator, it also must not be a CPU allocator
```
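For reference, this is roughly the calling pattern that hits the error (a minimal sketch against the onnxruntime-genai C# API, not my exact code; the model folder, adapter file, and adapter name are placeholders):

```csharp
// Minimal sketch: load a .onnx_adapter and activate it via
// Microsoft.ML.OnnxRuntimeGenAI. Paths and the adapter name are placeholders.
using Microsoft.ML.OnnxRuntimeGenAI;

class AdapterRepro
{
    static void Main()
    {
        using var model = new Model(@"models\qwen2.5-1.5b-dml"); // DML model folder
        using var adapters = new Adapters(model);
        adapters.LoadAdapter(@"adapters\qwen_lora.onnx_adapter", "qwen_lora"); // <-- throws on DML

        using var generatorParams = new GeneratorParams(model);
        generatorParams.SetSearchOption("max_length", 256);

        using var generator = new Generator(model, generatorParams);
        generator.SetActiveAdapter(adapters, "qwen_lora");
    }
}
```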
I have tested the `olive auto-opt` call both with and without the `--use_model_builder` option, but both produce the same result. I have also tried the `convert-adapters` olive call instead, but the resulting adapters do not work with the CPU EP either (see aside).

If I run the model without the adapter on the CPU EP it runs fine as well, whereas when I run the model without the adapter on DML I get the following error when calling `AppendTokenSequences`:

```
Unhandled exception. System.Exception: Non-zero status code returned while running DmlFusedNode_0_5 node. Name:'DmlFusedNode_0_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2839)\onnxruntime.dll!00007FFE495DF44C: (caller: 00007FFE495EEEC9) Exception(1) tid(2bb4) 80070057 The parameter is incorrect.
```
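The failing no-adapter run is just the standard generation loop. A minimal sketch, with placeholder model path and prompt:

```csharp
// Minimal sketch of the plain generation loop (no adapter) that fails on DML
// at AppendTokenSequences. Model path and prompt are placeholders.
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

class GenerateRepro
{
    static void Main()
    {
        using var model = new Model(@"models\qwen2.5-1.5b-dml");
        using var tokenizer = new Tokenizer(model);

        using var generatorParams = new GeneratorParams(model);
        generatorParams.SetSearchOption("max_length", 128);

        using var generator = new Generator(model, generatorParams);
        using var sequences = tokenizer.Encode("Hello, who are you?");
        generator.AppendTokenSequences(sequences); // <-- DmlFusedNode_0_5 error here

        while (!generator.IsDone())
            generator.GenerateNextToken();

        Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));
    }
}
```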
The same does not happen when using ORTGenAi's `model_builder.py` and passing in an adapter path, but then you cannot use multiple LoRA weights, as the adapter is tied into the ONNX model permanently.
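For clarity, the per-request adapter switching that the Adapters API should enable (and that a baked-in adapter cannot) looks roughly like this; adapter names and paths are placeholders:

```csharp
// Sketch of why the runtime Adapters API matters: switching LoRAs per generator
// over one base model, instead of baking one adapter into the ONNX file.
using Microsoft.ML.OnnxRuntimeGenAI;

class MultiLoraSketch
{
    static void Main()
    {
        using var model = new Model(@"models\qwen2.5-1.5b-dml");
        using var adapters = new Adapters(model);
        adapters.LoadAdapter(@"adapters\style_a.onnx_adapter", "style_a");
        adapters.LoadAdapter(@"adapters\style_b.onnx_adapter", "style_b");

        using var generatorParams = new GeneratorParams(model);
        generatorParams.SetSearchOption("max_length", 128);

        // Each generator can activate a different adapter over the same base model.
        using var generatorA = new Generator(model, generatorParams);
        generatorA.SetActiveAdapter(adapters, "style_a");

        using var generatorB = new Generator(model, generatorParams);
        generatorB.SetActiveAdapter(adapters, "style_b");
    }
}
```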
OS: Windows 11 x64
GPU: RTX 4090
API: C#
MODEL: Qwen/Qwen2.5-1.5B
(Aside) The adapters (when used via the CPU EP) appear to have significant quality degradation. I can see that `convert-adapters` applies LoRA scaling (scaling the adapter delta by alpha/rank), but I cannot find whether the `auto-opt` call does the same. Creating adapters via `convert-adapters` does not work with the CPU EP either, as the keys are not renamed appropriately, giving an invalid key/name/parameter error (`.layers.0.self_attn.` rather than `.layers.0.attn.`).
Comments

Hi, thanks for opening this issue. The LoRA workflow with DML has not been fully tested. From the Olive side, we only verified the example with the CPU and CUDA EPs. I see that you opened a related issue on the onnxruntime-genai repo, which I think is a good idea. With regards to the aside

Thank you for your response. I realise now that it's not an issue with Olive (I have managed to avoid the quality degradation mentioned above) nor onnxruntime-genai, but with onnxruntime's DML + Adapters API interoperability. It's not that it isn't tested; it seems they know it doesn't work but just haven't said anything.