Replies: 2 comments
-
Hi @yufenglee, could you please help take a look?
-
Facing the same issue.
Example 1: Input = ['best hotel in bay area']
Example 2: Input = ['best hotel in bay area', 'best hotel in bay']
The output from the quantized model changes when batch size > 1.
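For what it's worth, here is roughly how this can be checked. Everything below is a sketch under my own assumptions, not taken from the notebook: the quantized model file name (gpt2_int8.onnx), the export's input layout (input_ids, position_ids, attention_mask, plus empty past_* tensors with GPT-2-small dimensions), the tensor dtypes, and the first output holding the logits are all guesses that may need adjusting to your setup.

```python
import numpy as np
import onnxruntime
from transformers import GPT2Tokenizer

NUM_LAYERS, NUM_HEADS, HEAD_SIZE = 12, 12, 64  # GPT-2 small

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def predict_next_token(session, texts):
    # Tokenize with padding so batched samples can have different lengths.
    enc = tokenizer(texts, padding=True)
    input_ids = np.array(enc["input_ids"], dtype=np.int64)
    attention_mask = np.array(enc["attention_mask"], dtype=np.float32)
    position_ids = np.clip(attention_mask.astype(np.int64).cumsum(-1) - 1, 0, None)
    batch = input_ids.shape[0]
    feeds = {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "position_ids": position_ids,
    }
    for i in range(NUM_LAYERS):  # empty past state for the first generation step
        feeds[f"past_{i}"] = np.zeros((2, batch, NUM_HEADS, 0, HEAD_SIZE), dtype=np.float32)
    logits = session.run(None, feeds)[0]  # assumed shape [batch, seq_len, vocab]
    last = attention_mask.astype(np.int64).sum(-1) - 1  # last real token per sample
    return logits[np.arange(batch), last, :].argmax(-1)

session = onnxruntime.InferenceSession("gpt2_int8.onnx")
print(predict_next_token(session, ["best hotel in bay area"]))
print(predict_next_token(session, ["best hotel in bay area", "best hotel in bay"]))
# With the int8 model, the predicted token for the first sample can differ
# between the batch-size-1 and batch-size-2 calls.
```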
-
Describe the bug
I generated an int8-quantized GPT-2 model by following the instructions in this notebook: https://github.com/microsoft/onnxruntime/blob/1ce2982f65e5516067fdcaef19409279173b0d75/onnxruntime/python/tools/transformers/notebooks/Inference_GPT2_with_OnnxRuntime_on_CPU.ipynb
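For reference, the int8 conversion can be reproduced with ONNX Runtime's dynamic quantization API; a minimal sketch (the file names are placeholders, and the notebook's own quantization helper may differ in detail from this direct call):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="gpt2_fp32.onnx",   # exported fp32 GPT-2 model (placeholder name)
    model_output="gpt2_int8.onnx",  # quantized model with int8 weights
    weight_type=QuantType.QInt8,
)
```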
When I tested the quantized model with batched input, I found some unexpected results.
These are the two inputs I tested:
Test 1: ['best hotel in bay area', 'best hotel in bay area'] ([[13466, 7541, 287, 15489, 1989], [13466, 7541, 287, 15489, 1989]] after tokenization)
Test 2: ['best hotel in bayale', 'best hotel in bay area'] ([[13466, 7541, 287, 15489, 1000], [13466, 7541, 287, 15489, 1989]] after tokenization)
Sample 2 is the same in both batch inputs; I only changed the last token of sample 1.
After feeding them into the quantized GPT-2 model, the outputs for sample 2 are different.
I was expecting the same output for sample 2 in both tests.
System information
To Reproduce
I ran the model with the two batch inputs above and debugged the results by checking the 'next_token_logits' output.
After the first step, the 'next_token_logits' output for the second sample differs between test 1 and test 2.
I also checked with only one sample as input; the result is the same as in test 1, but different from test 2.
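To make the check concrete, here is a self-contained sketch of the comparison. The caveats only cover my assumptions, not the report itself: the model file name, the export's input names and dtypes (input_ids, position_ids, attention_mask, empty past_* tensors), and the logits being the first output are guesses that may need adjusting.

```python
import numpy as np
import onnxruntime
from transformers import GPT2Tokenizer

NUM_LAYERS, NUM_HEADS, HEAD_SIZE = 12, 12, 64  # GPT-2 small

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def next_token_logits(session, texts):
    # All inputs here tokenize to the same length, so no padding is needed.
    enc = tokenizer(texts)
    input_ids = np.array(enc["input_ids"], dtype=np.int64)
    attention_mask = np.ones_like(input_ids, dtype=np.float32)
    position_ids = attention_mask.astype(np.int64).cumsum(-1) - 1
    batch = input_ids.shape[0]
    feeds = {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "position_ids": position_ids,
    }
    for i in range(NUM_LAYERS):  # empty past state for the first generation step
        feeds[f"past_{i}"] = np.zeros((2, batch, NUM_HEADS, 0, HEAD_SIZE), dtype=np.float32)
    logits = session.run(None, feeds)[0]  # assumed shape [batch, seq_len, vocab]
    return logits[:, -1, :]

session = onnxruntime.InferenceSession("gpt2_int8.onnx")
test1 = next_token_logits(session, ["best hotel in bay area", "best hotel in bay area"])
test2 = next_token_logits(session, ["best hotel in bayale", "best hotel in bay area"])
single = next_token_logits(session, ["best hotel in bay area"])

# Sample 2's input is identical in both tests, so ideally this is 0:
print(np.abs(test1[1] - test2[1]).max())   # reported nonzero for the int8 model
# The single-sample run matches test 1 but not test 2:
print(np.abs(single[0] - test1[1]).max())
print(np.abs(single[0] - test2[1]).max())
```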
Expected behavior
I was expecting the same output for the second sample in test 1 and test 2, since its input is the same.
Additional context
In addition, I also tried this script with the ONNX GPT-2 model without quantization, and the results were as expected:
the 'next_token_logits' outputs of the second sample in test 1 and test 2 are the same.