
[Inference Error] The onnx inference result is inconsistent with the numpy inference result #23202

Open
songqiuyu opened this issue Dec 26, 2024 · 1 comment
Labels
quantization (issues related to quantization)

Comments

@songqiuyu

Describe the issue

I want to implement inference of an ONNX model in my own C code, but in some layers the result of my C implementation differs from ONNX Runtime by 1: for example, my C code produces 40 where onnxruntime produces 41.

I want to know why numpy's result is -87 while onnxruntime's is -88. In quantized model inference, an off-by-one error is fatal: accumulated across many layers it can reach 4-5 (in 8-bit integers).

Thank you :>

The test code is below.
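For reference, here is the requantization worked by hand with the constants from the script:

C_q = round((A_scale*(A_q - A_zp) + B_scale*(B_q - B_zp)) / C_scale) + C_zp
    = round((0.008010663*(-8 - 7) + 0.006227131*(-64 + 128)) / 0.006873491) - 128
    = round(40.5000047) - 128
    = 41 - 128
    = -87

The scaled sum is only about 5e-6 above the rounding boundary at 40.5, which is why an implementation that computes it even slightly differently can land on -88 instead.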

To reproduce

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np
import onnxruntime as ort

# Tensor names
A = 'A'
B = 'B'
C = 'C'

# Quantization parameters (scale / zero point) for inputs A, B and output C
A_scale = 0.008010663092136383
A_zero_point = 7
B_scale = 0.00622713053599
B_zero_point = -128
C_scale = 0.006873490754514933
C_zero_point = -128


# int8 graph inputs and output, shape [1, 1, 1, 1]
input_A = helper.make_tensor_value_info(A, TensorProto.INT8, [1, 1, 1, 1])
input_B = helper.make_tensor_value_info(B, TensorProto.INT8, [1, 1, 1, 1])
output = helper.make_tensor_value_info(C, TensorProto.INT8, [1, 1, 1, 1])


# Scales and zero points as initializers
initializer_A_scale = numpy_helper.from_array(np.array(A_scale, dtype=np.float32), name='A_scale')
initializer_A_zero_point = numpy_helper.from_array(np.array(A_zero_point, dtype=np.int8), name='A_zero_point')

initializer_B_scale = numpy_helper.from_array(np.array(B_scale, dtype=np.float32), name='B_scale')
initializer_B_zero_point = numpy_helper.from_array(np.array(B_zero_point, dtype=np.int8), name='B_zero_point')

initializer_C_scale = numpy_helper.from_array(np.array(C_scale, dtype=np.float32), name='C_scale')
initializer_C_zero_point = numpy_helper.from_array(np.array(C_zero_point, dtype=np.int8), name='C_zero_point')



# QLinearAdd is not in ai.onnx; it lives in the com.microsoft domain
qlinear_add_node = helper.make_node(
    'QLinearAdd',
    inputs=[A, 'A_scale', 'A_zero_point', B, 'B_scale', 'B_zero_point', 'C_scale', 'C_zero_point'],
    outputs=[C],
    name='QLinearAdd',
    domain='com.microsoft'
)

opset_version_ai_onnx = 13
opset_version_com_microsoft = 1

graph = helper.make_graph(
    nodes=[qlinear_add_node],
    name='QLinearAdd_Graph',
    inputs=[input_A, input_B],
    outputs=[output],
    initializer=[
        initializer_A_scale,
        initializer_A_zero_point,
        initializer_B_scale,
        initializer_B_zero_point,
        initializer_C_scale,
        initializer_C_zero_point
    ]
)


model = helper.make_model(
    graph,
    producer_name='onnx-qlinearadd-fixed-params',
    opset_imports=[
        helper.make_opsetid(domain='ai.onnx', version=opset_version_ai_onnx),
        helper.make_opsetid(domain='com.microsoft', version=opset_version_com_microsoft),
    ],
)
onnx.save(model, 'qlinearadd_fixed_params_model.onnx')
print("ONNX model saved to 'qlinearadd_fixed_params_model.onnx'")


# Quantized inputs
A_int8 = np.array([-8], dtype=np.int8)
B_int8 = np.array([-64], dtype=np.int8)

# Dequantize the int8 inputs to real values
A_real = A_scale * (A_int8.astype(np.int32) - A_zero_point)
B_real = B_scale * (B_int8.astype(np.int32) - B_zero_point)

# Add in float, then requantize to int8
C_real = A_real + B_real
print(C_real / C_scale + C_zero_point)  # unrounded requantized value

C_int32 = np.round(C_real / C_scale) + C_zero_point
C_int8 = C_int32.astype(np.int8)
print(C_int8)
# Run the same inputs through onnxruntime
session = ort.InferenceSession('qlinearadd_fixed_params_model.onnx')
output_name = session.get_outputs()[0].name

A_data = np.array([-8], dtype=np.int8).reshape([1, 1, 1, 1])
B_data = np.array([-64], dtype=np.int8).reshape([1, 1, 1, 1])

input_dict = {
    'A': A_data,
    'B': B_data
}

outputs = session.run([output_name], input_dict)
C_output = outputs[0]
print("output C:", C_output)

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime==1.19.2 python

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@songqiuyu
Author

The program's output is:

ONNX model saved to 'qlinearadd_fixed_params_model.onnx'
[-87.49999529]
[-87]
output C: [[[[-88]]]]
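Independent of the float-precision question, one more thing worth checking in the C port: numpy's np.round rounds halves to the nearest even integer, while C's round() rounds halves away from zero, so the two can disagree by exactly 1 at .5 boundaries like this one. A quick illustration:

import numpy as np

print(np.round(0.5), np.round(1.5), np.round(2.5))  # 0.0 2.0 2.0 (half to even)
# C's round() returns 1, 2, 3 for the same inputs (half away from zero)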

@snnn added the quantization label on Dec 30, 2024