calibration

We call setDynamicRange, or a calibration table looked up by tensor name, implicit int8 precision,
and the Q/DQ nodes generated by pytorch-quantization explicit int8 precision.
The difference: with implicit quantization, TensorRT is free to optimize purely for the best performance; with explicit quantization, TensorRT must guarantee the same accuracy as in the original framework while optimizing performance. So for explicit quantization there are fixed rules for how Q/DQ nodes are propagated and fused with other nodes. Placing Q/DQ everywhere would slow things down, and there is a doc on how Q/DQ placement impacts performance.
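Numerically, both paths boil down to the same per-tensor symmetric int8 round-trip: implicit mode derives the scale from the dynamic range (amax / 127), and an explicit Q/DQ pair computes exactly this quantize/dequantize. A minimal pure-Python sketch; the amax value is chosen for illustration:

```python
# Per-tensor symmetric int8 quantization:
# quantize = clamp(round(x / scale)) to the int8 range, dequantize = q * scale.
def quantize(x, scale):
    return max(-128, min(127, round(x / scale)))

def dequantize(q, scale):
    return q * scale

# setDynamicRange(-amax, amax) implies scale = amax / 127
amax = 6.35
scale = amax / 127  # 0.05
```

Values far outside the dynamic range saturate at the int8 bounds, which is why a poorly chosen amax directly costs accuracy.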

PTQ

NVIDIA/TensorRT#3205 (comment)

Post-training quantization with a custom plugin

  • Build the layer with the plugin in the ONNX graph and mark the plugin's output tensors

  • Develop the plugin with int8 support
    [image: int8_q-dq]

  • The plugin needs to support fp32; then look up the output tensor's scale in the calib table
    [image: ptq]
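For the fp32-plugin path, the plugin output tensor's scale can be read back from the TensorRT calibration cache. A hedged sketch, assuming the common cache layout of one header line followed by `name: hexfloat` entries, where the hex is the scale's big-endian IEEE-754 float bits; the path and tensor names are yours:

```python
import struct

def load_calib_scales(path):
    """Parse a TensorRT calibration cache into {tensor_name: scale}."""
    scales = {}
    with open(path) as f:
        next(f)  # skip the "TRT-xxxx-EntropyCalibration2" header line
        for line in f:
            line = line.strip()
            if not line:
                continue
            name, _, hexval = line.rpartition(": ")
            # the scale is stored as the raw float32 bits in hex
            scales[name] = struct.unpack(">f", bytes.fromhex(hexval))[0]
    return scales
```

With this table, the plugin (or the code configuring it) can pick up the scale of its marked output tensor by name.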

Explicit quantization setup

Insert Q/DQ before and after the plugin layer

fuse

  • During PTQ calibration, BN can be fused (fuse_bn) so BN layers need no separate calibration, reducing calibration time and calibration error
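The BN folding above is just an affine rewrite of the conv parameters. A minimal sketch of the standard conv+BN fusion formula, shown per output channel with scalar weights (not tied to any particular framework API):

```python
import math

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(gamma, beta, mean, var) into conv weight w and bias b."""
    s = gamma / math.sqrt(var + eps)
    return w * s, (b - mean) * s + beta
```

After fusion, calibrating the fused conv output is equivalent to calibrating the original conv+BN pair, so the BN layer drops out of the calibration pass entirely.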

sensitivity layer profile

  • Find all quant layers
  • Enable only one quant layer at a time, run the metric eval, and record the result in a dict: {"layer_name": eval_value}
  • If calibrating with pytorch-quantization, search for sensitive layers in PyTorch
  • If calibrating with TensorRT, search in fp32, using TensorRT or onnxruntime
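The steps above can be sketched as a small driver. `quant_layers` and `evaluate` are assumptions: the former maps layer names to quantizer objects exposing enable()/disable() (as pytorch-quantization's TensorQuantizer does), the latter is your own metric function:

```python
def profile_sensitivity(quant_layers, evaluate):
    """Enable one quant layer at a time and record {"layer_name": eval_value}."""
    for q in quant_layers.values():
        q.disable()                 # start from the full-precision baseline
    results = {}
    for name, q in quant_layers.items():
        q.enable()                  # quantize only this layer
        results[name] = evaluate()
        q.disable()                 # restore the baseline before the next layer
    return results
```

Layers whose eval value drops the most are the sensitive ones, and are candidates to be kept in higher precision.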