We call quantization via setDynamicRange or a calibration table *implicit* int8 precision, and quantization via the Q/DQ nodes generated by pytorch-quantization *explicit* int8 precision.
The difference is that for implicit quantization, TRT tries to optimize for the best performance, while for explicit quantization, TRT must guarantee the same accuracy as in the original framework while optimizing performance. So for explicit quantization there are rules for how the Q/DQ nodes are propagated and fused with other nodes. Putting Q/DQ everywhere would slow down performance; NVIDIA has a doc on how Q/DQ placement impacts performance.
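Either way, int8 quantization boils down to choosing a per-tensor dynamic range. A minimal numpy sketch (values and names are illustrative, not from the source) of the symmetric int8 fake-quantization that a Q/DQ node pair, or an equivalent setDynamicRange call, implies:

```python
import numpy as np

def fake_quant_int8(x, amax):
    """Symmetric int8 quantize -> dequantize, as a Q/DQ node pair does."""
    scale = amax / 127.0                          # symmetric int8 range [-127, 127]
    q = np.clip(np.round(x / scale), -127, 127)   # Q: snap fp32 onto the int8 grid
    return q * scale                              # DQ: map back to fp32

x = np.array([0.05, -1.2, 3.7, -0.004], dtype=np.float32)
amax = np.abs(x).max()          # what setDynamicRange(-amax, amax) would register
xq = fake_quant_int8(x, amax)
print(np.abs(x - xq).max())     # within range, error is bounded by scale / 2
```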
-
PTQ using TensorRT's closed-source (built-in) calibration
https://github.com/lix19937/trt-samples-for-hackathon-cn/tree/master/cookbook/03-BuildEngineByTensorRTAPI/MNISTExample-pyTorch/C%2B%2B
https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_int8_calibrator.html
| Type | Description |
|------|-------------|
| IInt8EntropyCalibrator | Entropy calibrator. This is the legacy entropy calibrator. It is less complicated than the legacy calibrator and produces better results. |
| IInt8EntropyCalibrator2 | Entropy calibrator 2. This is the preferred calibrator. It is the required calibrator for DLA, as it supports per-activation-tensor scaling. |
| IInt8LegacyCalibrator | Legacy calibrator left for backward compatibility with TensorRT 2.0. This calibrator requires user parameterization, and is provided as a fallback option if the other calibrators yield poor results. |
| IInt8MinMaxCalibrator | MinMax calibrator. It supports per-activation-tensor scaling. |
NVIDIA/TensorRT#3205 (comment)
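A library-free toy sketch of the idea behind the entropy calibrators (the function names, bin counts, and level counts here are my own simplifications, not TensorRT's implementation): scan candidate clipping thresholds and keep the one whose coarsely re-quantized histogram has the lowest KL divergence from the fp32 histogram.

```python
import numpy as np

def kl_divergence(p, q):
    p, q = p / p.sum(), q / q.sum()
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def entropy_amax(x, num_bins=2048, num_levels=128):
    """Toy KL-based threshold search over the histogram of |x|."""
    hist, edges = np.histogram(np.abs(x), bins=num_bins)
    best_kl, best_amax = np.inf, float(edges[-1])
    for i in range(num_levels, num_bins + 1):
        ref = hist[:i].astype(np.float64).copy()
        ref[-1] += hist[i:].sum()           # clip outliers into the last kept bin
        # re-quantize the i fine bins down to num_levels coarse buckets ...
        bucket = np.floor(np.arange(i) * num_levels / i).astype(int)
        coarse = np.bincount(bucket, weights=ref, minlength=num_levels)
        # ... then spread each bucket's mass back over its nonzero fine bins
        cand = np.zeros(i)
        for b in range(num_levels):
            sel = (bucket == b) & (ref > 0)
            if sel.sum() > 0:
                cand[sel] = coarse[b] / sel.sum()
        kl = kl_divergence(ref, cand)
        if kl < best_kl:
            best_kl, best_amax = kl, float(edges[i])
    return best_amax

rng = np.random.default_rng(0)
acts = rng.standard_normal(8192).astype(np.float32)
amax = entropy_amax(acts, num_bins=512, num_levels=32)
print(amax, float(np.abs(acts).max()))
```

For a heavy-tailed distribution this typically picks a threshold below the raw max, trading a little clipping error for much finer resolution of the bulk of the values.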
-
PTQ using the open-source method: set up Q/DQ with pytorch-quantization
https://github.com/lix19937/pytorch-quantization/tree/main/pytorch_quantization/calib
- max
- hist (histogram-based), with amax computed via:
  - entropy (cross-entropy)
  - mse
  - percentile (statistical quantile)
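The criteria above can be illustrated without the library. A numpy sketch (variable names are my own) of how max, percentile, and mse each pick an amax for the same activations:

```python
import numpy as np

def fake_quant(x, amax):
    """Symmetric int8 quantize -> dequantize at clipping threshold amax."""
    scale = amax / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
acts = rng.standard_normal(100_000).astype(np.float32)

amax_max = float(np.abs(acts).max())            # max: no clipping at all
amax_pct = float(np.percentile(np.abs(acts), 99.9))  # percentile: clip top 0.1%

# mse: scan candidate thresholds, keep the one minimizing quantization MSE
cands = np.linspace(0.1, 1.0, 50) * amax_max
mses = [np.mean((acts - fake_quant(acts, a)) ** 2) for a in cands]
amax_mse = float(cands[int(np.argmin(mses))])

print(amax_max, amax_pct, amax_mse)
```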
See https://github.com/lix19937/pytorch-quantization/blob/main/readme_lix.md for details.
- During PTQ calibration, fuse_bn can be applied to fold BN into the preceding conv, avoiding calibration of BN layers and reducing both calibration time and calibration error.
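A minimal sketch of the BN folding that makes this possible (treating the conv as a plain linear layer for brevity; variable names are mine): BN(Wx + b) equals a single linear layer with W' = γ/√(σ²+ε)·W and b' = γ(b−μ)/√(σ²+ε) + β.

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.standard_normal((4, 8)), rng.standard_normal(4)        # "conv" weights
gamma, beta = rng.standard_normal(4), rng.standard_normal(4)      # BN affine params
mu, var, eps = rng.standard_normal(4), rng.random(4) + 0.1, 1e-5  # BN running stats

x = rng.standard_normal((16, 8))
y_ref = (x @ W.T + b - mu) * gamma / np.sqrt(var + eps) + beta    # conv then BN

s = gamma / np.sqrt(var + eps)
W_fused = W * s[:, None]                # fold BN scale into the weights ...
b_fused = (b - mu) * s + beta           # ... and BN shift into the bias
y_fused = x @ W_fused.T + b_fused       # one layer, same output, nothing to calibrate for BN

print(np.allclose(y_ref, y_fused))
```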
- Find all quant layers.
- Enable only one quant layer at a time, run the metric eval, and record the results in a dict: {"layer_name": eval_value}.
- If calibrating with pytorch-quantization, the sensitive layers are searched for in PyTorch.
- If calibrating with TensorRT, the search is done against the fp32 baseline, using TensorRT or onnxruntime.
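The per-layer sweep above can be sketched framework-agnostically. The quant-layer class and eval metric here are hypothetical stand-ins (in practice they would be pytorch-quantization TensorQuantizer modules and your real accuracy eval), but the enable-one-at-a-time loop is the same:

```python
# Hypothetical stand-ins for quant layers and the eval metric.
class QuantLayer:
    def __init__(self, name):
        self.name, self.enabled = name, False

def evaluate(model_layers):
    # stand-in metric: pretend each enabled quant layer costs some accuracy
    cost = {"conv1": 0.001, "conv2": 0.030, "fc": 0.005}
    return 0.95 - sum(cost[l.name] for l in model_layers if l.enabled)

layers = [QuantLayer(n) for n in ("conv1", "conv2", "fc")]

sensitivity = {}
for probe in layers:                     # enable exactly one quant layer per run
    for l in layers:
        l.enabled = (l is probe)
    sensitivity[probe.name] = evaluate(layers)

# most sensitive layer = largest accuracy drop when quantized alone;
# such layers are candidates to keep in higher precision
ranked = sorted(sensitivity, key=sensitivity.get)
print(ranked[0])
```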