mllama-3.2 INT8 Quantized Model on GPU #2639

ekurniaw · 2025-01-09T07:08:58Z

Why is the INT8 quantized model disabled on GPU? I ran the INT8 optimization on CPU and then performed inference with the INT8 model on GPU. It works on an Arc A310 dGPU and an ARL-H iGPU. Can we add such a flow to the notebook? Thanks.

openvino_notebooks/notebooks/mllama-3.2/ov_mllama_compression.py, line 56 in f9caad1:

    description="Vision Encoder", options=optimizations, value=optimizations[0] if "GPU" in device else optimizations[1], disabled="GPU" in device
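
To make the effect of that line concrete, here is a minimal sketch of how its `value` and `disabled` arguments evaluate per device. The contents of `optimizations`, the helper name `vision_encoder_widget_args`, and the `allow_int8_on_gpu` flag are assumptions for illustration only; the real list is defined elsewhere in `ov_mllama_compression.py` and the actual fix may look different.

```python
# Assumed option order (hypothetical); the real list lives in ov_mllama_compression.py.
optimizations = ["FP16", "INT8"]


def vision_encoder_widget_args(device, allow_int8_on_gpu=False):
    """Return the (value, disabled) pair passed to the "Vision Encoder" widget.

    With allow_int8_on_gpu=False this mirrors the quoted line: any device
    string containing "GPU" is pinned to optimizations[0] and the widget is
    disabled. The flag is a hypothetical switch showing one way to lift that.
    """
    gpu_restricted = "GPU" in device and not allow_int8_on_gpu
    value = optimizations[0] if gpu_restricted else optimizations[1]
    return value, gpu_restricted


if __name__ == "__main__":
    for dev in ("CPU", "GPU", "GPU.1"):
        print(dev, vision_encoder_widget_args(dev))
    print("GPU (restriction lifted)", vision_encoder_widget_args("GPU", allow_int8_on_gpu=True))
```

Because the check is a substring test, device strings such as `GPU.0` or `GPU.1` are restricted in the same way as plain `GPU`.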

eaidova · 2025-01-10T04:55:37Z

Hi @ekurniaw, which OpenVINO version are you using? We limited INT8 support on GPU in the notebook because, at the time the notebook was published, running the INT8 image encoder on GPU led to inaccurate generation results (the model itself ran, but the response contained no image-related information). I got confirmation that this was fixed recently, so yes, I think we can update the notebook to allow the INT8 image encoder on GPU, together with updating the OpenVINO version in the notebook requirements.
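
Since the answer depends on the installed OpenVINO version, one possible way to gate the INT8 option is to compare the runtime version against a minimum. This is only a sketch: the function names are hypothetical, and the thread does not say which release contains the fix, so the threshold is left as a caller-supplied parameter rather than a known value.

```python
import re


def parse_ov_version(version_string):
    """Extract (year, minor) from a version string such as '2024.5.0-17288-abc'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized OpenVINO version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def int8_encoder_allowed_on_gpu(version_string, min_version):
    """Return True if the parsed version is at least min_version.

    min_version is a (year, minor) tuple chosen by the caller; it is an
    assumption, not a value confirmed anywhere in this thread.
    """
    return parse_ov_version(version_string) >= min_version


if __name__ == "__main__":
    # Example strings only; in practice the string would come from the runtime.
    print(int8_encoder_allowed_on_gpu("2024.4.0", (2024, 5)))
    print(int8_encoder_allowed_on_gpu("2024.5.0-17288-abc", (2024, 5)))
```

In a notebook, the version string would typically come from `openvino.get_version()`. The `(2024, 5)` threshold above is illustrative and should be replaced with whichever release is confirmed to carry the fix.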

ekaakurniawan · 2025-01-17T20:00:32Z

@eaidova, I am using OpenVINO 2024.5. For the Arc A770, I am using the following setup.

ARL-S CPU
Ubuntu 24.04.1 LTS
Kernel 6.8.0-51-generic
Intel Compute Runtime 24.39.31294
Python 3.12.3

The new code tested successfully, except for the interactive demo, for which I raised issue #2669.

eaidova linked a pull request Jan 10, 2025 that will close this issue

allow int8 image encoder for gpu #2644

Merged

eaidova closed this as completed in #2644 Jan 10, 2025
