
Lack of half precision as datatype for PTQ. #1173

Closed
hkayann opened this issue Feb 4, 2025 · 9 comments
Labels
enhancement New feature or request

Comments

@hkayann

hkayann commented Feb 4, 2025

I might be missing something very simple, so I wanted to ask for clarification regarding the data type support in ptq.evaluate.py.

Currently, the script includes the following argument for data type selection:

parser.add_argument(
    '--dtype', default='float', choices=['float', 'bfloat16'], help='Data type to use')

Have we considered extending this to support FP16 (i.e. torch.half) by modifying the argument as follows?

parser.add_argument(
    '--dtype',
    default='float',
    choices=['float', 'bfloat16', 'half'],
    help='Data type to use (float for FP32, bfloat16, or half for FP16)')
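A minimal, self-contained sketch of what the extended option could look like. The `DTYPE_MAP` dict and its string values are illustrative only (the real script would map to `torch.float32`, `torch.bfloat16`, and `torch.float16` objects); the option names mirror the argparse snippet above.

```python
import argparse

# Hypothetical sketch: map each --dtype CLI choice to the name of the
# corresponding torch dtype. In the actual script this dict would hold
# the torch dtype objects themselves rather than strings.
DTYPE_MAP = {
    'float': 'torch.float32',     # FP32 (default)
    'bfloat16': 'torch.bfloat16',
    'half': 'torch.float16',      # FP16, the proposed addition
}

parser = argparse.ArgumentParser()
parser.add_argument(
    '--dtype',
    default='float',
    choices=list(DTYPE_MAP),
    help='Data type to use (float for FP32, bfloat16, or half for FP16)')

args = parser.parse_args(['--dtype', 'half'])
print(DTYPE_MAP[args.dtype])  # torch.float16
```

Deriving `choices` from the mapping keeps the CLI and the dtype lookup from drifting apart when a new precision is added.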

My question is: Is there a specific reason we don’t include FP16 support by default but include BF16?

@hkayann hkayann added the enhancement New feature or request label Feb 4, 2025
@Giuseppe5
Collaborator

I will look into that. It shouldn't be a problem.

Thanks for this, I'll update once the PR is opened/merged

@hkayann
Author

hkayann commented Feb 5, 2025

Many thanks for the quick feedback @Giuseppe5.

I can put up a PR asap. I have tested it and seen no problems at all.

@Giuseppe5
Collaborator

Feel free to open a PR, but could I please ask you to also update the README?

We have a list there with all the options, and we would like to keep it up-to-date. We generally just use the output of the argparse helper.

Thanks!

@Giuseppe5
Collaborator

Also, please check the CONTRIBUTING.md file about how to sign off your commit

@hkayann
Author

hkayann commented Feb 5, 2025

Hi @Giuseppe5, I've opened the PR against the dev branch. It seems the README hadn't been updated from the argparse output for a while, as I see some other changes there as well.

Fantastic work btw, many thanks for this API. I needed the custom bit-width implementations for FP16, and this API provides that.

@Giuseppe5
Collaborator

Sometimes we struggle to keep up with our own rules, but we're working hard to improve :)
Thanks for your work and the kind words too!

@Giuseppe5
Collaborator

Thanks for the PR.

With respect to the comment about float16 and custom-precision minifloat, it could be worth opening a separate issue with some examples so we can discuss in more detail what's going on there.

@hkayann
Author

hkayann commented Feb 6, 2025

You are welcome, @Giuseppe5.

For now, I am going to do a bit more digging, but I already have a workable setup where I save models in FP32 and still simulate custom bit widths via the embedded quantization metadata. Many thanks for the offer.
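The setup described above, storing weights in FP32 while simulating a lower bit width, is the usual quantize-dequantize (fake quantization) pattern. A minimal sketch of the idea, with illustrative bit-width and scale values that are not Brevitas internals:

```python
# Hypothetical sketch: a value stays stored in FP32, while a
# quantize-dequantize round trip simulates a custom integer bit width.

def fake_quantize(x, bits, scale):
    """Round x onto a signed integer grid of the given bit width,
    clamp to the representable range, then map back to float."""
    qmin = -(2 ** (bits - 1))       # e.g. -8 for 4-bit signed
    qmax = 2 ** (bits - 1) - 1      # e.g. +7 for 4-bit signed
    q = max(qmin, min(qmax, round(x / scale)))  # quantize and clamp
    return q * scale                            # dequantize back to FP32

# A weight of 0.337 simulated at 4-bit precision with scale 0.1
# lands on the grid point 3 * 0.1:
print(fake_quantize(0.337, bits=4, scale=0.1))
```

The returned value is still an ordinary float, which is why the model can be saved in FP32 and reloaded with only the quantization metadata (bit width and scale) needed to reproduce the grid.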

@Giuseppe5
Collaborator

#1174 was merged, closing this :)
