
Lack of half precision as datatype for PTQ. #1173

Closed
hkayann opened this issue Feb 4, 2025 · 9 comments
Labels
enhancement New feature or request

Comments

@hkayann

hkayann commented Feb 4, 2025

I might be missing something very simple, so I wanted to ask for clarification regarding the data type support in ptq.evaluate.py.

Currently, the script includes the following argument for data type selection:

parser.add_argument(
    '--dtype', default='float', choices=['float', 'bfloat16'], help='Data type to use')

Have we considered extending this to support FP16 (i.e. torch.half) by modifying the argument as follows?

parser.add_argument(
    '--dtype',
    default='float',
    choices=['float', 'bfloat16', 'half'],
    help='Data type to use (float for FP32, bfloat16, or half for FP16)')
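A minimal, self-contained sketch of what the extended option could look like. The `DTYPE_MAP` dict and its string values are illustrative only (the real script would map to `torch.float32`, `torch.bfloat16`, and `torch.float16` objects); the option names mirror the argparse snippet above.

```python
import argparse

# Hypothetical sketch: map each --dtype CLI choice to the name of the
# corresponding torch dtype. In the actual script this dict would hold
# the torch dtype objects themselves rather than strings.
DTYPE_MAP = {
    'float': 'torch.float32',     # FP32 (default)
    'bfloat16': 'torch.bfloat16',
    'half': 'torch.float16',      # FP16, the proposed addition
}

parser = argparse.ArgumentParser()
parser.add_argument(
    '--dtype',
    default='float',
    choices=list(DTYPE_MAP),
    help='Data type to use (float for FP32, bfloat16, or half for FP16)')

args = parser.parse_args(['--dtype', 'half'])
print(DTYPE_MAP[args.dtype])  # torch.float16
```

Deriving `choices` from the mapping keeps the CLI and the dtype lookup from drifting apart when a new precision is added.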

My question is: Is there a specific reason we don’t include FP16 support by default but include BF16?

@hkayann hkayann added the enhancement New feature or request label Feb 4, 2025
@Giuseppe5
Collaborator

I will look into that. It shouldn't be a problem.

Thanks for this, I'll update once the PR is opened/merged

@hkayann
Author

hkayann commented Feb 5, 2025

Many thanks for the quick feedback @Giuseppe5.

I can put up a PR asap. I have tested it and seen no problems at all.

@Giuseppe5
Collaborator

Feel free to open a PR, but could I please ask you to also update the README?

We have a list there with all the options, and we would like to keep it up-to-date. We generally just use the output of the argparse helper.

Thanks!

@Giuseppe5
Collaborator

Also, please check the CONTRIBUTING.md file about how to sign off your commit

@hkayann
Author

hkayann commented Feb 5, 2025

Hi @Giuseppe5, I've opened the PR against the dev branch. It seems the README hadn't been updated from the argparse output for a while, as I see some other changes there as well.

Fantastic work btw, many thanks for this API. I needed the custom bit-width implementations for FP16, and this API provides that.

@Giuseppe5
Collaborator

Sometimes we struggle to keep up with our own rules, but we're working hard to improve :)
Thanks for your work and the kind words too!

@Giuseppe5
Collaborator

Thanks for the PR.

With respect to the comment about float16 and custom-precision minifloat, it could be worth opening a separate issue with some examples so we can discuss in more detail what's going on there.

@hkayann
Author

hkayann commented Feb 6, 2025

You are welcome, @Giuseppe5.

For now, I am going to do a bit more digging, but I already have a workable setup where I save models in FP32 and still simulate custom bit widths via the embedded quantization metadata. Many thanks for the offer.
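The setup described above, storing weights in FP32 while simulating a lower bit width, is the usual quantize-dequantize (fake quantization) pattern. A minimal sketch of the idea, with illustrative bit-width and scale values that are not Brevitas internals:

```python
# Hypothetical sketch: a value stays stored in FP32, while a
# quantize-dequantize round trip simulates a custom integer bit width.

def fake_quantize(x, bits, scale):
    """Round x onto a signed integer grid of the given bit width,
    clamp to the representable range, then map back to float."""
    qmin = -(2 ** (bits - 1))       # e.g. -8 for 4-bit signed
    qmax = 2 ** (bits - 1) - 1      # e.g. +7 for 4-bit signed
    q = max(qmin, min(qmax, round(x / scale)))  # quantize and clamp
    return q * scale                            # dequantize back to FP32

# A weight of 0.337 simulated at 4-bit precision with scale 0.1
# lands on the grid point 3 * 0.1:
print(fake_quantize(0.337, bits=4, scale=0.1))
```

The returned value is still an ordinary float, which is why the model can be saved in FP32 and reloaded with only the quantization metadata (bit width and scale) needed to reproduce the grid.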

@Giuseppe5
Collaborator

#1174 was merged, closing this :)
