Add support for Nemotron-CC quality classifiers #518
base: main
Conversation
Signed-off-by: Sarah Yurick <[email protected]>
Did an initial review; mostly looks good.
I have requested changes around naming, class structure, etc.
model = AutoModelForSequenceClassification.from_pretrained(
    self.path_or_name, torch_dtype=torch.bfloat16
)
Interesting that we now load FINEWEB_MIXTRAL_IDENTIFIER and FINEWEB_NEMOTRON_IDENTIFIER in bfloat16.
Do you know why we can't/don't do the same for the EDU classifier?
Can you add a comment stating the reason for this?
Not sure; this is how it was done in the script that the Nemotron-CC developers used.
Can we do a quick benchmark with autocast? I think this should happen automatically when we use torch.autocast, which we already use (or should use).
If the results on a dataset line up (for both accuracy and throughput), we can probably skip this fork.
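For context, a benchmark along these lines might look roughly like the sketch below. The checkpoint name, sample batch, and CUDA device are placeholders for illustration, not taken from this PR.

```python
import time

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "nvidia/quality-classifier-deberta"  # placeholder checkpoint
texts = ["Some example document text."] * 32      # placeholder batch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True).to("cuda")


def run(model, use_autocast):
    """Run one forward pass and return (fp32 logits, elapsed seconds)."""
    model = model.to("cuda").eval()
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        if use_autocast:
            # bfloat16 compute via autocast, fp32 weights
            with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
                logits = model(**inputs).logits
        else:
            logits = model(**inputs).logits
    torch.cuda.synchronize()
    return logits.float().cpu(), time.time() - start


# Variant 1: weights stored in bfloat16 at load time.
bf16_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16
)
bf16_logits, bf16_secs = run(bf16_model, use_autocast=False)

# Variant 2: fp32 weights, bfloat16 compute under torch.autocast.
fp32_model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
ac_logits, ac_secs = run(fp32_model, use_autocast=True)

print(f"bf16 weights: {bf16_secs:.3f}s, autocast: {ac_secs:.3f}s")
print("max |logit diff|:", (bf16_logits - ac_logits).abs().max().item())
```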
Just ran it; the results look a bit different without bfloat16, but still pretty similar to the ones from before. I have removed torch_dtype=torch.bfloat16 for now.
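For reference, the load presumably reduces to something like the sketch below, with mixed precision left to a surrounding torch.autocast context as discussed above. The path constant stands in for self.path_or_name and is an assumption.

```python
from transformers import AutoModelForSequenceClassification

MODEL_PATH = "nvidia/quality-classifier-deberta"  # placeholder for self.path_or_name

# Weights load in the default fp32 dtype; bfloat16 compute, if desired,
# comes from a surrounding torch.autocast context rather than torch_dtype.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
```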
Signed-off-by: Sarah Yurick <[email protected]>
Minor nits around autocast and tokenizer types
Signed-off-by: Sarah Yurick <[email protected]>
LGTM
Awaiting Hugging Face releases.
TODO: