Metrics wishlist #6
Comments
Are you looking to create native luz metrics? You would basically just need a thin wrapper around the yardstick metrics. Thoughts?
I think the main issue is that yardstick doesn't support streaming metrics, so it must see the predictions for the full dataset plus the targets in order to compute the value. We could store the predictions/targets, but we can rapidly go out of memory if doing something like U-Net, or even a classification problem with thousands of classes. We could also compute the metric per batch and then average over all batches, but this makes it very hard to compare metrics between runs, and it was a significant source of confusion in Keras when the reported metric value was not identical to computing it yourself. We could definitely have a … What do you think?
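To make the comparability point concrete, here is a small R illustration (not from the original thread) of how averaging per-batch values can differ from computing the metric over the full dataset, using binary precision as the example:

```r
# Precision = TP / (TP + FP) for a binary problem encoded as 0/1.
precision <- function(pred, target) {
  tp <- sum(pred == 1 & target == 1)
  pp <- sum(pred == 1)
  if (pp == 0) return(NA_real_)
  tp / pp
}

# Batch 1: one positive prediction, and it is correct -> precision = 1.0
b1_pred   <- c(1, 0, 0, 0); b1_target <- c(1, 0, 1, 0)
# Batch 2: three positive predictions, only one correct -> precision = 1/3
b2_pred   <- c(1, 1, 1, 0); b2_target <- c(1, 0, 0, 1)

mean(c(precision(b1_pred, b1_target), precision(b2_pred, b2_target)))
#> 0.667  (average of the per-batch precisions)
precision(c(b1_pred, b2_pred), c(b1_target, b2_target))
#> 0.5    (precision over the full dataset: 2 TP / 4 predicted positives)
```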
Sorry, just so I understand: does streaming mode mean that the batch-wise calculations are accumulated as the batches come in, with the final value computed at the end? Whereas for a non-streaming metric the full set of predictions and targets has to be available at once?
Yes, exactly! The problem is that yardstick's metrics are not streaming, so they need all of the predictions and targets before they can compute anything. Edit: sorry, …
I have been working on some metrics for semantic segmentation tasks using `luz_metric()`: F1-score, precision, and recall. The goal is for each metric to support assessment of both multiclass and binary classification problems. For multiclass problems, both micro and macro averaging are implemented. However, note that micro-averaged recall, precision, and F1-score are all equivalent and are actually equal to overall accuracy, so maybe it only makes sense to implement macro averaging. It would also be good to generalize these to work not just for 2D semantic segmentation tasks (e.g., scene labeling) but also for 3D semantic segmentation tasks.

I have posted the code for each metric below. I am not sure whether they are working correctly yet and would appreciate any feedback. If it is determined that they are working properly, or if they can be corrected/improved with input from others, it would be great to add them to luz if there is interest. I have added comments throughout the precision metric, and have only added comments in the other two metrics where they differ.

I am working on a package for geospatial deep learning where I have integrated these metrics: https://github.com/maxwell-geospatial/geodl. This is a work in progress.

Precision draft:
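A minimal sketch of such a macro-averaged multiclass precision metric, built on luz's `luz_metric()` generator (with `initialize()`/`update()`/`compute()` methods) rather than the exact posted draft; it assumes `preds` has shape (batch, num_classes, ...) with class scores, `target` holds 1-based integer class labels, and the `num_classes` argument is added here for illustration:

```r
library(torch)
library(luz)

luz_metric_precision <- luz_metric(
  abbrev = "Precision",
  initialize = function(num_classes) {
    self$num_classes <- num_classes
    # running per-class counts, accumulated across batches (streaming style)
    self$tp <- rep(0, num_classes)        # true positives per class
    self$pred_pos <- rep(0, num_classes)  # predicted positives per class (TP + FP)
  },
  update = function(preds, target) {
    # collapse the class dimension to hard class predictions (1-based indices)
    pred_class <- torch::torch_argmax(preds, dim = 2)
    p <- as.integer(torch::as_array(torch::torch_flatten(pred_class)$cpu()))
    t <- as.integer(torch::as_array(torch::torch_flatten(target)$cpu()))
    for (k in seq_len(self$num_classes)) {
      self$tp[k] <- self$tp[k] + sum(p == k & t == k)
      self$pred_pos[k] <- self$pred_pos[k] + sum(p == k)
    }
  },
  compute = function() {
    # per-class precision = TP / (TP + FP); classes never predicted contribute 0
    per_class <- ifelse(self$pred_pos > 0, self$tp / self$pred_pos, 0)
    mean(per_class)  # macro average: unweighted mean over classes
  }
)
```

A metric defined this way would be passed to `setup()` as, e.g., `metrics = list(luz_metric_precision(num_classes = 5))`; luz then calls `update()` once per batch and `compute()` at the end of each epoch, which is the streaming behaviour discussed earlier in the thread.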
Recall draft:
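A matching sketch for macro-averaged recall, under the same input-shape assumptions as the precision sketch above; it differs only in accumulating the reference (actual) positives per class instead of the predicted positives:

```r
library(torch)
library(luz)

luz_metric_recall <- luz_metric(
  abbrev = "Recall",
  initialize = function(num_classes) {
    self$num_classes <- num_classes
    self$tp <- rep(0, num_classes)
    self$actual_pos <- rep(0, num_classes)  # reference positives per class (TP + FN)
  },
  update = function(preds, target) {
    pred_class <- torch::torch_argmax(preds, dim = 2)
    p <- as.integer(torch::as_array(torch::torch_flatten(pred_class)$cpu()))
    t <- as.integer(torch::as_array(torch::torch_flatten(target)$cpu()))
    for (k in seq_len(self$num_classes)) {
      self$tp[k] <- self$tp[k] + sum(p == k & t == k)
      self$actual_pos[k] <- self$actual_pos[k] + sum(t == k)  # differs from precision
    }
  },
  compute = function() {
    # per-class recall = TP / (TP + FN), macro averaged
    per_class <- ifelse(self$actual_pos > 0, self$tp / self$actual_pos, 0)
    mean(per_class)
  }
)
```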
F1-Score draft:
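And a sketch for the macro-averaged F1-score, again an illustration rather than the posted draft; it accumulates both predicted and reference positives so that per-class precision and recall can be combined via the harmonic mean at compute time:

```r
library(torch)
library(luz)

luz_metric_f1 <- luz_metric(
  abbrev = "F1",
  initialize = function(num_classes) {
    self$num_classes <- num_classes
    self$tp <- rep(0, num_classes)
    self$pred_pos <- rep(0, num_classes)    # TP + FP
    self$actual_pos <- rep(0, num_classes)  # TP + FN
  },
  update = function(preds, target) {
    pred_class <- torch::torch_argmax(preds, dim = 2)
    p <- as.integer(torch::as_array(torch::torch_flatten(pred_class)$cpu()))
    t <- as.integer(torch::as_array(torch::torch_flatten(target)$cpu()))
    for (k in seq_len(self$num_classes)) {
      self$tp[k] <- self$tp[k] + sum(p == k & t == k)
      self$pred_pos[k] <- self$pred_pos[k] + sum(p == k)
      self$actual_pos[k] <- self$actual_pos[k] + sum(t == k)
    }
  },
  compute = function() {
    precision <- ifelse(self$pred_pos > 0, self$tp / self$pred_pos, 0)
    recall <- ifelse(self$actual_pos > 0, self$tp / self$actual_pos, 0)
    # per-class F1 = 2 * P * R / (P + R), then macro average
    f1 <- ifelse(precision + recall > 0,
                 2 * precision * recall / (precision + recall), 0)
    mean(f1)
  }
)
```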