The original project states that about a dozen reference videos are needed for high-accuracy sign recognition, and around five for 'decent' accuracy.
We should test these numbers ourselves and confirm they hold. Ideally we should confirm or find the minimum number of videos for 'basic' accuracy, as well as the number needed for 'high' accuracy.
Another thing to note is the performance impact of a large dataset. Each set of frames under consideration is checked against every sign, so adding more videos may slow recognition. We should find the point at which the number of videos makes performance unacceptable.
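As a rough sketch of why recognition cost scales with dataset size (all names here are hypothetical, not our actual code):

```python
# Hypothetical sketch of a linear-scan matcher: every query is compared
# against every reference video, so per-query time grows linearly with
# the number of references. Names are illustrative, not the real code.
def classify(query_landmarks, references, distance_fn):
    """references: list of (sign_name, landmark_sequence) pairs."""
    best_sign, best_dist = None, float("inf")
    for sign_name, ref_landmarks in references:  # one distance per reference video
        d = distance_fn(query_landmarks, ref_landmarks)
        if d < best_dist:
            best_sign, best_dist = sign_name, d
    return best_sign, best_dist
```

If the distance function is something sequence-based like DTW, each comparison is itself relatively expensive, so the linear growth in reference count will be felt quickly.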
Finally, both issues should be weighed together to find the number of videos that reaches a level of accuracy and performance we find acceptable. This may mean deciding which we care about more: speed or accuracy.
This is not an easy problem to tackle! It will likely be revisited and worked on many times. We can put findings and discussion about speed and accuracy here.
I'm attempting to benchmark the performance of this thing, and I've run into some issues. Noting them here:
- Some videos' landmarks cannot be extracted. I'm not quite sure what's going on, but I'm skipping them for now (a sketch of the skip logic is below). My feeling is that a couple of videos here and there aren't a big deal, but examples include `home-home-sports ASL` and `stop-How To Sign The Word Stop In ASL`.
- Of greater concern is the model's inability to differentiate between signs like "please" and "me". Using `benchmark_signs` in f90a845, most things displayed as "unknown sign", but reading the console output of distances, it still correctly ranked signs most of the time. More investigation is needed; a sketch of how a distance threshold could cause this is below.
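For the extraction failures, a minimal sketch of the skip-and-log approach, assuming a hypothetical `extract_landmarks()` helper that returns `None` when it fails (not the project's real API):

```python
# Minimal sketch of skipping videos whose landmarks can't be extracted.
# extract_landmarks() is a hypothetical helper, not the project's real API.
def build_reference_set(video_paths, extract_landmarks):
    references, skipped = [], []
    for path in video_paths:
        landmarks = extract_landmarks(path)
        if landmarks is None:  # extraction failed; record it and move on
            skipped.append(path)
            continue
        references.append((path, landmarks))
    for path in skipped:
        print(f"skipped (no landmarks): {path}")
    return references
```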
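On the "unknown sign" behavior: if there's an absolute distance threshold applied before a match is reported, a correct ranking combined with an over-strict threshold would produce exactly what I saw. A hypothetical illustration (the real `benchmark_signs` logic may differ, and the values are made up):

```python
# Hypothetical illustration: a distance threshold can report "unknown sign"
# even when the correct sign ranks first.
def label_with_threshold(distances, threshold):
    """distances: list of (sign, distance) pairs for one query clip."""
    ranked = sorted(distances, key=lambda pair: pair[1])
    best_sign, best_dist = ranked[0]
    if best_dist > threshold:
        return "unknown sign", ranked  # ranking is still informative
    return best_sign, ranked

# "please" ranks first, but a too-strict threshold hides the match.
print(label_with_threshold([("please", 0.42), ("me", 0.55)], threshold=0.3))
# -> ('unknown sign', [('please', 0.42), ('me', 0.55)])
```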