Evaluation tool #66
@ibrahimsharaf I will try not to change anything related to your work in the existing code, to simplify a future merge. But you can have a look at my class hierarchy to make your algorithm compatible with mine.
Codecov Report
@@            Coverage Diff             @@
##           master      #66      +/-   ##
==========================================
+ Coverage   39.72%   41.47%   +1.74%
==========================================
  Files          10       14       +4
  Lines        1772     4239    +2467
==========================================
+ Hits          704     1758    +1054
- Misses       1068     2481    +1413
refactored/cache.py
def build(stream):
    """build from downloaded archives"""

    # TODO: is it possible to have different `signature`s for one `proto_signature` ?
No, there should be only one signature for each proto_signature (that is basically a stack trace).
Overall, it looks good so far. Is the cache actually needed? How slow is it to rebuild it? If it isn't slow, we could remove the cache altogether (it's better to avoid premature optimizations).
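A minimal sketch of how the one-signature-per-`proto_signature` invariant could be checked while building the cache. It assumes each crash record is a dict with `proto_signature` and `signature` keys; the record layout and the name `index_signatures` are hypothetical and not taken from refactored/cache.py.

```python
def index_signatures(records):
    """Map each proto_signature to its single signature.

    Raises ValueError if one proto_signature shows up with two
    different signatures, which is not expected to happen.
    """
    mapping = {}
    for record in records:
        proto = record["proto_signature"]  # assumed key name
        signature = record["signature"]    # assumed key name
        # setdefault stores the first signature seen and returns it afterwards.
        known = mapping.setdefault(proto, signature)
        if known != signature:
            raise ValueError(
                f"proto_signature {proto!r} maps to both "
                f"{known!r} and {signature!r}"
            )
    return mapping
```

With a check like this in place, the TODO in `build` turns into an assertion that fails loudly if the data ever violates the assumption.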
@marco-c Yes, the cache is actually needed. It dramatically speeds up experiment evaluation and network operations. I can profile it and publish speedup results for my desktop machine, if you wish.
Rebuilding the cache costs no extra time, since it is built while reading. The only problem is disk space: the model plus the corpus take ~600 MB.
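To illustrate what "built while reading" can look like for downloads, here is a rough sketch of a read-through disk cache. The function name `cached_fetch`, the SHA-256 file naming, and the injected `fetch` callable are all assumptions made for this example; the PR's actual downloader may work differently.

```python
import hashlib
from pathlib import Path


def cached_fetch(url, cache_dir, fetch):
    """Return the bytes for `url`, downloading only on a cache miss.

    `fetch` is any callable that takes a URL and returns bytes
    (hypothetical; it stands in for the real download code).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it becomes a safe, fixed-length file name.
    path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        return path.read_bytes()
    data = fetch(url)
    path.write_bytes(data)  # the cache is filled as a side effect of reading
    return data
```

The first read pays the network cost and writes the file; later reads of the same URL only touch the disk, which is where the disk-space trade-off comes from.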
The evaluation tool seems to be done. We have a fully working prototype which produces compact, human-readable (and understandable) results! 😎 UPD: Oh, there is still some work to be done.
I really like the changes, but could you split them in separate PRs? One PR to refactor the code with the classes; one PR to refactor the downloader; one PR to introduce the cache; one PR to add the basic evaluation tool. Also, put everything in the top source directory and not in a
Yes, this can be an additional and separate PR.
The only problem is that for the evaluation we will use periods of time far from each other, so the cache might not help much in those cases.
Yes, for sure. The cache won't help much in production and will be turned off in the future. But while we are in the stage of active development, it will help us :)
The last part of this PR is implemented in #88.
While working on #39 I faced many issues and decided to start a big refactoring.
Add a cache (100 MB in RAM for all archived signatures) to speed up signature access operations.
Add a downloads cache to reduce network operation time.
Introduce an abstract `Algorithm` class and start working on a slow reference implementation of WMDistanceModel, which will be the baseline for all future models.
Comments are welcome :)
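For readers who have not opened the diff, the abstract-class idea could look roughly like the sketch below. The method names `distance` and `closest` are assumptions, and `JaccardBaseline` is a toy stand-in used only to show how a model plugs in; it is not the PR's WMDistanceModel.

```python
from abc import ABC, abstractmethod


class Algorithm(ABC):
    """Common interface for stack-trace similarity models (hypothetical API)."""

    @abstractmethod
    def distance(self, trace_a, trace_b):
        """Return a non-negative distance between two stack traces."""

    def closest(self, query, candidates):
        """Return the candidate with the smallest distance to `query`."""
        return min(candidates, key=lambda c: self.distance(query, c))


class JaccardBaseline(Algorithm):
    """Toy model: Jaccard distance over the sets of frames in each trace."""

    def distance(self, trace_a, trace_b):
        a, b = set(trace_a), set(trace_b)
        if not a and not b:
            return 0.0  # two empty traces are treated as identical
        return 1.0 - len(a & b) / len(a | b)
```

Because evaluation code would depend only on `Algorithm`, a slow reference model and later faster models could be swapped in and compared on the same data.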