hdf5 files are opened too often in DataContainer when loading SPHInX jobs #1574
Comments
Yep, this would be the solution I'd like too. Last I tried to prototype it I got bogged down in the details, but in the end I think it would have to be something like this.
Same for me…
So I did a quick survey and I think it can be implemented in
The first is probably the smallest problem, but I'd have to check where we could hook in to circumvent the second problem.
To avoid concurrent efforts: I will make a first draft of this; we can discuss it on Monday.
Sorry, I couldn't sleep yesterday and I have something almost there now. :')
Main changes to
Ok I’m gonna try to work on it by Monday just in order to make it more complicated |
Test example:
I get 269 `_open_hdf` calls for the run, and 802 calls for the load. From discussions with @jan-janssen and @pmrv, I get the info that the key challenge here is
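One way to count the `_open_hdf` calls yourself could be a small monkey-patching wrapper like the sketch below; the `pyiron_base.storage.hdfio` module path, the project name, and the job name are assumptions and may differ between versions:

```python
from unittest import mock

from pyiron_base import Project
import pyiron_base.storage.hdfio as hdfio  # assumed location of _open_hdf

pr = Project("sphinx_test")               # hypothetical project
call_count = 0
_original_open_hdf = hdfio._open_hdf

def _counting_open_hdf(*args, **kwargs):
    # count every file-open request, then delegate to the real function
    global call_count
    call_count += 1
    return _original_open_hdf(*args, **kwargs)

with mock.patch.object(hdfio, "_open_hdf", _counting_open_hdf):
    job = pr.load("some_sphinx_job")      # hypothetical SPHInX job name
print("calls to _open_hdf during load:", call_count)
```

Note that patching the module attribute only intercepts calls that look `_open_hdf` up through `hdfio`; any code that imported the function directly would have to be patched at its own import site.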
Possible solution

A caching mechanism for open HDF5 file handles, controlled by a context manager (e.g. `hdf_leave_open`).

Use scenario:
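A hypothetical use scenario could look like this; `hdf_leave_open` and the way it hangs off the HDF object are purely illustrative, nothing like this exists in pyiron yet:

```python
hdf = job.project_hdf5            # HDF5 handle of an already loaded pyiron job

with hdf.hdf_leave_open():
    # every read below reuses a single cached h5py.File handle
    energy = job["output/generic/energy_tot"]
    positions = job["output/generic/positions"]
# leaving the context closes the cached handle again
```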
In this way, high-level code can indicate when it is going to enter and leave HDF5-intensive code, and low-level code would only be augmented by a single check for an existing open file handle before opening the file. Performance-wise, the open call is much more expensive than the dictionary lookup, so there is no measurable price to pay when the cache is not used.
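A minimal sketch of that low-level check, assuming a module-level dictionary as the cache (names are illustrative, not existing pyiron internals):

```python
import h5py

_open_handle_cache = {}  # {file_name: h5py.File}, filled by the context manager

def open_cached_or_new(file_name, mode="r"):
    """Return a cached handle if one is registered, otherwise open the file."""
    handle = _open_handle_cache.get(file_name)
    if handle is not None:
        # cheap dictionary lookup instead of an expensive file open
        return handle
    return h5py.File(file_name, mode=mode)
```

One detail such a sketch glosses over is ownership: callers must not close a handle that came from the cache, so a real implementation would have to track who is responsible for closing.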
Using a context should make this rather robust against unexpected errors. The new context manager should check at context enter if the cache is already filled, and if so, do nothing and set a "noop" flag for the context exit. Then, one could even deal with nested contexts for the same file if programmers do not realize that other parts of the code also have caching instructions. If pyiron needs to be thread-safe, a locking mechanism for the cache access is needed.
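Building on the cache dictionary above, the context manager with the "noop" behaviour for nested contexts and a lock for cache access could look roughly like this (again an illustrative sketch, not the actual implementation):

```python
import threading
from contextlib import contextmanager

import h5py

_cache_lock = threading.Lock()

@contextmanager
def hdf_leave_open(file_name, mode="r"):
    with _cache_lock:
        # if an outer context already cached this file, this context is a no-op
        is_outermost = file_name not in _open_handle_cache
        if is_outermost:
            _open_handle_cache[file_name] = h5py.File(file_name, mode=mode)
        handle = _open_handle_cache[file_name]
    try:
        yield handle
    finally:
        # only the outermost context closes the handle and clears the cache
        if is_outermost:
            with _cache_lock:
                _open_handle_cache.pop(file_name, None)
            handle.close()
```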