WIP: Support pickling lru_cache on CPython #309
We don't have a good way to special-case the handling of lru_cache on pypy.
I just remembered that I looked at this a while back but never completed it. It might be a good point of reference nonetheless: master...csadorf:fix/issue-178
Thanks for the patch. You also need to implement this behavior for |
@pierreglaser I'm happy to port this to cloudpickle_fast if we decide this is a reasonable path forward, but I think the first order question to answer is whether the behavior added here (in particular, losing the value of |
Ah, right. Is there any good reason for CPython not to expose the |
Not that I can think of besides a possible general aversion to exposing implementation details. But given that
That seems to be the case yes, although I'm a bit confused how that works b/c the pure python implementation in the stdlib uses |
Then my opinion is that it is reasonable to open an issue in upstream to start a discussion. It could be exposed as a read-only attribute, it should not hurt much.
No, we do not support |
I think that we cannot move forward without also supporting the |
I think this is my inclination as well. If so, then this PR can probably be closed until upstream provides a way to expose |
I posted an issue to bugs.python.org here: https://bugs.python.org/issue38565. Will post a PR to expose the attribute in a bit. Any advice on who to prod for getting those reviewed/triaged would be helpful. |
The typed value is now exposed in upstream Python 3.9 (dev): python/cpython@051ff52 Feel free to update this PR accordingly. |
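For reference, a minimal sketch of reading both parameters back from a cached function. It assumes Python 3.9 or later, where lru_cache wrappers expose cache_parameters(); on older interpreters it falls back to cache_info(), which only reports maxsize. The helper name get_cache_params is made up for illustration and is not part of cloudpickle.

```python
from functools import lru_cache


def get_cache_params(cached_func):
    """Return (maxsize, typed) for an lru_cache-decorated function.

    typed is None when the running interpreter does not expose it.
    """
    params = getattr(cached_func, "cache_parameters", None)
    if params is not None:  # available on Python 3.9+
        info = params()
        return info["maxsize"], info["typed"]
    # Older interpreters: only maxsize is recoverable.
    return cached_func.cache_info().maxsize, None


@lru_cache(maxsize=32, typed=True)
def square(a):
    return a ** 2


# With typed=True, 3 and 3.0 are stored as separate cache entries.
square(3)
square(3.0)
```

Returning None for typed on older versions keeps the "unknown" case distinct from an explicit typed=False, so a caller can decide whether to warn or fall back.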
Thinking about this a bit more,

    @lru_cache(maxsize=1000)
    def expensive_computation(a):
        return a ** 2

    for result in executor.map(expensive_computation, [0, 1, 42] * 10000):
        print(result)

the This behavior might be surprising to users, but I am not sure exactly how we could fix this. We could annotate the LRUCache instance with a UUID at pickling time, a bit like what we do for class definitions and enums (see |
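One way the UUID idea could look, as a rough sketch and not something cloudpickle does today: the pickling side attaches a token to the cached function, and the unpickling side keeps a per-process registry so that repeated unpickles with the same token reuse one wrapper, and therefore one cache. The names _REBUILT_BY_TOKEN and rebuild_cached are hypothetical.

```python
import uuid
from functools import lru_cache

# Hypothetical per-process registry: token -> rebuilt cached function.
_REBUILT_BY_TOKEN = {}


def rebuild_cached(func, maxsize, token):
    """Rebuild an lru_cache wrapper, reusing an earlier rebuild with the same token."""
    try:
        return _REBUILT_BY_TOKEN[token]
    except KeyError:
        cached = lru_cache(maxsize=maxsize)(func)
        _REBUILT_BY_TOKEN[token] = cached
        return cached


def expensive_computation(a):
    return a ** 2


# The token would be generated once on the pickling side and travel with each pickle.
token = uuid.uuid4().hex
first = rebuild_cached(expensive_computation, 1000, token)
second = rebuild_cached(expensive_computation, 1000, token)

first(42)   # miss: fills the shared cache
second(42)  # hit: same wrapper, same cache
```

A plain dict keeps every rebuilt function alive for the lifetime of the process, so a real implementation would need some eviction or weak-reference policy.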
There's no way to plug into |
I was poking through the issue tracker over the weekend and noticed #178. This is an initial pass on what I think is feasible to support without aggressive hackery:
TL;DR:
- This PR adds support for pickling lru_cache-decorated functions on CPython.
- It does not pickle the lru_cache'd function's associated cache.
- It preserves cached_func.maxsize, but loses the value of typed that was passed to the cached function. I don't think typed is used very often, but it seems unfortunate that this PR silently discards that info. I don't have a good idea for how to fix that without upstream changes in CPython.

Following @ogrisel's comment in #178 (comment), we can support simple usage of lru_cache in CPython straightforwardly, because lru-cached functions are instances of an extension type that provides enough of an API (via .__wrapped__ and .cache_info()) to allow us to extract the cache's maxsize.

Unfortunately, we do not have a way to recover the typed parameter that was added in Python 3.3, which causes the decorated function to cache arguments of different types separately. This might be enough of a concern that we don't want to merge this PR (or, at least, wouldn't want to enable it without some kind of explicit opt-in from the user). As-is, this PR just silently discards the typed param, which is not great.

On PyPy, things are a bit trickier. There, lru_cache uses the pure-Python implementation from the stdlib, which just returns a closure that closes over the cache state. Since we can already serialize closures, this "works", but it introduces a new issue: PyPy also serializes the cache, which might not be desirable (at least, it seems unfortunate that the behavior would differ between CPython and PyPy). On the other hand, it's not clear how we could prevent this. pickle's dispatching is all type based, and to any external viewer an lru-cached function is just a regular Python function on PyPy, so there isn't really a way for us to treat lru-cached functions specially unless we add a special case that is checked on every function serialization (e.g. we could branch on if hasattr(func, 'cache_clear') or similar). Whether that cost is worth the effort to ensure consistency between CPython and PyPy is an interesting question. I'm seeing several unrelated test failures on PyPy on master anyway, so maybe we don't care?
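To make the approach described above concrete, here is a minimal sketch, not the code in this PR. It uses only .__wrapped__ and .cache_info() to build a (callable, args) pair in the shape pickle reducers return, plus the duck-typed hasattr check that could cover the PyPy closure case. The names looks_lru_cached, rebuild_lru_cache and reduce_lru_cache are made up for illustration.

```python
from functools import lru_cache


def looks_lru_cached(func):
    """Duck-typed check that does not depend on the wrapper's type."""
    return hasattr(func, "cache_clear") and hasattr(func, "__wrapped__")


def rebuild_lru_cache(func, maxsize):
    """Re-wrap func in a fresh, empty cache. The typed flag is not restored."""
    return lru_cache(maxsize=maxsize)(func)


def reduce_lru_cache(cached_func):
    """Return a (callable, args) pair in the shape pickle reducers use."""
    maxsize = cached_func.cache_info().maxsize
    return rebuild_lru_cache, (cached_func.__wrapped__, maxsize)


@lru_cache(maxsize=16, typed=True)
def square(a):
    return a ** 2


square(3)  # populate the original cache

rebuild, args = reduce_lru_cache(square)
clone = rebuild(*args)  # same maxsize, empty cache, typed silently dropped
```

The clone keeps maxsize but starts with an empty cache and the default typed=False, which is exactly the information loss discussed above.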