Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log a warning/raise error if user tries to cache something undeterministic (uncacheable) #12229

Open
3 tasks done
N-Demir opened this issue Mar 9, 2024 · 1 comment
Open
3 tasks done
Labels
enhancement An improvement of an existing feature upstream dependency An upstream issue caused by a bug in one of our dependencies

Comments

@N-Demir
Copy link
Contributor

N-Demir commented Mar 9, 2024

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar request and didn't find it.
  • I searched the Prefect documentation for this feature.

Prefect Version

2.x

Describe the current behavior

I initially thought this was a prefect bug, but after a lot of investigation I was able to determine that cloudpickle used by task_input_hash is non deterministic from run to run in lots of different scenarios. See the following issues opened on their end:

For me, the issue came from trying to pass in a class to a prefect task with caching enabled using task_input_hash. Strangely this is only deterministic if the class is first imported from another module and not defined in the same file as the script that kicks off the flow (not defined in __main__).

This single hard to debug issue has led to an incredible amount of frustration at prefect and confusion among myself and team members, and from surveying the landscape of caching in python there doesn't seem to be an incredible out of the box easier solution

Describe the proposed behavior

I'm not expert enough to know of what the best solution is or if any of the other libraries I mentioned are definitively better, but from a user's perspective I want to emphasize one simpler thing that streamlit does well that has saved me a ton of pain in the past: the UnhashableParamError

In streamlit's caching if you try to cache a custom object it will just error and you will receive a UnhashableParamError. Had prefect done this for me it would've alleviated so so much confusion about why caching wasn't working (leading me down rabbit holes of getting more confused with the distinction between prefect caching and results).

Perhaps an error is aggressive, but even a warning log would make a huge difference to debugging why from run to run prefect's caching isn't working. And maybe task_input_hash should be more restrictive in forcing user's to have hashable args instead of blindly using a nondeterministic cloudpickle as the failsafe to json

Example Use

No response

Additional context

No response

@N-Demir N-Demir added enhancement An improvement of an existing feature needs:triage labels Mar 9, 2024
@jakekaplan
Copy link
Contributor

jakekaplan commented Mar 11, 2024

Hi @N-Demir, thanks for submitting this. I'm sorry about the friction here! I also appreciate the effort you put into writing this issue.

Do you have an MRE you can share with what you were providing to task_input_hash that can show/reproduce the behavior you were seeing? Thanks!

@cicdw cicdw added the upstream dependency An upstream issue caused by a bug in one of our dependencies label Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An improvement of an existing feature upstream dependency An upstream issue caused by a bug in one of our dependencies
Projects
None yet
Development

No branches or pull requests

4 participants