Unable to serialize lambdas or classes on Databricks #343

Open
wsuchy opened this issue Oct 10, 2019 · 5 comments
@wsuchy

wsuchy commented Oct 10, 2019

The following code:

import dill
fnc = lambda x:x
dill.dumps(fnc, recurse=False)

fails in a Databricks notebook with the following error:

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

Obviously the Pickler is trying to serialize the SparkContext, but I don't refer to the SparkContext anywhere here. I also tried removing spark from globals, but that didn't work. Do you have any ideas about what else I can do to prevent it from serializing the SparkContext?

@GeethanadhP

I am having the same issue

@Axxeption

I have this issue as well, any updates on this?

@ekdz

ekdz commented Aug 31, 2022

I'm dealing with the same issue. Any updates?

@mmckerns
Member

Sorry for the slow reply.

A function (including a lambda) has a reference to the global namespace. So, function serialization requires some pickling of the global namespace. The setting recurse=False will try to serialize the entire global dict, while recurse=True tries to use reference tracing to serialize only what the function needs out of the global dict. With the former, you'd need to remove all references to unserializable objects defined in the global namespace. dill does have some tools (in dill.detect) to identify which objects within a namespace are unserializable, and which objects cause serialization to fail. There are also some tools (in dill.session) that can serialize namespaces, where unserializable objects can potentially be serialized by reference on the fly. Additionally, there's #475, which potentially filters unserializable objects from the global namespace.
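As a rough illustration of hunting for unserializable objects in a namespace, here is a minimal sketch. It uses the top-level dill.pickles check rather than the dill.detect helpers. FakeContext, find_unpicklable and the namespace dict are hypothetical names invented for this example; FakeContext only stands in for something like a SparkContext that refuses to be pickled.

```python
import dill


class FakeContext:
    """Hypothetical stand-in for an object that refuses to be pickled."""

    def __reduce__(self):
        raise RuntimeError("this object can only be used on the driver")


def find_unpicklable(namespace):
    """Return the sorted names in `namespace` whose values dill cannot pickle."""
    # safe=True makes dill.pickles return False on any exception
    # instead of letting unexpected exception types propagate.
    return sorted(
        name for name, value in namespace.items()
        if not dill.pickles(value, safe=True)
    )


# Hypothetical namespace; in a notebook one might pass globals() instead.
namespace = {"offset": 10, "label": "demo", "sc": FakeContext()}
print(find_unpicklable(namespace))
```

Names reported this way are the candidates to remove from the namespace (or avoid referencing) before calling dill.dumps with recurse=False.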

The easiest thing to do, however, is to use recurse=True.
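A minimal sketch of that suggestion, assuming an unpicklable object sits in the same namespace as the function. FakeContext is a hypothetical stand-in for a SparkContext, and offset is an illustrative global that the lambda actually uses; neither comes from the original report.

```python
import dill


class FakeContext:
    """Hypothetical stand-in for an object that refuses to be pickled."""

    def __reduce__(self):
        raise RuntimeError("this object can only be used on the driver")


sc = FakeContext()   # unpicklable global that the lambda never references
offset = 10          # global that the lambda does reference

fnc = lambda x: x + offset

# recurse=True traces the names the function refers to, so only
# `offset` needs to be pickled alongside it; `sc` is left out.
payload = dill.dumps(fnc, recurse=True)
restored = dill.loads(payload)
print(restored(1))
```

Whether recurse=False fails on the same code depends on how the environment sets up the global namespace, so this sketch only shows the recurse=True path.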

I'm going to close this as answered, but if you feel it's not... then please do reopen and continue the conversation.

mmckerns modified the milestone: dill-0.3.6 Aug 31, 2022
mmckerns reopened this Aug 31, 2022
@mmckerns
Member

Actually, I'm going to leave this open a bit more. Is the issue just with lambdas and classes, or is it also with functions, or any object that refers to the global dict...?
