
Improve Python API for model serialization #25

Open
TomScheffers opened this issue Aug 21, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@TomScheffers

I am loading a model with thousands of trees, which takes approx. 10 minutes. I would therefore like to compile the model once and then serialize it to file. Pickling with pickle or dill gives the following error: "ValueError: ctypes objects containing pointers cannot be pickled". Is there a way to save/load the compiled model to/from disk? Thanks :)
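
For reference, roughly what I'm trying (file name is just a placeholder):

```python
import pickle
import lleaves

model = lleaves.Model(model_file="model.txt")
model.compile()  # this is the slow step (~10 minutes for thousands of trees)

pickle.dumps(model)
# ValueError: ctypes objects containing pointers cannot be pickled
```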

@siboehm (Owner) commented Aug 21, 2022

There's a cache=<filepath> parameter for lleaves.Model.compile(). Does that do what you're looking for? See docs for more info. I looked into supporting pickling a while ago, but the cache parameter seemed like the cleaner solution.
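
Roughly like this (paths are placeholders; see the docs for the exact semantics):

```python
import lleaves

model = lleaves.Model(model_file="model.txt")
# The first call compiles and writes the cache file; later calls with the
# same path load the compiled code from disk instead of recompiling.
model.compile(cache="model_cache.bin")
```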

@TomScheffers (Author)

Thanks for your quick response. That should do the job! A nice addition would be a @classmethod (lleaves.Model.from_cache) that initializes a model directly from the cache, since right now you still have to initialize with the model .txt file.
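
Something along these lines (the from_cache method is hypothetical, it doesn't exist in lleaves today):

```python
import lleaves

# Current workflow: the model .txt is still required even when a cache exists.
model = lleaves.Model(model_file="model.txt")
model.compile(cache="model_cache.bin")

# Proposed (hypothetical) shortcut, loading straight from the cache file:
# model = lleaves.Model.from_cache("model_cache.bin")
```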

Love your work on this package. FYI: I get a ~10x speedup compared to the lightgbm.predict method, using a lot of categorical variables.

@siboehm (Owner) commented Aug 21, 2022

Yeah, you're right, a classmethod would be nicer. Currently, what's stored in the cache is an ELF file (on Linux) containing the compiled function. Recreating a lleaves.Model from the ELF file alone would require storing information about e.g. the pandas_categoricals (which is a list of lists of strings) as a static variable in the ELF file, which sounds like a PITA.

I might look into this again at some point. I assume there'll either be some way to enable pickling, or I'll serialize the pandas categorical list somehow, or I'll have a "light" version of pickling where the model can be pickled but won't include the compiled function, requiring you to store two files (the pickled model and the ELF cache file).
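
The light version could look roughly like this sketch (the private attribute name is a guess at the internals, not public API):

```python
import lleaves

class LightPicklableModel(lleaves.Model):
    """Sketch of "light" pickling: drop the compiled function before
    pickling and recompile from the ELF cache file after unpickling."""

    def __init__(self, model_file, cache):
        super().__init__(model_file=model_file)
        self._cache_path = cache
        self.compile(cache=cache)

    def __getstate__(self):
        state = self.__dict__.copy()
        # Drop whatever holds the ctypes function pointer
        # (attribute name is an assumption about lleaves internals).
        state.pop("_c_entry_func", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Fast: loads the compiled code from the cache file, no codegen.
        self.compile(cache=self._cache_path)
```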

> Love your work on this package. FYI: I get a ~10x speedup compared to the lightgbm.predict method, using a lot of categorical variables.

I'm glad to hear lleaves is working for you! :)

@siboehm siboehm changed the title Model serialization Improve Python API for model serialization Aug 21, 2022
@siboehm siboehm added the enhancement New feature or request label Aug 21, 2022