
Improve Python API for model serialization #25

Open
TomScheffers opened this issue Aug 21, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@TomScheffers

I am loading a model with thousands of trees, which takes approx. 10 minutes. I would therefore like to compile the model once and then serialize it to file. Pickling with pickle or dill gives the following error: "ValueError: ctypes objects containing pointers cannot be pickled". Is there a way to save/load the compiled model to/from disk? Thanks :)
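
For reference, roughly what I'm trying (file name is just a placeholder):

```python
import pickle
import lleaves

model = lleaves.Model(model_file="model.txt")
model.compile()  # this is the slow step (~10 minutes for thousands of trees)

pickle.dumps(model)
# ValueError: ctypes objects containing pointers cannot be pickled
```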

@siboehm (Owner) commented Aug 21, 2022

There's a cache=<filepath> parameter for lleaves.Model.compile(). Does that do what you're looking for? See docs for more info. I looked into supporting pickling a while ago, but the cache parameter seemed like the cleaner solution.
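
Roughly like this (paths are placeholders; see the docs for the exact semantics):

```python
import lleaves

model = lleaves.Model(model_file="model.txt")
# The first call compiles and writes the cache file; later calls with the
# same path load the compiled code from disk instead of recompiling.
model.compile(cache="model_cache.bin")
```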

@TomScheffers (Author)

Thanks for your quick response. That should do the job! A nice addition would be a @classmethod (lleaves.Model.from_cache) that initializes a model directly from the cache, since right now you still have to initialize with the model .txt file.
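
Something along these lines (the from_cache method is hypothetical, it doesn't exist in lleaves today):

```python
import lleaves

# Current workflow: the model .txt is still required even when a cache exists.
model = lleaves.Model(model_file="model.txt")
model.compile(cache="model_cache.bin")

# Proposed (hypothetical) shortcut, loading straight from the cache file:
# model = lleaves.Model.from_cache("model_cache.bin")
```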

Love your work on this package. FYI: I get a ~10x speedup compared to the lightgbm.predict method, using a lot of categorical variables.

@siboehm (Owner) commented Aug 21, 2022

Yeah, you're right, a classmethod would be nicer. Currently, what's stored in the cache is an ELF file (on Linux) containing the compiled function. Recreating a lleaves.Model from the ELF file alone would require storing information about e.g. the pandas_categoricals (which is a list of lists of strings) as a static variable in the ELF file, which sounds like a PITA.

I might look into this again at some point. I assume there'll either be some way to enable pickling, or I'll serialize the pandas categorical list somehow, or I'll have a "light" version of pickling where the model can be pickled but won't include the compiled function, requiring you to store two files (the pickled model and the ELF cache file).
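
The light version could look roughly like this sketch (the private attribute name is a guess at the internals, not public API):

```python
import lleaves

class LightPicklableModel(lleaves.Model):
    """Sketch of "light" pickling: drop the compiled function before
    pickling and recompile from the ELF cache file after unpickling."""

    def __init__(self, model_file, cache):
        super().__init__(model_file=model_file)
        self._cache_path = cache
        self.compile(cache=cache)

    def __getstate__(self):
        state = self.__dict__.copy()
        # Drop whatever holds the ctypes function pointer
        # (attribute name is an assumption about lleaves internals).
        state.pop("_c_entry_func", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Fast: loads the compiled code from the cache file, no codegen.
        self.compile(cache=self._cache_path)
```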

> Love your work on this package. FYI: I get a ~10x speedup compared to the lightgbm.predict method, using a lot of categorical variables.

I'm glad to hear lleaves is working for you! :)

@siboehm siboehm changed the title Model serialization Improve Python API for model serialization Aug 21, 2022
@siboehm siboehm added the enhancement New feature or request label Aug 21, 2022