cache engine for knitr #167
I'll let @yihui comment on the knitr PR and @kevinushey comment on how this fits with our knitr engine. One thing I'd like to see is that the dill module isn't actually required but is instead used if it's available. It also seems like there should be an option for toggling between pickle and dill.
Thanks for the feedback. If you look at line 185 of As for pickle vs. dill, that is a more complicated problem. dill is able to save more types of objects than pickle in an easier way (one call to It would be much more difficult to set up the same system using pickle, and I worry I would be duplicating the existing dill functionality.
Okay, that's a fair point about duplicating work. I think that the warning as you have it is fine.
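The trade-off discussed in this exchange can be illustrated with a minimal sketch that uses only the Python standard library. The helper names `can_pickle` and `find_optional_dill` are hypothetical and are not part of reticulate or this PR; the sketch only shows that plain `pickle` rejects modules and lambdas, and one way an optional dependency could be detected without being required.

```python
import importlib.util
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


def find_optional_dill():
    """Return True if dill is installed, without importing it (hypothetical helper)."""
    return importlib.util.find_spec("dill") is not None


# Plain data pickles fine; modules and lambdas do not.
plain_ok = can_pickle({"x": 1, "y": [1, 2, 3]})
module_ok = can_pickle(math)
lambda_ok = can_pickle(lambda x: x + 1)

if not find_optional_dill():
    # Assumed behaviour for this sketch: report and skip rather than fail.
    print("dill not found; Python session caching would be skipped")
```

Rebuilding session caching on top of plain `pickle` would mean handling each of these unsupported object types by hand, which is the duplication of work mentioned in the reply.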
R/knitr-engine.R
}})
if (dill == "No dill") return()

py_run_string("globals().pop('r')")
This probably needs to ensure that the `r` / `R` object is truly the 'shim' R object introduced by `reticulate`, since a user could in theory overwrite it.
Thanks for the feedback. I'm not actually sure how to check whether the class `R` is the one introduced by `reticulate` or one defined by the user. Take a look at my latest commit and see if you think it is sufficient. If not, I can try to find a workaround.
That looks sufficient to me, but you're entirely right that we should have a way of making this more deterministic. Maybe that class should instead be defined within our `rpytools` module?
That's a good idea. Then we could possibly do something like this: https://stackoverflow.com/questions/14570802/python-check-if-object-is-instance-of-any-class-from-a-certain-module
I'll see if I can implement something, although I haven't messed with that side of the package before.
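A minimal sketch of the module-based check suggested in this thread might look like the following. Everything here is hypothetical: `hypothetical_rpytools` is a stand-in module created on the fly, `R` is a stand-in for the shim class, and `pop_shim` is an illustrative helper, not code from reticulate.

```python
import types


def is_instance_from_module(obj, module_name):
    """Return True if obj's class was defined in module_name or a submodule."""
    defining_module = type(obj).__module__
    return (defining_module == module_name
            or defining_module.startswith(module_name + "."))


def pop_shim(namespace, name, module_name):
    """Remove namespace[name] only if its class comes from module_name."""
    if name in namespace and is_instance_from_module(namespace[name], module_name):
        return namespace.pop(name)
    return None


# Hypothetical stand-in for a package module that would own the shim class.
shim_module = types.ModuleType("hypothetical_rpytools")


class R:
    """Stand-in for the shim class (hypothetical)."""


R.__module__ = shim_module.__name__
shim_module.R = R


class UserR:
    """A user-defined object that happens to be bound to the same name."""


shim_namespace = {"r": R()}
user_namespace = {"r": UserR()}

pop_shim(shim_namespace, "r", "hypothetical_rpytools")  # shim is removed
pop_shim(user_namespace, "r", "hypothetical_rpytools")  # user object is kept
```

Checking the defining module instead of the variable name means a user who rebinds `r` to their own object would not lose it before the session is saved.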
R/knitr-engine.R
stop(module$message)
}
} else {
py_run_string("import dill")
Is this line necessary? IIUC the above import attempt should ensure the module is loaded if needed. Or is this done to put `dill` into the main module?
Nope, it isn't necessary. This was a holdover from a previous caching attempt. I just added a commit that removes it. The new commit passes all tests on my end.
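The distinction behind the reviewer's question, loading a module versus binding its name in the main module, can be shown with a small sketch. It uses the standard-library module `colorsys` in place of dill so that it runs without dill installed, and a plain dictionary as an assumed stand-in for the main module's globals.

```python
import importlib
import sys

# A fresh dictionary standing in for the globals of Python's main module.
main_globals = {}

# Importing through importlib loads and caches the module ...
importlib.import_module("colorsys")
loaded = "colorsys" in sys.modules

# ... but does not bind the name in any namespace by itself.
bound_before = "colorsys" in main_globals

# Running an import statement inside the namespace creates the binding.
exec("import colorsys", main_globals)
bound_after = "colorsys" in main_globals
```

So an extra import statement run in the main module is only needed when later Python code has to refer to the module by name there.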
This PR looks good to me, but I think we might want to hold this until the next release of The It looks like there are some merge conflicts on this PR (sorry, probably from my recent merge). Would you be able to fix those up?
@tmastny I don't quite see the usefulness of the object https://github.com/yihui/knitr/blob/0da648bff63/R/engine.R#L194-L199
@yihui I will carefully review Rcpp's cache engine and see what I can learn. However, the reason I chose to add a
I may be wrong, but I believe what you are suggesting (allowing the engines to handle caching) would require some additional refactoring on the
@tmastny Thanks for the diagrams! For the
Not to state the obvious, and I only glanced at this, but I also think that knitr benefits from
@yihui Ah, thanks for pointing that out. So if I understand correctly, when the engine is Rcpp the standard R caching no longer executes. I don't believe this would work for my implementation. My solution only caches the Python environment and depends on R to cache everything else. I'd have to go back to the drawing board to make the Python engine cache everything. Edit: This could work if R has nothing important to cache when the chunk is run through the Python engine, but seeing as reticulate can exchange data with R, I think R should be allowed to cache.
@tmastny Okay, I'll spend more time on understanding your PR when I have a chance. Thanks!
FYI I have merged the knitr PR yihui/knitr#1505.
@kevinushey is this something you will consider? The knitr PR is merged but the corresponding part in reticulate wasn't. I am wondering if there is a reason or if it is just because time flies. 😄
The PR has some conflicts and it's a bit stale -- @tmastny, sorry for letting this fall by the wayside, but would you have time to bring this PR up-to-date?
Yes, I could revisit this in a couple of weeks.
Working on this issue: uqfoundation/dill#463, yihui/knitr#1505
This is my attempt to add Python session caching between knitr chunks in `reticulate`. The pull request addresses `knitr` issue #1505.
First, note that the cache engine depends on a new cache engine API seen in knitr pull request #1518. The basic idea is that we need to add this line to the setup chunk:
My implementation depends on the Python module dill. This is a well-regarded extension to basic Python pickling. As you can see in the GitHub issues, there are known limitations, but `dill` is actually able to save more types of objects than the standard pickle. This includes session variables (with names) and imported modules. I have a dill unit test showcasing some basic functionality.
The main purpose of the pull request as described in knitr issue #1505 is tested in the cache engine unit test.
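To make the idea of saving named session variables concrete, here is a minimal sketch built on the standard `pickle` module only. It is not the PR's implementation: `save_session` and `load_session` below are hypothetical helpers written for this illustration, and the dictionary `chunk_globals` is an assumed stand-in for the Python session. With dill, the equivalent round trip is handled by its own session functions (`dill.dump_session` and `dill.load_session`), which can also keep imported modules.

```python
import os
import pickle
import tempfile


def save_session(namespace, path):
    """Pickle every named value in namespace that plain pickle can handle."""
    saved = {}
    for name, value in namespace.items():
        if name.startswith("__"):
            continue  # skip interpreter internals such as __builtins__
        try:
            pickle.dumps(value)
        except (pickle.PicklingError, TypeError, AttributeError):
            continue  # e.g. modules and lambdas, which plain pickle rejects
        saved[name] = value
    with open(path, "wb") as handle:
        pickle.dump(saved, handle)
    return sorted(saved)


def load_session(namespace, path):
    """Restore the saved names into namespace."""
    with open(path, "rb") as handle:
        namespace.update(pickle.load(handle))


# One "chunk" defines variables; a later run restores them from the cache file.
chunk_globals = {"x": 41, "labels": ["a", "b"], "os_module": os}
cache_path = os.path.join(tempfile.mkdtemp(), "session.pkl")
saved_names = save_session(chunk_globals, cache_path)

restored = {}
load_session(restored, cache_path)
```

In this pickle-only version the imported module bound to `os_module` is silently dropped, which is the kind of gap that motivates depending on dill.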