-
Notifications
You must be signed in to change notification settings - Fork 249
R and some R packages for similar functionality like pydatalab #1881
base: master
Are you sure you want to change the base?
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
I signed it! |
CLAs look good, thanks! |
Pulled this locally to get a better diff. The new stuff is: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New GPL licensed components need to have their sources somewhere in the docker container,
libssh2-1-dev \ | ||
libcurl4-openssl-dev \ | ||
libssl-dev \ | ||
littler \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
littler is GPL licensed. It needs to be included in the list on line 46 so that we distribute the source code.
Same with r-base, r-base-dev, and r-recommended, I think.
@@ -1,252 +1,293 @@ | |||
# Copyright 2015 Google Inc. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear on the change, but looks like all lines were modified? Can you fix so the delta is clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments, mainly about moving the changes into the same RUN
layer for better cleanup.
Do we know how much this adds to the docker image size?
ENV PATH $PATH:/tools/node/bin:/tools/google-cloud-sdk/bin | ||
ENV PYTHONPATH /env/python | ||
|
||
RUN apt-get update -y && apt-get install -y --no-install-recommends apt-utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move apt-utils
down into the next RUN
layer to avoid a large docker image size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why is this package needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that apt-utils
is not actually needed by inspecting for example Rocker's various dockerfile compositions (here).
|
||
## Now install R and littler, and create a link for littler in /usr/local/bin | ||
## Also set a default CRAN repo, and make sure littler knows about it too | ||
RUN apt-get update \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step needs to be incorporated into the main install step before the cleanup, I'd move it around line 42, and then the R
install commands can move next to the pip install
commands for better grouping, all in the same big RUN
layer. We do this so that the cleanup can happen in the same docker layer to keep the image size as small as possible.
&& ln -s /usr/share/doc/littler/examples/testInstalled.r /usr/local/bin/testInstalled.r \ | ||
&& install.r docopt \ | ||
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds \ | ||
&& rm -rf /var/lib/apt/lists/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation.
R -e 'devtools::install_github("IRkernel/IRkernel")' && \ | ||
# or 'devtools::install_local("IRkernel-master.tar.gz")' && \ | ||
R -e 'IRkernel::installspec()' && \ | ||
# to register the kernel in the current R installation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you already have littler installed, I would suggest using installGithub.r
instead of devtools::install_github
and install2.r
instead of install.packages()
as they fail more gracefully. At the very least as devtools
is a heavy dependency use remotes::install_github()
instead which is from the same source but without all the extra non-installing functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Demo in a Dockerfile on how to use install2.r
here: https://github.com/cloudyr/googleComputeEngineR/blob/master/inst/dockerfiles/persistentRStudio/Dockerfile
R -e 'install.packages("feather")' && \ | ||
R -e 'install.packages("tensorflow")' && \ | ||
R -e 'devtools::install_github("apache/[email protected]", subdir="R/pkg")' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New line at the end of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Author of googleAuthR/googleCloudStorageR/bigQueryR
here - cool these can be used :) I would also however suggest that bigrquery
is installed as well as that is the more popular R BigQuery client, having deeper integration with popular tools like the tidyverse
. bigQueryR
does carry features such as uploading and authentication that will be useful though, I imagine it would be best for R users to have the option of both. e.g add install.packages("bigrquery")
, or if following above suggestion on using install2.r
:
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR \
googleCloudStorageR \
bigQueryR \
bigrquery \
feather \
tensorflow
RUN R -e 'gar_sce_auth()'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually start from Rocker/tidyverse Dockefile to build this one since most R users nowadays work vastly with these packages.
ln -s /usr/local/lib/python2.7/dist-packages/notebook/static/custom/custom.css /datalab/nbconvert/custom.css && \ | ||
cd / | ||
|
||
## Use Debian unstable via pinning -- new style via APT::Default-Release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the impact of this? Does this affect any subsequent apt-get installs in the container, or that the user might do? The "unstable" bit makes me wonder... is there a way to avoid this?
Not sure what are the actual conflict. |
Just dropping in now - there hasn't been any activity on this PR since January. Would be nice to see R supported on GCP. What is holding this up? |
Any news on this? Having R would be of tremendous help to analyze genomics data! |
Hi, I have added R and some packages necessary to use the Tensorflow & Spark API with R on DataLab. (Sorry for the strange comparison, I didn't find the cause for Git to show unchanged lines as changed)