research documents for rMarkdown #111
@jeancochrane @hancush, I think this is ready for review and discussion.
These docs make a strong case for RMarkdown over our existing toolkit for data analysis. Thanks for sharing!
Reading through the docs, it strikes me that the relevant change recommended here isn't exclusively swapping RMarkdown in for Pweave but rather changing up our whole workflow and tooling for reproducible data analysis. I think that effort makes sense, but it has two implications for me:
1. We'll need to plan to do a major edit of https://github.com/datamade/data-analysis-guidelines and bring it into this repo, ideally with some templates (I hear Courts has some cookiecutter templates already?)
2. It would be more appropriate for these docs to live in whatever subdirectory the data analysis docs do, e.g. something like data-analysis/ instead of rmarkdown/
We can do 2 immediately in this PR but I think 1 will be a big task that could take several cycles to accomplish. I expect it'll be hard for you to execute alone given your capacity constraints. As part of pulling this in, we should make a plan for how that work is going to be tracked and delegated, since updated documentation is going to be key to the success of the adoption of this workflow.
rMarkdown has better editor support than Pweave. For the following editors, rMarkdown is as good and usually better
than support for Pweave, if there any Pweave support exists.

* [sublime](https://packagecontrol.io/packages/knitr)
* [emacs](https://ess.r-project.org/)
* [atom](http://www.goring.org/resources/atom_and_r.html)
* [vscode](https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r)

rMarkdown also has its own IDE, [RStudio](https://rstudio.com/)
It's great to hear that RMarkdown has such wide support. Our existing data analysis guidelines make a strong recommendation on which editor to use, though, and I've heard @hancush express the belief that RStudio is really good and we should recommend it. What do you think?
added this recommendation to the recommendation doc.
## Proof of concept and pilot

RMarkdown has been the tool of choice for authoring reports in the Courts project. DataMade staff familiar with Pweave have picked it up quickly and journalists without a deep background in programming have also been able to use it successfully (within the RStudio environment).
It'd be great if we could link out to the relevant project here.
unfortunately, these are not going to be accessible to all staff, let alone the public.
Pilots are useful in evaluating the tool, as well as for providing an example for future use. If we can't link to the project, could we host a clone of the cookiecutter as a basis for future analysis? It'd be ideal to add that in this repository, under docker/templates/r-markdown or something like that.

RMarkdown's interleaving of text and code adds another layer to interact with code. As such, we advise that staff not be introduced to RMarkdown until they are familiar with the programming language they will be using in the report. If the report will depend on SQL code, the developer should be familiar with how to write and debug SQL code in the terminal or by writing SQL scripts.

If something is not working within an RMarkdown file, it's very useful to be able to work on the code in a familiar environment in order to narrow the possible considerations while debugging.
Out of curiosity, can you drop a debugger in a Python block in an RMarkdown file?
no, not really.
Co-authored-by: Jean Cochrane <[email protected]>
I think this makes sense. I propose that we bring in this PR (once I resolve some of the inline comments) and then I can open an issue on https://github.com/datamade/data-analysis-guidelines to track the changes that need to be made there. How does that sound, @hancush?
Thanks for this, @fgregg. A couple of broad comments:
- Is it rMarkdown or RMarkdown? In either case, could you standardize throughout?
- We aren't consistent about organizing docs, e.g., some top-level directories are about tools, while others are about topic areas. As we expand this repository, I'm starting to prefer the topic area approach. Would you mind rehoming this, as Jean suggested, to a data-analysis directory?
Re: our existing docs (and related to a data-analysis directory), since the majority of our data analysis docs pertain to Pweave and because that repo hasn't really grown legs in the same way data making has, I think I'd prefer to archive that repo with a pointer to how-to and add documentation on our revised practices here. What do you think?
@@ -0,0 +1,54 @@
# Comparing rMarkdown with existing tools

How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives.
- How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives.
+ How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives?

The main advantage of Pweave is that it is Python.

While rMarkdown does allow for Python code chunks, there is typically some setup code and that does need to be done in R. With Pweave, it's all Python.
Could you qualify the R setup a bit more, e.g., include a code block with an example setup? IIRC, it's pretty minimal, and an example could help to illuminate that.
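As a rough illustration of the kind of R setup in question, here is a minimal sketch of what the body of a setup chunk might contain. It assumes the reticulate package is installed (knitr uses it to run Python chunks); the interpreter path is a hypothetical placeholder, not taken from any DataMade project.

```r
# Body of a hypothetical `setup` chunk in an RMarkdown report.
library(reticulate)

# Hypothetical path: point reticulate at the Python interpreter the report should use.
use_python("/usr/local/bin/python3", required = TRUE)

# Default options applied to every chunk; individual chunks can override them.
knitr::opts_chunk$set(echo = FALSE)
```

After a chunk like this, the remaining chunks in the report can be written in Python.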

While rMarkdown does allow for Python code chunks, there is typically some setup code and that does need to be done in R. With Pweave, it's all Python.

That is really the only advantage.
- That is really the only advantage.
+ That is really the only advantage of Pweave.
Like rMarkdown, Pweave requires an additional runtime beyond standard Python. rMarkdown requires R and Pweave requires
[IPython](https://ipython.org/).

Pweave is not actively maintained, and has not been updated
Could you add a link to the repo here?
in three years.

rMarkdown has better editor support than Pweave. For the following editors, rMarkdown is as good and usually better
than support for Pweave, if there any Pweave support exists.
- than support for Pweave, if there any Pweave support exists.
+ than support for Pweave, if any Pweave support exists.

rMarkdown also has its own IDE, [RStudio](https://rstudio.com/)

Beyond active devlopment and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL.
- Beyond active devlopment and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL.
+ Beyond active development and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are chunk specific caching and support for multiple languages, particularly SQL.
1. The report is for a client
2. When the report contains graphs or statistics.
3. When we use code to generate the graphs or statistics. If we are doing an quick analysis in Excel, because that is what a client needs, then a literate research report would not be useful approach.
- 1. The report is for a client
- 2. When the report contains graphs or statistics.
- 3. When we use code to generate the graphs or statistics. If we are doing an quick analysis in Excel, because that is what a client needs, then a literate research report would not be useful approach.
+ 1. The report is for a client.
+ 2. The report contains graphs or statistics.
+ 3. We use code to generate the graphs or statistics. If we are doing a quick analysis in Excel, because that is what a client needs, then a literate research report would not be a useful approach.
Follow up Q: Does RMarkdown obviate the need for us to learn LaTeX?
Update: We chatted out loud at R&D. We are going to archive the data analysis guidelines and maintain our revised docs, including these artifacts, in
I think this responds to your requested changes?
It's a new day at DataMade! Thank you, @fgregg.
Overview
This PR contains research documents for rMarkdown.
Handles #21
Testing Instructions