research documents for rMarkdown #111
# Comparing rMarkdown with existing tools

How does rMarkdown compare with existing tools in DataMade's stack, or with possible alternatives?
## Pweave

Like rMarkdown, [Pweave](http://mpastell.com/pweave/) is an implementation of [noweb](https://en.wikipedia.org/wiki/Noweb), but one that primarily targets Python instead of R.

The main advantage of Pweave is that it is Python.

While rMarkdown does allow Python code chunks, there is typically some setup code that needs to be written in R. With Pweave, it's all Python.

That is really the only advantage.

Like rMarkdown, Pweave requires an additional runtime beyond standard Python: rMarkdown requires R, and Pweave requires [IPython](https://ipython.org/).

Pweave is not actively maintained and has not been updated in three years.

Review comment: Could you add a link to the repo here?

rMarkdown has better editor support than Pweave. In each of the following editors, rMarkdown support is as good as, and usually better than, Pweave support, if any Pweave support exists at all:

* [Sublime Text](https://packagecontrol.io/packages/knitr)
* [Emacs](https://ess.r-project.org/)
* [Atom](http://www.goring.org/resources/atom_and_r.html)
* [VS Code](https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r)

rMarkdown also has its own IDE, [RStudio](https://rstudio.com/).

Review comment: It's great to hear that RMarkdown has such wide support. Our existing data analysis guidelines make a strong recommendation on which editor to use, though, and I've heard @hancush express the belief that RStudio is really good and we should recommend it. What do you think?

Reply: Added this recommendation to the recommendation doc.

Beyond active development and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are (1) chunk-specific caching and (2) support for multiple languages, particularly SQL.

Chunk-specific caching can dramatically reduce build times, because chunks whose code has not changed are loaded from the cache instead of being re-run. This is critical to development speed.

Our past experience suggests that SQL will be a common language in our literate reports, so first-class support for it is very nice.
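To make the chunk-specific caching point concrete, here is a minimal Python sketch of the idea, assuming each chunk's results are keyed by a hash of that chunk's source code and that the variables a chunk creates are picklable. knitr's real cache (switched on per chunk with the `cache=TRUE` chunk option) is more involved and also accounts for chunk options; `run_chunk` and `CACHE_DIR` are hypothetical names used only for illustration.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location for this sketch; a real tool would keep it beside the report.
CACHE_DIR = Path(tempfile.mkdtemp())


def run_chunk(source: str, namespace: dict) -> dict:
    """Run one chunk, reusing its cached variables if the chunk's code is unchanged."""
    key = hashlib.md5(source.encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / f"{key}.pkl"

    if cache_file.exists():
        # Unchanged chunk: restore the variables it produced instead of re-running it.
        new_vars = pickle.loads(cache_file.read_bytes())
    else:
        # New or edited chunk: execute it, then record variables it created or rebound.
        before = dict(namespace)
        exec(source, namespace)
        new_vars = {
            name: value
            for name, value in namespace.items()
            if not name.startswith("__")
            and (name not in before or before[name] is not value)
        }
        cache_file.write_bytes(pickle.dumps(new_vars))

    namespace.update(new_vars)
    return new_vars


if __name__ == "__main__":
    ns = {}
    slow_chunk = "total = sum(range(10_000_000))"
    run_chunk(slow_chunk, ns)  # executes and caches
    run_chunk(slow_chunk, ns)  # cache hit: nothing is recomputed
    print(ns["total"])
```

Because the key covers only the chunk's own code, this sketch would serve stale results if an upstream chunk changed; knitr handles that case with chunk options such as `dependson`.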
## Jupyter Notebook

Jupyter Notebooks overlap in functionality with rMarkdown. The main difference is that Notebooks are intended to be an interactive exploration tool, while rMarkdown is intended to be a documentation and document creation tool.

I have not used Notebooks extensively, but three attributes make them less attractive:

1. While possible, it is more difficult to generate attractive documents from Notebooks.
2. The Notebook file format is JSON that stores cell outputs and metadata alongside the code, rather than plain text like rMarkdown, so it is not natively diffable on GitHub or GitLab, which makes PRs difficult to review.
3. While Notebooks can be run as scripts, they are primarily intended for interactive use, which makes them a bit of a mismatch with our ETL philosophy.
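To illustrate point 2, here is a small standard-library Python sketch. It assumes a minimal one-cell notebook in the nbformat 4 JSON layout (`cells`, `source`, `outputs`, `execution_count`); the cell code and output values are made up. It shows that re-running a cell without touching its code still changes the file, so the diff a reviewer sees contains only execution noise.

```python
import difflib
import json


def notebook_json(execution_count: int, output_text: str) -> list[str]:
    """Serialize a one-cell notebook (nbformat 4 layout) into the lines a diff tool sees."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output_text]}
                ],
                "source": ["print(len(rows))"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    return json.dumps(notebook, indent=1).splitlines()


# Same code, re-run later against fresher data: only execution state and output differ.
before = notebook_json(execution_count=1, output_text="1041\n")
after = notebook_json(execution_count=7, output_text="1043\n")

# Keep only the changed lines, dropping the "---"/"+++" file headers.
diff = [
    line
    for line in difflib.unified_diff(before, after, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```

Every changed line comes from execution state or output rather than from the code a reviewer cares about.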
## Manual integration

We can, and do, generate statistics and graphs in one tool and then copy the data or graphics into Google Docs or a markdown file. Sometimes this is the appropriate approach, as described in the recommendation document.
# Recommendation of Adoption

We recommend RMarkdown for authoring literate research reports when the following conditions apply:

1. The report is for a client.
2. The report contains graphs or statistics.
3. We use code to generate the graphs or statistics. If we are doing a quick analysis in Excel because that is what a client needs, then a literate research report would not be a useful approach.

RMarkdown should be used even if the report seems like it will be quick and lightweight. Experience tells us that it is not easy to predict when an analysis will grow in complexity, or when a client may return months later to ask about a detail in a quick analysis.
## Proof of concept and pilot

RMarkdown has been the tool of choice for authoring reports in the Courts project. DataMade staff familiar with Pweave have picked it up quickly, and journalists without a deep background in programming have also been able to use it successfully (within the RStudio environment).

Review comment: It'd be great if we could link out to the relevant project here.

Reply: These are not going to be accessible to all staff, let alone the public, unfortunately.

Reply: Pilots are useful in evaluating the tool, as well as for providing an example for future use. If we can't link to the project, could we host a clone of the cookiecutter as a basis for future analysis? It'd be ideal to add that in this repository, under
## Prerequisite Skills

RMarkdown's interleaving of text and code adds another layer between the developer and the code. As such, we advise that staff not be introduced to RMarkdown until they are familiar with the programming language they will use in the report. If the report will depend on SQL code, the developer should know how to write and debug SQL in the terminal or in standalone SQL scripts.

If something is not working within an RMarkdown file, it's very useful to be able to work on the code in a familiar environment, in order to narrow down the possible causes while debugging.

Review comment: Out of curiosity, can you drop a debugger in a Python block in an RMarkdown file?

Reply: No, not really.

Experience with the R programming language is not a prerequisite, unless that's the language that most of the analysis will be done in.
## Maintenance outlook

It is already DataMade's experience that literate research reports are more maintainable than alternative report authoring workflows.

As for RMarkdown in particular, the long-term outlook for this tool is excellent:

1. RMarkdown is maintained by RStudio, the major commercial player in R.
2. The R community has settled on RMarkdown (and RStudio) not just as a report authoring tool, but also as its notebooking tool. Any possible successor to RMarkdown will face significant pressure to be backwards compatible.
3. RMarkdown, as a file format, is very lightweight and convertible.