research documents for rMarkdown #111

Merged
merged 10 commits into from
Mar 1, 2021
54 changes: 54 additions & 0 deletions rmarkdown/research/comparisons-with-existing-tools.md
@@ -0,0 +1,54 @@
# Comparing rMarkdown with existing tools

How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives.
Member

Suggested change
How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives.
How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives?


## Pweave

Like rMarkdown, [Pweave](http://mpastell.com/pweave/) is an implementation of [noweb](https://en.wikipedia.org/wiki/Noweb), but one that primarily targets Python instead of R.

The main advantage of Pweave is that it is Python.

While rMarkdown does allow for Python code chunks, there is typically some setup code, and that does need to be in R. With Pweave, it's all Python.

That is really the only advantage.
Member

Suggested change
That is really the only advantage.
That is really the only advantage of Pweave.


Like rMarkdown, Pweave requires an additional runtime beyond standard Python: rMarkdown requires R and Pweave requires
[IPython](https://ipython.org/).

Pweave is not actively maintained, and has not been updated
in three years.

Member

Could you add a link to the repo here?

rMarkdown has better editor support than Pweave. For the following editors, support for rMarkdown is as good as, and usually better
than support for Pweave, if there any Pweave support exists.
Member

Suggested change
than support for Pweave, if there any Pweave support exists.
than support for Pweave, if any Pweave support exists.


* [sublime](https://packagecontrol.io/packages/knitr)
* [emacs](https://ess.r-project.org/)
* [atom](http://www.goring.org/resources/atom_and_r.html)
* [vscode](https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r)

rMarkdown also has its own IDE, [RStudio](https://rstudio.com/).
Contributor

It's great to hear that RMarkdown has such wide support. Our existing data analysis guidelines make a strong recommendation on which editor to use, though, and I've heard @hancush express the belief that RStudio is really good and we should recommend it. What do you think?

Member Author

Added this recommendation to the recommendation doc.


Beyond active devlopment and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL.
Member

Suggested change
Beyond active devlopment and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL.
Beyond active development and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are chunk specific caching and support for multiple languages, particularly SQL.


Chunk-specific caching can dramatically reduce build times, which is critical to development speed.
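
As a rough illustration of why chunk-specific caching shortens builds, here is a minimal Python sketch. It is not how rMarkdown or knitr implements caching. `ChunkCache`, `build_report`, and `slow_execute` are hypothetical names, and keying the cache on a hash of the chunk source alone is an assumption made for the example.

```python
import hashlib


class ChunkCache:
    """Cache chunk results, keyed by a hash of each chunk's source text."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # counts how many chunks were actually run

    def run(self, source, execute):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._results:
            # Cache miss: the chunk is new or its source changed, so run it.
            self.executions += 1
            self._results[key] = execute(source)
        return self._results[key]


def build_report(chunks, cache, execute):
    """Build every chunk in order, reusing cached results where possible."""
    return [cache.run(source, execute) for source in chunks]


def slow_execute(source):
    """Stand-in for an expensive step such as a query or a model fit."""
    return f"result of {source!r}"
```

With three chunks, a first build executes all three. Editing only the last chunk and rebuilding executes just that one. A real tool also has to invalidate chunks whose upstream dependencies changed, which this sketch ignores.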

Our past experience suggests that SQL will be a common language in our literate reports, so first-class
support for it is valuable.

## Jupyter Notebook

Jupyter Notebooks overlap in functionality with rMarkdown. The main difference is that Notebooks are intended to be
an interactive exploration tool and rMarkdown is intended to be a documentation and document creation tool.

I have not used Notebooks extensively, but three attributes
make them less attractive.

1. While possible, it is more difficult to generate attractive documents from Notebooks.
2. The file format of Notebooks is JSON that embeds cell outputs and metadata, so it is not natively diffable in a readable way by GitHub or GitLab, which makes PRs difficult to review.
3. While it is possible to script them, Notebooks are primarily intended to be used interactively, which makes them a bit of a mismatch with our ETL philosophy.
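
To make the second point concrete, the sketch below serializes a hand-built, one-cell notebook twice and diffs the results. The dictionary only approximates the notebook JSON layout, and `notebook_json`, `changed_lines`, and the `total_cases` variable are hypothetical. The assumption being illustrated is that re-running a cell changes its stored execution count and output even when the source is untouched.

```python
import difflib
import json


def notebook_json(execution_count, printed):
    """Serialize a one-cell notebook, hand-built to resemble the .ipynb layout."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [printed]}
                ],
                "source": ["print(total_cases)"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    return json.dumps(notebook, indent=1, sort_keys=True).splitlines()


def changed_lines(before, after):
    """Return only the added and removed lines of a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Same source code, but the cell was re-run against updated data.
first_run = notebook_json(1, "1042\n")
second_run = notebook_json(2, "1050\n")
noise = changed_lines(first_run, second_run)
```

Every line in `noise` comes from the execution count or the stored output, none from the code itself, which is the kind of churn a reviewer has to read past in a PR.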

## Manual integration

We can, and do, generate statistics and graphs in one tool and then copy the data or graphics into Google Docs or a markdown file. Sometimes this is the appropriate approach, in
the recommendation document.
32 changes: 32 additions & 0 deletions rmarkdown/research/recommendation-of-adoption.md
@@ -0,0 +1,32 @@
# Recommendation of Adoption

We recommend RMarkdown for authoring literate research reports when the following conditions pertain:

1. The report is for a client
2. When the report contains graphs or statistics.
3. When we use code to generate the graphs or statistics. If we are doing an quick analysis in Excel, because that is what a client needs, then a literate research report would not be useful approach.
Member

Suggested change
1. The report is for a client
2. When the report contains graphs or statistics.
3. When we use code to generate the graphs or statistics. If we are doing an quick analysis in Excel, because that is what a client needs, then a literate research report would not be useful approach.
1. The report is for a client.
2. The report contains graphs or statistics.
3. We use code to generate the graphs or statistics. If we are doing a quick analysis in Excel, because that is what a client needs, then a literate research report would not be a useful approach.


RMarkdown should be used even if the report seems like it will be quick and lightweight. Experience tells us that it is not easy to predict when an analysis will grow in complexity or when a client may return months later to ask about a detail in a quick analysis.

## Proof of concept and pilot

RMarkdown has been the tool of choice for authoring reports in the Courts project. DataMade staff familiar with Pweave have picked it up quickly and journalists without a deep background in programming have also been able to use it successfully (within the RStudio environment).
Contributor

It'd be great if we could link out to the relevant project here.

Member Author

these are not going to be accessible to all staff, let alone public folks. unfortunately.

Member

Pilots are useful in evaluating the tool, as well as for providing an example for future use. If we can't link to the project, could we host a clone of the cookiecutter as a basis for future analysis? It'd be ideal to add that in this repository, under docker/templates/r-markdown or something like that.


## Prerequisite Skills

RMarkdown's interleaving of text and code adds another layer between the author and the code. As such, we advise that staff not be introduced to RMarkdown until they are familiar with the programming language they will be using in the report. If the report will depend on SQL code, the developer should be familiar with how to write and debug SQL code in the terminal or by writing SQL scripts.

If something is not working within an RMarkdown file, it's very useful to be able to work on the code in a familiar environment in order to narrow the possible causes while debugging.
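
One way to get chunk code into a familiar environment is to pull it out of the RMarkdown source and run it as an ordinary script. The Python helper below is a hypothetical sketch, not part of any RMarkdown tooling (knitr itself provides `purl()` in R for extracting code). `extract_chunks` and its regular expression assume chunks are fenced with three backticks and a `{engine ...}` header.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can sit inside a fence

# Matches a chunk header such as {python totals, echo=FALSE}, then the chunk
# body, up to the first line that contains only a closing fence.
CHUNK_PATTERN = re.compile(
    r"^" + FENCE + r"\{(?P<engine>[A-Za-z]+)[^}]*\}[ \t]*\n"
    r"(?P<body>.*?)"
    r"^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_chunks(rmd_text, engine):
    """Return the source of every chunk written for the given engine."""
    return [
        match.group("body")
        for match in CHUNK_PATTERN.finditer(rmd_text)
        if match.group("engine") == engine
    ]


# A small, made-up report with one R chunk and one Python chunk.
sample_report = "\n".join(
    [
        "# Report",
        FENCE + "{r setup, include=FALSE}",
        "library(DBI)",
        FENCE,
        "Some prose.",
        FENCE + "{python totals}",
        "total = sum([1, 2, 3])",
        "print(total)",
        FENCE,
    ]
)

python_chunks = extract_chunks(sample_report, "python")
```

Writing `"".join(python_chunks)` to a `.py` file gives a script that can be run and debugged with the usual Python tools, outside the report.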
Contributor

Out of curiosity, can you drop a debugger in a Python block in an RMarkdown file?

Member Author

no, not really.


Experience with the R programming language is not a prerequisite, unless that's the language that most of the analysis will be done in.

## Maintenance outlook

It is already DataMade's experience that literate research reports are more maintainable than alternative report authoring workflows.

As for RMarkdown in particular, the long-term outlook for this tool is excellent.

1. RMarkdown is maintained by RStudio, the major commercial player in R.
2. The R community has settled on RMarkdown (and RStudio) as not just a report authoring tool, but as their notebooking tool. Any possible successor to RMarkdown will face significant pressure to be backwards compatible.
3. RMarkdown, as a file format, is very lightweight and convertible.