From b9a00b49f8c43dba66074b0f171faf67f8af2c9c Mon Sep 17 00:00:00 2001 From: Forest Gregg Date: Mon, 7 Sep 2020 21:15:03 -0400 Subject: [PATCH 1/8] comparisons for rmarkdown --- .../comparisons-with-existing-tools.md | 44 +++++++++++++++++++ 1 file changed, 44 insertions(+) create mode 100644 rmarkdown/research/comparisons-with-existing-tools.md diff --git a/rmarkdown/research/comparisons-with-existing-tools.md b/rmarkdown/research/comparisons-with-existing-tools.md new file mode 100644 index 0000000..6191c62 --- /dev/null +++ b/rmarkdown/research/comparisons-with-existing-tools.md @@ -0,0 +1,44 @@ +# Comparing rMarkdown with existing tools + +How does rMarkdown compare with existing tools in DataMade's stack or possible alternatives. + +## Pweave + +Like rMarkdown, [Pweave](http://mpastell.com/pweave/) is an implementation of [noweb](https://en.wikipedia.org/wiki/Noweb), but one that primarily targets Python instead of R. + +The main advantage of Pweave is that it is Python. + +While rMarkdown does allow for Python code chunks, there is typically some setup code and that does need to in R. With Pweave, it's all Python. + +That is really the only advantage. + +Like rMarkdown requires an additional runtime beyond standard Python. rMarkdown requires R and Pweave requires +[IPython](https://ipython.org/). + +Pweave is not actively maintained, and has not been updated +in three years. + +Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL. + +Chunk specific caching can dramatically reduce build times which is critical in speed of development. + +Our past experience suggests that SQL will be a common language we will use in literate reports, and first class +support is very nice. + +## Jupyter Notebook + +Jupyter Notebooks overlap in functionality with rMarkdown. 
The main difference is that Notebooks are intended to be +an interactive exploration tool and rMarkdown is intended to be a documentation and document creation tool. + +I have not used Notebooks extensively, but three attributes +make them less attractive. + +1. While possible, it is more difficult to generate attractive documents from Notebooks. +2. The file format of Notebooks is not plain text and not natively diffable by github or gitlab, thus making PRs difficult +3. While possible, Notebooks are not primarily intended to +be scripted instead of interactive, thus making bit of mismatch with our ETL philosophy + +## Manual integration + +We can do and do generate statistics and graphs in one tool and then copy the data or graphics into Google Docs or a markdown file. Sometimes this is the appropriate approach, in +the recommendation document. \ No newline at end of file From 1db95e24a58f231f4cc1b220d22d5bd91384c6d6 Mon Sep 17 00:00:00 2001 From: Forest Gregg Date: Tue, 8 Sep 2020 09:28:51 -0400 Subject: [PATCH 2/8] discussion of editor support --- rmarkdown/research/comparisons-with-existing-tools.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/rmarkdown/research/comparisons-with-existing-tools.md b/rmarkdown/research/comparisons-with-existing-tools.md index 6191c62..a6c48be 100644 --- a/rmarkdown/research/comparisons-with-existing-tools.md +++ b/rmarkdown/research/comparisons-with-existing-tools.md @@ -18,6 +18,14 @@ Like rMarkdown requires an additional runtime beyond standard Python. rMarkdown Pweave is not actively maintained, and has not been updated in three years. +rMarkdown has better editor support than Pweave. For the following editors, rMarkdown is as good and usually better +than support for Pweave, if there any Pweave support exists. 
+ +* [sublime](https://packagecontrol.io/packages/knitr) +* [emacs](https://ess.r-project.org/) +* [atom](http://www.goring.org/resources/atom_and_r.html) +* [vscode](https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r) + Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL. Chunk specific caching can dramatically reduce build times which is critical in speed of development. From 80fde68550ff2389da9a2737d5f4824e93f36845 Mon Sep 17 00:00:00 2001 From: Forest Gregg Date: Fri, 25 Sep 2020 20:37:44 -0400 Subject: [PATCH 3/8] recommendation of adoption --- .../comparisons-with-existing-tools.md | 4 ++- .../research/recommendation-of-adoption.md | 32 +++++++++++++++++++ 2 files changed, 35 insertions(+), 1 deletion(-) create mode 100644 rmarkdown/research/recommendation-of-adoption.md diff --git a/rmarkdown/research/comparisons-with-existing-tools.md b/rmarkdown/research/comparisons-with-existing-tools.md index a6c48be..34efe5d 100644 --- a/rmarkdown/research/comparisons-with-existing-tools.md +++ b/rmarkdown/research/comparisons-with-existing-tools.md @@ -26,7 +26,9 @@ than support for Pweave, if there any Pweave support exists. * [atom](http://www.goring.org/resources/atom_and_r.html) * [vscode](https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r) -Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk specific caching and support for 2. multiple languages, particularly SQL. +rMarkdown also has its own IDE, [RStudio](https://rstudio.com/). + +Beyond active development and editor support, Pweave is missing many features compared to rMarkdown. Of greatest consequence are 1. chunk-specific caching and 2. support for multiple languages, particularly SQL. Chunk specific caching can dramatically reduce build times which is critical in speed of development. 
diff --git a/rmarkdown/research/recommendation-of-adoption.md b/rmarkdown/research/recommendation-of-adoption.md new file mode 100644 index 0000000..5e2aa9d --- /dev/null +++ b/rmarkdown/research/recommendation-of-adoption.md @@ -0,0 +1,32 @@ +# Recommendation of Adoption + +We recommend RMarkdown for authoring literate research reports when all of the following conditions hold: + +1. The report is for a client. +2. The report contains graphs or statistics. +3. We use code to generate the graphs or statistics. If we are doing a quick analysis in Excel, because that is what a client needs, then a literate research report would not be a useful approach. + +RMarkdown should be used even if the report seems like it will be quick and lightweight. Experience tells us that it is not easy to predict when an analysis will grow in complexity or when a client may return months later to ask about a detail in a quick analysis. + +## Proof of concept and pilot + +RMarkdown has been the tool of choice for authoring reports in the Courts project. DataMade staff familiar with Pweave have picked it up quickly and journalists without a deep background in programming have also been able to use it successfully (within the RStudio environment). + +## Prerequisite Skills + +RMarkdown's interleaving of text and code adds another layer to interacting with code. As such, we advise that staff not be introduced to RMarkdown until they are familiar with the programming language they will be using in the report. If the report will depend on SQL code, the developer should be familiar with how to write and debug SQL code in the terminal or by writing SQL scripts. + +If something is not working within an RMarkdown file, it's very useful to be able to work on the code in a familiar environment in order to narrow down the possible causes while debugging. + +Experience with the R programming language is not a prerequisite, unless that's the language that most of the analysis will be done in. 
+ +## Maintenance outlook + +It is already DataMade's experience that literate research reports are more maintainable than alternative report authoring workflows. + +As far as RMarkdown in particular, the longterm outlook for this tool is excellent. + +1. RMarkdown is maintained by RStudio, the major commercial player in R. +2. The R community has settled on RMarkdown (and RStudio) as not just an report authoring tool, but as their notebooking tool. Any possible successor to RMarkdown will have significant pressure to be backwards compatible. +3. RMarkdown, as a file format, is very lightweight and convertible. + From 84e7a062486c85567c713cf35c6e4c49f4460ae4 Mon Sep 17 00:00:00 2001 From: Forest Gregg Date: Wed, 2 Dec 2020 10:36:39 -0500 Subject: [PATCH 4/8] Update rmarkdown/research/comparisons-with-existing-tools.md Co-authored-by: Jean Cochrane --- rmarkdown/research/comparisons-with-existing-tools.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rmarkdown/research/comparisons-with-existing-tools.md b/rmarkdown/research/comparisons-with-existing-tools.md index 34efe5d..c995be8 100644 --- a/rmarkdown/research/comparisons-with-existing-tools.md +++ b/rmarkdown/research/comparisons-with-existing-tools.md @@ -8,7 +8,7 @@ Like rMarkdown, [Pweave](http://mpastell.com/pweave/) is an implementation of [n The main advantage of Pweave is that it is Python. -While rMarkdown does allow for Python code chunks, there is typically some setup code and that does need to in R. With Pweave, it's all Python. +While rMarkdown does allow for Python code chunks, there is typically some setup code and that does need to be done in R. With Pweave, it's all Python. That is really the only advantage. @@ -51,4 +51,4 @@ be scripted instead of interactive, thus making bit of mismatch with our ETL phi ## Manual integration We can do and do generate statistics and graphs in one tool and then copy the data or graphics into Google Docs or a markdown file. 
Sometimes this is the appropriate approach, in -the recommendation document. \ No newline at end of file +the recommendation document. From ebd9935cc0820fd31cc2eb75428fceb0b71f3033 Mon Sep 17 00:00:00 2001 From: Forest Gregg Date: Wed, 2 Dec 2020 10:36:57 -0500 Subject: [PATCH 5/8] Update rmarkdown/research/comparisons-with-existing-tools.md Co-authored-by: Jean Cochrane --- rmarkdown/research/comparisons-with-existing-tools.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rmarkdown/research/comparisons-with-existing-tools.md b/rmarkdown/research/comparisons-with-existing-tools.md index c995be8..22935b5 100644 --- a/rmarkdown/research/comparisons-with-existing-tools.md +++ b/rmarkdown/research/comparisons-with-existing-tools.md @@ -12,7 +12,7 @@ While rMarkdown does allow for Python code chunks, there is typically some setup That is really the only advantage. -Like rMarkdown requires an additional runtime beyond standard Python. rMarkdown requires R and Pweave requires +Like rMarkdown, Pweave requires an additional runtime beyond standard Python. rMarkdown requires R and Pweave requires [IPython](https://ipython.org/). 
Pweave is not actively maintained, and has not been updated From 4dd65367052a32d392ce43ea49c985cf9730f8c4 Mon Sep 17 00:00:00 2001 From: Forest Gregg Date: Wed, 2 Dec 2020 10:37:06 -0500 Subject: [PATCH 6/8] Update rmarkdown/research/comparisons-with-existing-tools.md Co-authored-by: Jean Cochrane --- rmarkdown/research/comparisons-with-existing-tools.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rmarkdown/research/comparisons-with-existing-tools.md b/rmarkdown/research/comparisons-with-existing-tools.md index 22935b5..111a841 100644 --- a/rmarkdown/research/comparisons-with-existing-tools.md +++ b/rmarkdown/research/comparisons-with-existing-tools.md @@ -50,5 +50,5 @@ be scripted instead of interactive, thus making bit of mismatch with our ETL phi ## Manual integration -We can do and do generate statistics and graphs in one tool and then copy the data or graphics into Google Docs or a markdown file. Sometimes this is the appropriate approach, in +We can do and do generate statistics and graphs in one tool and then copy the data or graphics into Google Docs or a markdown file. Sometimes this is the appropriate approach, as described in the recommendation document. From 1ce072745d5cde091e19e0216107bdc6b948d8f7 Mon Sep 17 00:00:00 2001 From: Forest Gregg Date: Wed, 2 Dec 2020 10:51:38 -0500 Subject: [PATCH 7/8] add section recommending people start with rstudio --- rmarkdown/research/recommendation-of-adoption.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rmarkdown/research/recommendation-of-adoption.md b/rmarkdown/research/recommendation-of-adoption.md index 5e2aa9d..6ef5cb5 100644 --- a/rmarkdown/research/recommendation-of-adoption.md +++ b/rmarkdown/research/recommendation-of-adoption.md @@ -30,3 +30,6 @@ As far as RMarkdown in particular, the longterm outlook for this tool is excelle 2. The R community has settled on RMarkdown (and RStudio) as not just an report authoring tool, but as their notebooking tool. 
Any possible successor to RMarkdown will have significant pressure to be backwards compatible. 3. RMarkdown, as a file format, is very lightweight and convertible. +## Editors + +[RStudio](https://rstudio.com/) is an excellent IDE for RMarkdown. We recommend that people new to RMarkdown start with RStudio. \ No newline at end of file From 6c5ec3bfbb0be89f63e7a6d6b481184fe79b14b1 Mon Sep 17 00:00:00 2001 From: Forest Gregg Date: Mon, 1 Feb 2021 12:00:55 -0500 Subject: [PATCH 8/8] added readme doc, and otherwise follow hannah's reccs --- data-analysis/README.md | 30 +++++++++++++++++++ .../comparisons-with-existing-tools.md | 0 .../research/recommendation-of-adoption.md | 0 3 files changed, 30 insertions(+) create mode 100644 data-analysis/README.md rename {rmarkdown => data-analysis}/research/comparisons-with-existing-tools.md (100%) rename {rmarkdown => data-analysis}/research/recommendation-of-adoption.md (100%) diff --git a/data-analysis/README.md b/data-analysis/README.md new file mode 100644 index 0000000..bf54a6b --- /dev/null +++ b/data-analysis/README.md @@ -0,0 +1,30 @@ +# Literate Analysis and RMarkdown + +This directory records best practices for writing literate analysis reports and using +[RMarkdown](https://rmarkdown.rstudio.com/authoring_quick_tour.html) to do it. + +Literate analysis is a style of writing documents that includes the text and the code for analysis in one document. Its major benefits are keeping your numbers and figures +aligned with your text; consolidating your work sanely; and self-documenting +your analysis code. See [Hannah's write-up for more depth](https://source.opennews.org/articles/black-box-be-gone-tools-human-optimized-data-analy/). 
+ +## Contents + +- README +- [Research](./research/) + - [Comparisons with existing tools](./research/comparisons-with-existing-tools.md) + - [Recommendation of adoption](./research/recommendation-of-adoption.md) + +## When to use Literate Analysis + +When you have to write code to generate figures, charts, or graphics to include in +a research report, you should write a literate analysis document. + +## How to use RMarkdown for Literate Analysis + +Look to the [Courts Transparency cookiecutter](https://github.com/datamade/cookiecutter-court-transparency) for inspiration in getting started. + +If this is your first project, we strongly recommend using [RStudio](https://rstudio.com/), which has fabulous support for RMarkdown. + +## Resources for learning + +* https://rmarkdown.rstudio.com/lesson-1.html \ No newline at end of file diff --git a/rmarkdown/research/comparisons-with-existing-tools.md b/data-analysis/research/comparisons-with-existing-tools.md similarity index 100% rename from rmarkdown/research/comparisons-with-existing-tools.md rename to data-analysis/research/comparisons-with-existing-tools.md diff --git a/rmarkdown/research/recommendation-of-adoption.md b/data-analysis/research/recommendation-of-adoption.md similarity index 100% rename from rmarkdown/research/recommendation-of-adoption.md rename to data-analysis/research/recommendation-of-adoption.md