forked from hadley/r-pkgs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsoftware-development-practices.Rmd
209 lines (151 loc) · 12.5 KB
/
software-development-practices.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
# Software development practices {#sec-sw-dev-practices}
```{r, echo = FALSE}
source("common.R")
```
In this last part of the book, we zoom back out to consider development practices that can make you more productive and raise the quality of your work.
Here we'll discuss the use of version control and continuous integration.
In @sec-lifecycle we discuss how the nature of package maintenance varies over the lifecycle of a package.
You will notice that we recommend using certain tools:
- An integrated development environment (IDE).
In @sec-workflow101-rstudio-projects we encouraged the use of the RStudio IDE for package development work.
That's what we document, since it's what we use and devtools is developed to work especially well with RStudio.
But even if it's not RStudio, we strongly recommend working with an IDE that has specific support for R and R package development.
- Version control.
We strongly recommend the use of formal version control and, at this point in time, Git is the obvious choice.
We say that based on Git's general prevalence and, specifically, its popularity within the R package ecosystem.
In @sec-sw-dev-practices-git-github, we explain why we think version control is so important.
- Hosted version control.
We strongly recommend syncing your local Git repositories to a hosted service and, at this point in time, GitHub is the or, at least, "an" obvious choice.
This is also covered in @sec-sw-dev-practices-git-github.
- Continuous integration and deployment, a.k.a. CI/CD (or even just CI).
This terminology comes from the general software engineering world and can sound somewhat grandiose or intimidating when applied to your personal R package.
All this really means is that you set up specific package development tasks to happen automatically when you push new work to your hosted repository.
Typically you'll want to run `R CMD check` and to re-build and deploy your package website.
In @sec-sw-dev-practices-ci, we show how to do this with GitHub Actions.
You might think that these pro-style tools are overkill for someone who doesn't do software development for a living.
While we don't recommend forcing yourself to do everything above on day one of your first "hello world" project, we actually do believe these tools are broadly applicable for R package development.
The main reason is that these tools make it so much easier to do the right thing, e.g. to experiment, document, test, check, and collaborate.
By adopting a shared toolkit, part-time and newer package developers gain access to the same workflows used by experts.
This requires a certain amount of faith and upfront investment, but we believe this pays off.
## Git and GitHub {#sec-sw-dev-practices-git-github}
[Git](https://git-scm.com) is a version control system that was originally built to coordinate the work of a global group of developers working on the Linux kernel.
Git manages the evolution of a set of files --- called a repository --- in a highly structured way and we recommend that every R package should also be a Git repository (and also, probably, an RStudio Project; @sec-workflow101-rstudio-projects).
A solo developer, working on a single computer, will benefit from adopting version control.
But, for most of us, that benefit is not nearly large enough to make up for the pain of installing and using Git.
In our opinion, for most folks, the pros of Git only outweigh the cons once you take the additional step of hooking your local repository up to a remote host like [GitHub](https://github.com).
The joint use of Git and GitHub offers many benefits that more than justify the learning curve.
### Standard practice
This recommendation is well aligned with the current, general practices in software development.
Here are a few relevant facts from the [2022 Stack Overflow developer survey](https://survey.stackoverflow.co/2022/#overview), which is based on about 70K responses.
- 94% report using Git.
The second-most used version control system was SVN, used by 5% of respondents.
- For personal projects, 87% of respondents report using GitHub, followed by GitLab (21%) and Bitbucket (11%).
The ranking is the same albeit less skewed for professional work: GitHub still dominates with 56%, followed by GitLab (29%) and Bitbucket (18%).
We can even learn a bit about the habits of R package developers, based on the URLs found in the `DESCRIPTION` files of CRAN packages.
As of March 2023, there are about 19K packages on CRAN, of which about 55% have a non-empty `URL` field (over 10K).
Of those, 80% have a GitHub URL (over 8K), followed by GitLab (just over 1%) and Bitbucket (around 0.5%).
```{r}
#| eval: false
#| include: false
library(tidyverse)
db <- tools::CRAN_package_db() |>
as_tibble() |>
select(package = Package, URL)
db
db |>
count(has_URL = !is.na(URL)) |>
mutate(prop = n / sum(n))
db |>
filter(!is.na(URL)) |>
mutate(
github = str_detect(URL, "github"),
gitlab = str_detect(URL, "gitlab"),
bitbucket = str_detect(URL, "bitbucket")
) |>
count(
github, gitlab, bitbucket
) |>
mutate(prop = n / sum(n))
```
The prevalence of Git/GitHub, both within the R community and beyond, should help you feel confident that adoption will have tangible benefits.
Furthermore, the sheer popularity of these tools means there are lots of resources available for learning how to use Git and GitHub and for getting unstuck[^software-development-practices-1].
[^software-development-practices-1]: We feature GitHub here, for hosted version control, because it's what we use and what has the best support in devtools.
However, all the big-picture principles and even some details hold up for alternative platforms, such as Gitlab and Bitbucket.
Two specific resources that address the intersection of Git/GitHub and the R world are the website [Happy Git and GitHub for the useR](https://happygitwithr.com/index.html) and the article "Excuse me, do you have a moment to talk about version control?" [@bryan2018-tas].
We conclude this section with a few examples of why Git/GitHub can be valuable specifically for R package development:
- Communication with users: GitHub Issues are well-suited for taking bug reports and feature requests.
Unlike email sent to the maintainer, these conversations are accessible to others and searchable.
- Collaboration: GitHub pull requests are a very low-friction way for outside contributors to help fix bugs and add features.
- Distribution: Functions like `devtools::install_github("r-lib/devtools")` and `pak::pak("r-lib/devtools")` allow people to easily install the development version of your package, based on a source repository.
More generally, anyone can install your package from any valid Git ref, such as a branch, specific SHA, pull request, or tag.
- Website: GitHub Pages is one of the easiest ways to offer a website for your package (@sec-website-deployment).
- Continuous integration: This is actually the topic of the next section, so read on for more.
## Continuous integration {#sec-sw-dev-practices-ci}
As we said in the introduction, continuous integration and deployment is commonly abbreviated as CI/CD or just CI.
For R package development, what this means in practice is:
1. You host your source package on a platform like GitHub.
The key point is that the hosted repository provides the formal structure for integrating the work of multiple contributors.
Sometimes multiple developers have permission to push (this is how tidyverse and r-lib packages are managed).
In other cases, only the primary maintainer has push permission.
In either model, external contributors can propose changes via a pull request.
2. You configure one or more development tasks to execute automatically when certain events happen in the hosted repository, such as a push or a pull request.
For example, for an R package, it's extremely valuable to configure an automatic run of `R CMD check`.
This helps you discover breakage quickly, when it's easier to diagnose and fix, and is a tremendous help for evaluating whether to accept an external contribution.
Overall, the use of hosted version control and continuous integration can make development move more smoothly and quickly.
Even for a solo developer, having `R CMD check` run remotely, possibly on a couple of different operating systems, is a mighty weapon against the dreaded "works on my machine" problem.
Especially for packages destined for CRAN, the use of CI decreases the chance of nasty surprises right before release.
### GitHub Actions {#sec-sw-dev-practices-gha}
The easiest way to start using CI is to host your package on GitHub and use its companion service, GitHub Actions (GHA).
Then you can use various functions from usethis to configure so-called GHA workflows.
usethis copies workflow configuration files from [`r-lib/actions`](https://github.com/r-lib/actions/#readme), which is where the tidyverse team maintains GHA infrastructure useful to the R community.
### `R CMD check` via GHA
If you only use CI for one thing, it should be to run `R CMD check`.
If you call `usethis::use_github_action()` with no arguments, you can choose from a few of the most useful workflows.
Here's what that menu looks like at the time of writing:
```{r}
#| eval: false
> use_github_action()
Which action do you want to add? (0 to exit)
(See <https://github.com/r-lib/actions/tree/v2/examples> for other options)
1: check-standard: Run `R CMD check` on Linux, macOS, and Windows
2: test-coverage: Compute test coverage and report to https://about.codecov.io
3: pr-commands: Add /document and /style commands for pull requests
Selection:
```
`check-standard` is highly recommended, especially for any package that is (or aspires to be) on CRAN.
It runs `R CMD check` across a few combinations of operating system and R version.
This increases your chances of quickly detecting code that relies on the idiosyncrasies of a specific platform, while it's still easy to make the code more portable.
After making that selection, you will see some messages along these lines:
``` r
#> ✔ Creating '.github/'
#> ✔ Adding '*.html' to '.github/.gitignore'
#> ✔ Creating '.github/workflows/'
#> ✔ Saving 'r-lib/actions/examples/check-standard.yaml@v2' to .github/workflows/R-CMD-check.yaml'
#> • Learn more at <https://github.com/r-lib/actions/blob/v2/examples/README.md>.
#> ✔ Adding R-CMD-check badge to 'README.md'
```
The key things that happen here are:
- A new GHA workflow file is written to `.github/workflows/R-CMD-check.yaml`.
GHA workflows are specified via YAML files.
The message reveals the source of the YAML and gives a link to learn more.
- Some helpful additions may be made to various "ignore" files.
- A badge reporting the `R CMD check` result is added to your README, if it has been created with usethis and has an identifiable badge "parking area".
Otherwise, you'll be given some text you can copy and paste.
Commit these file changes and push to GitHub.
If you visit the "Actions" section of your repository, you should see that a GHA workflow run has been launched.
In due course, its success (or failure) will be reported there, in your README badge, and in your GitHub notifications (depending on your personal settings).
Congratulations!
Your package will now benefit from even more regular checks.
### Other uses for GHA
As suggested by the interactive menu, `usethis::use_github_action()` gives you access to pre-made workflows other than `R CMD check`.
In addition to the featured choices, you can use it to configure any of the example workflows in [`r-lib/actions`](https://github.com/r-lib/actions/tree/v2-branch/examples#readme) by passing the workflow's name.
For example:
- `use_github_action("test-coverage")` configures a workflow to track the test coverage of your package, as described in @sec-testing-design-coverage.
Since GHA allows you to run arbitrary code, there are many other things that you can use it for:
- Building your package's website and deploying the rendered site to GitHub Pages, as described in @sec-website-deployment.
See also `?usethis::use_pkgdown_github_pages()`.
- Re-publishing a book website every time you make a change to the source.
(Like we do for this book!).
If the example workflows don't cover your exact use case, you can also develop your own workflow.
Even in this case, the example workflows are often useful as inspiration.
The [`r-lib/actions`](https://github.com/r-lib/actions/#readme) repository also contains important lower-level building blocks, such as actions to install R or to install all of the dependencies indicated in a `DESCRIPTION` file.