If you have agreed to provide us with an external reproducibility check, we kindly ask that you follow these instructions.
Do not post the code or data to any public place. Use the resources we provide you with, or those that your institution provides you with. While we may provide you with GitHub or Bitbucket repositories, those are to remain private, i.e., require a login for access.
Once you have completed the task, and sent back all changes you had to make, or sent us the report from a secure system, delete all data and code when told to do so (don't do this immediately, as we may have clarifying questions).
While you can use your laptop, you should follow certain procedures so that you affect neither the reproducibility check nor your usual working environment. See the notes below.
You may have been provided access to the code in one of two ways:
::::{tab-set}
:::{tab-item} openICPSR link
You may have been provided with a link to openICPSR. If so, you should download the code and data from there.
:::
:::{tab-item} Bitbucket link
We provide you with the code in a Git repository (the data will be separate). If you have questions on how to use Git, please let us know, and we will help and guide you.
You should NOT use the file upload or download capability of the web-based interface (i.e., Github.com, Bitbucket.com).
:::
::::
You should follow instructions as closely as possible. However,
- you are not required to do extremely tedious or time-intensive manual steps.
- you are not required to "bog down" your personal laptop (see resources)
- you should lightly improve on the procedures, by following our replication procedures (below)
- you are not required to install software or packages that might mess up your own research
In addition to any setup instructions from authors, a few things to take into account:
- Use the resources we provide you with, where possible.
- Use the procedures to use "config.do" or similar files.
  - This is always possible for Stata. See instructions.
  - Similar procedures are available, to some extent, for R (use separate libraries, if possible, or the `renv` or similar packages). See instructions.
  - Similar procedures are possible for Python and Julia (use environments).
  - Matlab and SAS will always use whatever is installed system-wide - this is a known caveat.
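For Python, a separate environment can be created with the standard-library `venv` module. The following is a minimal sketch, not a required procedure: the function name `create_project_env` and the directory name `.venv` are hypothetical choices made for illustration.

```python
import venv
from pathlib import Path


def create_project_env(project_dir, name=".venv", with_pip=True):
    """Create an isolated Python environment inside the replication folder.

    `create_project_env` and the default name `.venv` are hypothetical;
    adapt them to the layout of the package you are checking.
    """
    env_dir = Path(project_dir) / name
    # EnvBuilder is part of the Python standard library.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    return env_dir


if __name__ == "__main__":
    # Create the environment next to the authors' code.
    print(create_project_env("."))
```

Packages installed into such an environment stay inside the replication folder, so they should not interfere with the packages you use for your own research.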
Create a log file, if possible, for every run.
- For Stata, our template config.do file will handle this, if used correctly. See instructions.
- For R, use "R CMD BATCH" to run code, even when using RStudio (use the Terminal tab). See instructions.
- For Matlab, where possible, use the command line method of launching it. See instructions. Alternatively, use "`sink`", but note that it might interfere with some programs.
- For Julia and Python, we have no good solutions other than to use the command line where possible, and capture the output.
Once you have a log file, commit it.
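For Python code, capturing the output of a command-line run could look like the sketch below. It is an illustration under stated assumptions, not part of our template: the function name `run_with_log` and the `logs` directory are hypothetical, and the authors' script is assumed to be runnable without arguments.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path


def run_with_log(script, log_dir="logs"):
    """Run a Python script and save its stdout and stderr to a log file.

    `run_with_log` and the `logs` directory are hypothetical names.
    Returns the path of the log file and the exit code of the script.
    """
    script = Path(script)
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    log_path = log_dir / f"{script.stem}-{stamp}.log"
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(f"# command: {sys.executable} {script}\n")
        log.write(f"# started: {datetime.now().isoformat()}\n")
        log.flush()  # make sure the header precedes the script output
        result = subprocess.run(
            [sys.executable, str(script)],
            stdout=log,
            stderr=subprocess.STDOUT,  # keep errors in the same log
            check=False,  # a failing run should still leave a log
        )
        log.write(f"# exit code: {result.returncode}\n")
    return log_path, result.returncode
```

A call such as `run_with_log("main.py")` writes a new time-stamped file for every run, so earlier logs are not overwritten and each one can be committed.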
Keep a journal of what you are doing. You should be able to point to the journal, together with a log file or screenshots, to document problems, and how you solve them.
An example:
- Downloaded code and data from openICPSR
- Added line to use `config.do`
- Ran `main.do` as instructed by the author
  - I used the "right-click" method on Windows
- Code stopped at the third step, looking for package `xyz`
- Added the package to the relevant section in `config.do` so it would get installed, and ran the entire `main.do` again
- Programs finished but no figures were output. Inspection of the code showed that they only display on-screen. Added `graph export` as PNG files at all relevant parts, then ran entire `main.do` again.
If the authors' instructions say to "view" something interactively, investigate native methods (`graph export`) to capture the information. Otherwise, use a screenshot to capture the information. Make a note of that, too.
All logs and outputs, but not data, should be committed to Git.
Our standard report template is included: external-REPORT.md.
- Don't forget to report your computer and software configuration
- The report should be committed to git as well. There is no need to convert it to PDF.
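Part of the computer configuration can be collected with the Python standard library. The sketch below is only an illustration: the function names `describe_system` and `as_markdown` are hypothetical, and versions of Stata, R, Matlab or other software still need to be recorded separately.

```python
import os
import platform
import sys


def describe_system():
    """Collect basic facts about this computer for the report."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "cpu_count": os.cpu_count(),
        "python": sys.version.split()[0],
    }


def as_markdown(info):
    """Format the collected facts as a Markdown list."""
    return "\n".join(f"- {key}: {value}" for key, value in info.items())


if __name__ == "__main__":
    # Paste the printed list into the report.
    print(as_markdown(describe_system()))
```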
Reports should be committed to git (not sent by email). An email notification that all is complete is sufficient.
Please "reply-all" to the email you received.
We will confirm to you when we have exhausted all our questions, at which point we will ask you to delete all code and data.
Once you have completed the task, and committed all changes to the Git repo we provide you with, or sent us the report from a secure system, AND have answered all of our clarifying questions, please delete all data and code.
You can have access to all the resources we regularly use:
::::{tab-set}
:::{tab-item} General
- BioHPC compute cluster. If you do not have an account, request one. Even if you have an account, request to be added to the "lv39" lab.
- CodeOcean - Don't forget to share with "[email protected]". Instructions are here
- WholeTale
- Github Codespaces - with some caveats, talk to us
- Your laptop
:::
:::{tab-item} Cornell-affiliated
- CCSS ("CISER") compute nodes. If you do not have an account, request a Research account with "lv39" as sponsor. Do not use the restricted data access!
:::
::::