change directory structure and have final files as "index.qmd" #6

Merged 1 commit on Apr 20, 2023

Binary file modified crress/.DS_Store
Binary file not shown.
14 changes: 7 additions & 7 deletions crress/_quarto.yml
@@ -30,23 +30,23 @@ book:
 - about.qmd
 - part: sessions/session1/session1_intro.qmd
 chapters:
-- sessions/session1/converted/salmon/salmon.qmd
-- sessions/session1/converted/whited/whited.qmd
+- sessions/session1/salmon/index.qmd
+- sessions/session1/whited/index.qmd
 - part: sessions/session2/session2_intro.qmd
 chapters:
-- sessions/session2/converted/swauger/swauger.qmd
+- sessions/session2/swauger/index.qmd
 - part: sessions/session3/session3_intro.qmd
 chapters:
-- sessions/session3/converted/diego/diego.qmd
+- sessions/session3/diego/index.qmd
 - part: sessions/session4/session4_intro.qmd
 chapters:
-- sessions/session4/converted/guimaraes/guimaraes.qmd
+- sessions/session4/guimaraes/index.qmd
 - part: sessions/session5/session5_intro.qmd
 chapters:
-- sessions/session5/converted/hoynes/hoynes.qmd
+- sessions/session5/hoynes/index.qmd
 - part: sessions/session7/session7_intro.qmd
 chapters:
-- sessions/session7/converted/macdonald/macdonald.qmd
+- sessions/session7/macdonald/index.qmd
 format:
 html:
 theme:
344 changes: 344 additions & 0 deletions crress/sessions/session1/salmon/index.qmd

Large diffs are not rendered by default.

126 changes: 126 additions & 0 deletions crress/sessions/session1/whited/index.qmd
@@ -0,0 +1,126 @@
---
author:
- affiliations: Ross School of Business at the University of Michigan and Editor of
    Journal of Financial Economics
  name:
  - Toni M. Whited
title: Comments on Reproducibility in Finance and Economics
---

## Introduction

Reproducibility is defined as obtaining consistent results using the
same data and code as the original study. Most of the discussion of
reproducibility has centered on its many obvious benefits.
Reproducible research advances knowledge for several reasons. It reduces
the risk of errors. It also makes the processes that generate results
more transparent. This second advantage has an important educational
component, as it helps disseminate not just results but processes.
However, reproducibility is not without costs. Good research procedures
consume resources both in terms of a researcher's own efforts and in
terms of the involvement of arm's-length parties in actually reproducing
the research. This second cost is not just a time cost; it is pecuniary
as well.

Thus, reproducibility is a good that is costly to produce and that has
many positive externalities. Researchers internalize many of the
benefits of reproducibility, especially in terms of research
extendability and personal reputation. However, they do not internalize
any of the benefits to the research community at large. Because
reproducibility is costly, individual researchers are unlikely to produce
it at a socially optimal rate. The questions, then, are the extent to
which reproducibility should be subsidized and who should subsidize it.
Should all research be reproduced by arm's-length parties,
and what are the least costly policies that facilitate reproducible
research? The rest of this note is organized around policies regarding
actual reproduction and proprietary data.

## Code, Data, and Arm's-Length Reproduction

One low-cost, easily implemented set of policies that enhances the
reproducibility of research consists of journals' data and code disclosure
policies. In an age of inexpensive data storage and abundant public
repositories, the costs of these policies are small, and the policies
should be implemented. They impose some costs on researchers in terms of
organizing data and code, but well-organized data and code are already an
essential part of the research process, so these costs should be small.

While simple to implement, this low-cost policy is not without
non-pecuniary drawbacks for journals. The code and data can be
incomplete, poorly documented, or unusable. Moreover, journal editors
have to retract articles that, after publication, cannot be reproduced.
In economics, these concerns have prompted journals to begin arm's-length
reproduction of results. The primary benefit of this policy is that
authors and journals can be confident that the code submitted with an
article actually reproduces the results.

However, the pecuniary costs of this policy can be substantial. It is
expensive for journals to hire data editors and well-trained research
assistants, and many academic journals run on tight budgets. It is often
time-consuming for authors to comply with reproducibility requirements.
This last issue is particularly burdensome for authors who cannot afford
research assistance.

The issues above concern costs; the following two are more
fundamental. Reproducibility policies give researchers incentives to do
research that is easier to reproduce, thereby restraining innovative
research that requires large datasets or intensive computation. Most
importantly, code that can run on data and reproduce results can still
contain errors.

These arguments imply that while individual researchers are likely to
underproduce reproducibility, it is also unlikely to be optimal for the
progress of science that all research be reproduced before publication.
Some papers, even those in the very best journals, rarely get read or
cited, and the benefits of reproducing these papers are small.

However, it is hard to know ex ante which papers will attract attention
and which will not. One solution that lies between data and code
disclosure and arm's-length reproduction is verification. It is much less
expensive to verify the contents of a replication package than to do an
actual reproduction. Verification might consist of checking for the
existence of replication instructions, an execution script, and either
data or pseudo-data. This type of service could be provided by journals
or other third parties, much as copy editors fix syntax and grammar
errors before articles are submitted. Beyond verification, actual
reproduction would be left to the academic community, with the more
important pieces of research being subject to greater scrutiny.

A final issue with reproducibility is education. In economics and
finance, students are not taught how to create reproducible research.
One change that would go a long way toward improving the culture
surrounding reproducibility would be to teach PhD students how to
organize research projects and to write code in such a way that others
can reproduce results easily. This type of education would lower the
costs to individual researchers of making their own research
reproducible.

## Proprietary Data

A possibly larger challenge for reproducibility than verification or
arm's-length execution of code is proprietary data. A clarification is
necessary because not all types of data with restricted access are
completely secret, that is, available only to the data provider and a
researcher. For example, commercial datasets are not secret, just
costly to obtain. Similarly, administrative datasets are not secret;
they simply require special permission. In contrast, proprietary data
cannot be offered to the research community at large for the purpose of
reproducing the results. The question, then, is whether journals should
discourage the use of this type of data or require that verifiers have
access to it. Given the large number of studies using proprietary
data, this issue is possibly more important than the issue of running
code.

## Conclusion

The reproducibility of research is essential for the advancement of
science. However, it is not without costs, so a blanket requirement that
all research be reproducible is not feasible. Instead, feasible policies
include those that lower the costs for others to reproduce research.
Data and code disclosure is a low-cost policy that should be implemented
widely. Verification of code and data packages is a slightly more costly
option. Arm's-length reproduction is a
much more costly alternative. Finally, perhaps the most important issue
that impedes reproducibility in finance and economics is the use of
proprietary data.