My Approach To Reproducible Research
The goal is simple. During my research, I often need to run many different workloads, plot the results, and write some analysis text. I want to be able to:
- find out which parameters I used to generate the results, so I can answer questions like “why do I see an outlier in my figure?”
- find the script that I used to generate a plot, so I can improve the figure for publication.
- rerun the whole program and get the results, so I can produce new results by modifying an old experiment.
To do so, I use:
- R Markdown (RStudio)
Here are some guidelines for myself.
Manage all code in one GitHub repository
Centralized management is easier. With Git, you have access to the full history of all your code.
Never type commands directly on the command line
If you type ./my-awesome-code parameter1 parameter2, two months later you will have no way of finding out what parameter2 was.
Put ALL scripts to
If you put your parameters and everything else in a file like Makefile.py, you will be able to find out what you did on which day. You don’t need to remember any parameters; you only need to run ./Makefile.py. Don’t pass arguments to ./Makefile.py, for the same reason.
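As a minimal sketch of this guideline, a Makefile.py could keep every parameter in the file itself and take no arguments. The experiment names, parameter values, the `experiments.log` file, and the helper names `build_command` and `record` are all hypothetical; only `./my-awesome-code` comes from the example above.

```python
#!/usr/bin/env python3
"""Hypothetical Makefile.py: parameters live in this file, not on the command line."""
import datetime
import json
import os
import subprocess

# Workload binary from the example command; the parameter values are made up.
WORKLOAD = "./my-awesome-code"

# One entry per experiment. Edit this table and commit it instead of typing arguments.
EXPERIMENTS = {
    "baseline": {"parameter1": 4, "parameter2": "random"},
    "large": {"parameter1": 64, "parameter2": "random"},
}


def build_command(params):
    """Return the argument list for one run, in the order the parameters are written."""
    # Dicts keep insertion order (Python 3.7+).
    return [WORKLOAD] + [str(value) for value in params.values()]


def record(name, params, log_path="experiments.log"):
    """Append the experiment name, parameters and timestamp to a log file."""
    entry = {
        "name": name,
        "params": params,
        "time": datetime.datetime.now().isoformat(),
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def main():
    for name, params in EXPERIMENTS.items():
        command = build_command(params)
        if not os.path.exists(WORKLOAD):
            # Nothing to run here; show what would have been executed.
            print("dry run:", " ".join(command))
            continue
        record(name, params)
        subprocess.run(command, check=True)


if __name__ == "__main__":
    main()
```

Because the script takes no arguments, the committed version of the file is a complete record of what was run.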
Use get_github_url.py to get the plotting script
get_github_url.py snapshots the current code and puts the following script on the macOS clipboard.
    # this requires curl installed in your OS
    library(devtools)
    source_url("https://gist.github.com/junhe/1f7e41f4c2829486e46f/raw/source_private_github_file.r")
    source_private_github_file("doraemon", "analysis/analyzer.r", "599060f45d97538b9dffda4b54ab88d1e7eff006")
If you copy and paste the code above into R Markdown, it will source analysis/analyzer.r in project “doraemon”, which contains the plotting script.
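The actual get_github_url.py is not shown here, so the following is only a rough sketch of how such a helper might work. It assumes that "snapshot" means recording the current commit hash with `git rev-parse HEAD`, and that the clipboard is reached through the macOS `pbcopy` command. The function names are hypothetical.

```python
import subprocess

# URL of the gist that defines source_private_github_file(), taken from the R snippet.
GIST_URL = (
    "https://gist.github.com/junhe/1f7e41f4c2829486e46f"
    "/raw/source_private_github_file.r"
)


def current_commit():
    """Return the hash of the commit currently checked out (assumes a Git repository)."""
    output = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True)
    return output.strip()


def build_r_snippet(project, path, commit):
    """Build the R code that sources one file from the project at a fixed commit."""
    return "\n".join([
        "# this requires curl installed in your OS",
        "library(devtools)",
        'source_url("{}")'.format(GIST_URL),
        'source_private_github_file("{}", "{}", "{}")'.format(project, path, commit),
    ])


def copy_to_clipboard(text):
    """Send text to the macOS clipboard through pbcopy."""
    subprocess.run(["pbcopy"], input=text, text=True, check=True)


def main():
    # Project name and script path are the ones used in the example above.
    snippet = build_r_snippet("doraemon", "analysis/analyzer.r", current_commit())
    copy_to_clipboard(snippet)
    print(snippet)
```

Calling `main()` from inside the repository would place the snippet on the clipboard. The pinned hash only resolves on GitHub if that commit has been pushed, so a real helper would probably commit and push first.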
Use organized script analyzer.r to plot
This template makes it easier to have reusable plotting code.
Use R Markdown to integrate plots (as code chunk) and analysis text
This is literate programming. Code and analysis are together. This is the ultimate output of the project, where you can find insights.
Put R Markdown files in the GitHub repository
The GitHub repository, which will never be lost, will be the central place where you will find everything you need to reproduce the results months or years later.