Introduction

Here are a few notes on starting an R package. For context, I’ve previously made a skeleton package. That may be worth looking at, at least because there are some useful links in the README. It’s also worth looking at Hadley Wickham’s R packages page for more advanced information (this book is also available in the GAC at /projects/resources/books/r-packages.pdf).

In this case, I’ll be doing things a tiny bit differently from the skeleton package by starting with an existing R script, and writing this vignette describing the refactoring process with a gitflow approach to version control.

The current goal is to make a set of R functions to obtain gene definitions from an outside data source and make a table of those gene definitions in the GAC database. We’ll bundle those functions up as an R package so it’s easy to get them and reuse them, and I’ll put the code on github because that’s an easy way to distribute it and allow installation from any Internet-connected workstation. Along the way, I’ll be demonstrating a gitflow workflow for development with version control, use of github, test-driven development, and using roxygen for documenting as we develop. So there’s lots of yak shaving as we set up a lot of infrastructure and introduce new tools and approaches, but hopefully by the end, some of the motivation becomes clear and the result is satisfying enough to convince you that it’s worth starting to write code this way.

So we begin! Coming up with a name is always a hard part - I want something that is general enough, but specific enough. It’s hopeless. So, rather than getting too hung up on it, I’ll remember that it can always be changed later. For now, I’ll call the project, repository, and R package genetable.

Basically, this is a second iteration on the skeleton package idea.

(Note to self: It may be worthwhile to make a make_GAC_package() wrapper to devtools::create() that does preliminary steps….)

Initializing an R Package

I generally put all the code I’m working on in a directory called projects under my home directory. I’ll do the same with genetable. I’m currently working to do more and more things in the Hadley Wickham way, so I’ll be using devtools.

Create Package with Devtools

Thus, to start the package, I change to my projects directory, start R, and run some commands to get devtools and initialize the package:

cd ~/projects
R
install.packages(c("devtools", "roxygen2", "testthat", "lintr"))

## or 
# source("https://bioconductor.org/biocLite.R")
# biocLite(c("devtools", "roxygen2", "testthat", "lintr"))

devtools::create("genetable")
q()

That process makes some directories. Now ~/projects/genetable exists, and looks like this:

genetable/
├── DESCRIPTION
├── NAMESPACE
├── R
└── genetable.Rproj

You can check that our new package is a real live working R package with devtools::check(), like this:

r
devtools::check("genetable")
q()

And you’ll see that it passes the check with 0 errors and 1 warning: the package has a non-standard license. We’ll get back to that later.

Add Version Control

I want this to be under version control, so I’ll initialize a git repository:

cd ~/projects/genetable
git init

devtools::create() made a .gitignore file, but it’s pretty limited. Instead, I’ll copy one that I’m cobbling together in a gist:

r -e 'download.file("https://gist.githubusercontent.com/bheavner/4766859540677cafd0a3ed9a27fd9586/raw/", ".gitignore", "curl")'
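If you would rather build the ignore list up by hand than download one, a small helper can keep it free of duplicates. This is only a sketch: add_ignore is a hypothetical helper name, and the three patterns are common R project ignores that I am assuming here, not the contents of the gist.

```shell
# Hypothetical helper: append a pattern to an ignore file only if that exact
# line is not already there.
add_ignore() {
  # $1 = pattern, $2 = ignore file (defaults to .gitignore)
  file="${2:-.gitignore}"
  touch "$file"
  grep -qxF -- "$1" "$file" || printf '%s\n' "$1" >> "$file"
}

# Demo in a scratch directory; the patterns are assumptions, not the gist.
scratch=$(mktemp -d)
for pattern in .Rproj.user .Rhistory .RData; do
  add_ignore "$pattern" "$scratch/.gitignore"
done
add_ignore .Rhistory "$scratch/.gitignore"   # second call changes nothing
cat "$scratch/.gitignore"
```

In the package directory itself, calling add_ignore with just a pattern would write to .gitignore in the current directory.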

Now, I can add my files to the repository and do my first commit:

git add .
git commit -m "First commit"

Create New Github Repository

Next, I want to have this repository on GitHub, so I need to set up a repository there to use as the origin for this repository. So, I’ll start by making a GitHub repo for the R package under the GAC organization GitHub account.

A couple notes on the new repository:

  • Include a description
  • I choose to make it public (the GAC doesn’t have private repositories at the moment)
  • I’ll initialize without a README, license, or .gitignore (making those will be part of the R package)

Now the genetable repository exists!

Push Local Working Copy of the Code to GitHub

To push my repository to GitHub, I need to set GitHub as the origin, and do a push:

cd ~/projects/genetable
git remote add origin https://github.com/UW-GAC/genetable.git
git push -u origin master

You (or anyone else) can now see the first commit for this repository on GitHub. One result of this is that anyone can now install the package. To install a package from GitHub, you use devtools::install_github(). So in this case, you could install our empty package with devtools::install_github("UW-GAC/genetable"). Once other branches exist, you can also specify a branch if you like, such as devtools::install_github("UW-GAC/genetable", ref = "develop").

Start GitFlow

Before I make any changes to my project, I want to start using the gitflow approach of having a tagged/versioned long-running master branch (I’ll add the develop and feature branches soon).

Tag Master Branch

The gitflow diagram uses a 0.1 as the first tag, but I’ll follow the R versioning convention and use 0.0.0.9000 as my first tag. (Check out Yihui Xie’s thoughts on versioning if you’d like more.)

git tag 0.0.0.9000
git push --tags

Now you can see this commit on GitHub under the list of tags.

Add develop branch

Since we’re using GitFlow, changes to code should mostly happen via short-lived feature branches that branch off of a long-running develop branch. So, we need to make those branches so we can add code.

I’ll start managing my refactoring with gitflow by making the long-running develop branch:

git branch develop

Begin Adding Features

Now I’d like to make that license warning from devtools::check() go away, so I want to add a license to the project. I’ll do it the gitFlow way by making a new feature branch from develop for working on a first coding task - in this case, adding a license and editing the metadata in the DESCRIPTION file so it’s accurate and there are no warnings on devtools::check():

git checkout -b features/start develop

Add License and update DESCRIPTION the gitFlow way

Now I’m on the features/start branch (can confirm with git status), so I can actually start working with the code.

I need/want to add a LICENSE file - I often use the MIT license. This license is specified in the DESCRIPTION file by the line License: MIT + file LICENSE. It also requires a file called LICENSE that looks like this:

YEAR: <Year or years when changes have been made>
COPYRIGHT HOLDER: <Name of the copyright holder>

I just create the LICENSE file and edit the DESCRIPTION in a text editor. My changes to DESCRIPTION are pretty self-explanatory, and are in the file (of course, Hadley Wickham’s R packages page has much more detail if you’d like).
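The two-line LICENSE file can also be written from the shell instead of an editor. A minimal sketch, assuming the current year is the right value for YEAR; write_mit_license is a hypothetical helper name and "Example Holder" is a placeholder, not the real copyright holder.

```shell
# Hypothetical helper: write the two-field LICENSE file that goes with
# "License: MIT + file LICENSE" in DESCRIPTION.
write_mit_license() {
  # $1 = package directory, $2 = copyright holder
  printf 'YEAR: %s\nCOPYRIGHT HOLDER: %s\n' "$(date +%Y)" "$2" > "$1/LICENSE"
}

# Demo in a scratch directory with a placeholder holder name.
pkg=$(mktemp -d)
write_mit_license "$pkg" "Example Holder"
cat "$pkg/LICENSE"
```

The License: line in DESCRIPTION still has to be edited separately.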

I can do a commit on the feature branch:

git add DESCRIPTION
git add LICENSE
git commit

Then, after those edits, I can run devtools::check() in r to see how my package is behaving so far.

I get

Status: OK

R CMD check results
0 errors | 0 warnings | 0 notes

So everything looks good. I’m happy to say that’s my first feature. Now I want to merge features/start back into develop to prepare for my next feature, which I’ll start from develop.

git checkout develop
git merge --no-ff features/start

Now I can safely delete the features/start branch.

git branch -d features/start
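The checkout, merge --no-ff, and delete steps come back for every feature in this walkthrough, so they could be wrapped in a small shell function. The sketch below tries it in a throwaway repository; finish_feature is a hypothetical helper (not part of git or gitflow), and the user name and email are placeholders.

```shell
# Hypothetical helper: merge a finished feature branch into develop with an
# explicit merge commit, then delete the feature branch.
finish_feature() {
  branch="$1"
  git checkout -q develop &&
  git merge -q --no-ff -m "Merge branch '$branch' into develop" "$branch" &&
  git branch -q -d "$branch"
}

# Throwaway repository so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master      # use master as the first branch
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "First commit"
git branch develop

# One feature cycle, mirroring features/start above.
git checkout -q -b features/start develop
echo "License: MIT + file LICENSE" > DESCRIPTION
git add DESCRIPTION
git commit -q -m "Add license field"
finish_feature features/start

git --no-pager log --oneline --graph
```

Because of --no-ff, the log shows a merge commit even though develop could have been fast-forwarded, which is what keeps each feature visible as its own bump in the graph.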

Push Develop Branch to GitHub

Currently, the GitHub repository only has the master branch, so I need to push develop to GitHub. In general, I think I’ll be following this workflow of making feature branches locally, then merging into develop and keeping the long-running develop and master branches on GitHub.

Here’s how to create the develop branch on the GitHub repository and push my recent commits:

git push --set-upstream origin develop

Yay! Now we’ve got a working R package (that doesn’t do anything) and a gitflow version control environment to make progress in. We’re almost ready to start refactoring code - but there’s one more piece to add: tests.

Add Testing to the package

There is much to read about “Test Driven Development” - someone named Ned Batchelder gave a good presentation on TDD at PyCon2014, and someone named Jason Diamond has a great python-centric motivating example. There’s also much to read about the specifics of testing with the testthat package, which is the devtools/R approach - but I won’t get into the whys here. The current document is focused on demonstrating the how. So, I’ll start introducing this new feature of “testing” to our package by making a new feature branch:

git checkout -b features/testing develop
r
devtools::test() # add testing infrastructure - answer Yes at the prompt
q()

That makes some new directories. Now ~/projects/genetable looks like this:

genetable/
├── DESCRIPTION
├── NAMESPACE
├── R
├── genetable.Rproj
└── tests
    ├── testthat
    └── testthat.R

And devtools::check() fails because of an error running tests: Error: No tests found for genetable. Our goal is to pass devtools::check() again now that we’ve broken our package.

Add a Trivial Test

We’ll get into more details and examples of tests later, so for now, just treat this as a copy and paste - I will grab a gist that includes a function that always returns TRUE, and a test that confirms it, and put them in the project:

r -e 'download.file("https://gist.githubusercontent.com/bheavner/bf0e1b8816692ebdfca2d7b74c165a64/raw/4b06ce1a8d108dc27c75fcc6156867b9c67132a9/return-true.R", "R/return-true.R", "curl")'
echo "" >> R/return-true.R 
r -e 'download.file("https://gist.githubusercontent.com/bheavner/aba968b52e456b903c047485158e6651/raw/b742f272bd6bd8458f4247689e7a3189aec5cde8/test-testthat.R", "tests/testthat/test-testthat.R", "curl")'
echo "" >> tests/testthat/test-testthat.R 

We can double check that our test works in R with

r
devtools::test()

and now devtools::check() passes again. We’ve finished another feature! So, commit the changes, merge the feature branch back to develop and push it to github:

git add .
git commit
# add commit message
git checkout develop
git merge --no-ff features/testing
git branch -d features/testing
git push

At this point, my git commit history on the develop branch looks like this:

git log --oneline --graph
*   9d5baca Merge branch 'features/testing' into develop
|\  
| * bd05a06 Add testing to the package
|/  
*   d0ee1a1 Merge branch 'features/start' into develop
|\  
| * 2acc20a Add MIT License and package metadata in DESCRIPTION
|/  
* 7d45d66 First commit

Add automatic Lint-ing to the package

Lint is a program that looks at your code and makes sure it conforms to some style guides and is programmatically correct. It may seem like a pain, or unreasonable, but if your code passes lint with no errors or warnings, it is much more readable, clean, and looks pretty. Use it.

Here’s what happens when I run lint on the package so far:

> devtools::lint()
Linting genetable....
>

So that’s nice and anti-climactic! The package is passing with no errors or warnings - the code is clean so far. It won’t go as smoothly on the next iteration, I bet, but let’s set it up to automatically run Lint when we run tests. Fortunately, we can integrate lint with testthat. Basically, we just need to add a test. I’ll put the test in tests/testthat/test-lint.R and add lintr to the DESCRIPTION with devtools::use_package("lintr", type = "Suggests").

Add README.md

When you look at the genetable repository on gitHub, if there’s not a README file, there’s a prompt stating

Help people interested in this repository understand your project by adding a README. 

So, let’s go ahead and add a readme. Here’s the gitflow/devtools way to do that:

git checkout -b features/add_readme develop
r
devtools::use_readme_md() # could be use_readme_rmd, but I don't intend to put code in this at the moment
q()

Then edit the README.md file as desired, committing changes as you go. After things are good, run devtools::check(), commit, merge, and push develop:

r
devtools::check()
q()
git add .
git commit -m "Add a README.md file"
git checkout develop
git merge --no-ff features/add_readme
git branch -d features/add_readme
git push

And now we’re ready to begin actually writing some R code for our package!

Add Code to be Refactored

Remember that the goal of this exercise is to refactor an R script into an R package. So I’d like to put the original R script under version control, but don’t expect that it will end up in the package after we’re done. In general, the code for a package goes in the R folder of the package (someone named Ben Best has written a nice description of what goes where in an R package). However, since the original script includes things that don’t belong in a package (e.g. functions like library() or things that interact with the environment like rm(list=objects())), I don’t want to put the original script in the R/ directory.

So, I’ll make a new feature branch, then copy the existing script to a new directory:

git checkout -b features/src develop
mkdir ~/projects/genetable/original/
cp /projects/topmed/qc/freeze.2a/analysts/jaind/R/make_gencodeV19_aggUnit.R ~/projects/genetable/original/

This doesn’t break devtools::check(), but does introduce a note: Non-standard file/directory found at top level. I’d rather this note go away because I’m anal that way. To do so, use devtools::use_build_ignore("original/*"). Now, devtools::check() passes with no errors, warnings, or notes, so we’ve finished another feature, and we can merge that branch back into develop and continue the work!

git add .
git commit
# add commit message
git checkout develop
git merge --no-ff features/src
git branch -d features/src
git push

Begin Refactoring

Our goal in preliminary refactoring is to begin to divide the code up a bit - both in the trivial sense of putting the correct bits in the proper places (thanks again, Ben Best), and in the more complicated sense of dividing the big script into functions (or, if we’re so inclined, classes, methods, objects, and other such things). We’ll want to have goodies like unit tests and documentation for the functions so that we can be confident that our functions are behaving as expected and that we’re not breaking things if we change individual functions. It’s easiest to do that as we go, rather than trying to go back and do it later. Let’s start with the more trivial part of refactoring first - we need to move library() calls that introduce dependencies on other R packages from the script to the DESCRIPTION file.

Add Dependencies from Script library() Calls

I’ll continue what’s hopefully becoming a familiar workflow by making a new feature branch from develop for working on this first small refactoring task of moving library dependencies from library() calls in the script:

git checkout -b features/deps develop

Now I’m on the features/deps branch, so I can actually start refactoring by adding the GWASTools package as a required dependency for the genetable package. I’ll do that the devtools way in R (I could manually edit DESCRIPTION as well, but I’m trying to use devtools more):

r
source("https://bioconductor.org/biocLite.R")
biocLite("GWASTools") # because I hadn't installed it before
devtools::use_package("GWASTools")
q()

Style Note: Did you notice the Refer to functions with GWASTools::fun() message from devtools::use_package()? Practically, a person using the genetable package would likely use library("GWASTools") so they wouldn’t have to type out GWASTools::fun() for every function they want to use from GWASTools. Within packages, I like to use the longer form of GWASTools::fun() so it’s clear which external package provides a particular function.

You can check git status to see that this modified the DESCRIPTION file, and use devtools::check() to make sure that this hasn’t broken anything yet.

All looks well, so that’s another tiny feature branch completed (don’t worry, they’ll get longer). Commit the changes, then merge the features/deps branch back into develop and push it to gitHub.

git add DESCRIPTION
git commit
git checkout develop
git merge --no-ff features/deps
git push

Now I can safely delete the features/deps branch, leaving just my master and develop branches, in good shape.

git branch -d features/deps

Starting Real Refactoring

To begin real refactoring, I’ll start looking over the original script to identify portions I can pull out and make into callable functions. There’s no set standard for how big a function should be, or what should go in it. I have a bias towards short functions that don’t do much. We’ll see if that leads to a messy R/ directory or not. As you develop your approach to how big a function should be, it’s always useful to look a bit at how an experienced person has structured a pretty powerful and popular package.

The original script is a great starting point - it’s well-organized, logically laid out, and the comments are useful and clear.

The first discrete step in the script reads a tab-delimited file from a specified path, runs some sanity checks on the contents of the file, and returns a data frame containing the features of interest from the original file:

base.path<-"/projects/topmed"
gt<-read.table(file.path(base.path,"downloaded_data/Gencode/v19/gencode.v19.annotation.gtf.gz"),as.is=TRUE,sep="\t"); dim(gt)
colnames(gt)<-c("seqname","source","feature","start","end","score","strand","frame","attribute") #see http://uswest.ensembl.org/info/website/upload/gff.html#fields to figure the feilds of the gtf file

table(gt$feature,grepl("tag basic",gt$attribute))
table(gt$feature=="transcript",grepl("tag basic",gt$attribute))

# subset gtf to give a dataframe with transcripts for only the basic set of genes
gtb<-gt[gt$feature=="transcript" & grepl("tag basic",gt$attribute),]; dim(gtb)

In this case, the resulting dataframe consists of rows for which the ‘feature’ column is “transcript” and the ‘attribute’ column includes the string “tag basic”.

So I will likely want to refactor this chunk of code from the script into a function that takes a file path, a string matching the feature of interest, and a string matching the attribute of interest as arguments, and returns a data frame containing the rows from the file that match the feature and contain the attribute. I may also make some functions for the sanity checks/summary data printing as well, but likely as another feature.
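To make that proposed interface concrete before writing any R, here is the same filter sketched as a shell function around awk. This is an illustration of the row-selection logic only, not the R function developed below; filter_gtf is a hypothetical name, and it assumes a plain-text, tab-delimited file with the feature in column 3 and the attribute text in column 9.

```shell
# Hypothetical sketch: print rows whose feature column equals the given
# feature and whose attribute column contains the given string.
filter_gtf() {
  # $1 = path to a tab-delimited file, $2 = feature, $3 = attribute substring
  awk -F'\t' -v feature="$2" -v attribute="$3" '
    /^#/ { next }                               # skip comment/header lines
    $3 == feature && index($9, attribute) > 0   # default action: print row
  ' "$1"
}

# Tiny made-up sample: three rows, only the second matches both conditions.
sample=$(mktemp)
printf 'chr1\tHAVANA\tgene\t1\t9\t.\t+\t.\tgene_id g1; tag basic;\n' > "$sample"
printf 'chr1\tHAVANA\ttranscript\t1\t9\t.\t+\t.\tgene_id g1; tag basic;\n' >> "$sample"
printf 'chr1\tHAVANA\ttranscript\t1\t9\t.\t+\t.\tgene_id g1;\n' >> "$sample"
filter_gtf "$sample" transcript "tag basic"
```

The match is literal text against the raw file, so the attribute string may need different quoting than what R shows after read.table(). For a gzipped file, something like gzip -dc file.gtf.gz | filter_gtf - transcript "tag basic" should work, since awk treats - as standard input.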

I’ll make a new branch to develop the first function - import_gencode():

git checkout -b features/gen_import develop

Diversion - Two Branches

While looking at the code, I don’t see where GWASTools is actually used. So maybe we don’t need it in the package at all! I’d like to revise my approach, and only add that dependency if I need it later. So I want to go back to my develop branch and remove the dependency. Meanwhile, I’ll keep the features/gen_import branch, because I’m doing some different things over there.

git checkout -b features/drop_dep develop

Then I use an editor to change the DESCRIPTION file by modifying Imports: GWASTools to Imports:. Run devtools::check(), commit, merge to develop, and delete that feature branch.

git add DESCRIPTION
git commit
git checkout develop
git merge --no-ff features/drop_dep
git branch -d features/drop_dep
git push

Now my git commit history on the develop branch looks like this:

git log --oneline --graph
*   0af3270 Merge branch 'features/drop_dep' into develop
|\  
| * 156ac44 remove GWASTools dependancy
|/  
*   e6e1eb3 Merge branch 'features/add_readme' into develop
|\  
| * ee47c96 Add README.md
|/  
*   ebb71d0 Merge branch 'features/deps' into develop
|\  
| * 87db2f6 Add GWASTools dependancy to DESCRIPTION
|/  
*   50b849e Merge branch 'features/src' into develop
|\  
| * 4450f98 Add script to repository and put it in .Rbuildignore
|/  
*   9d5baca Merge branch 'features/testing' into develop
|\  
| * bd05a06 Add testing to the package
|/  
*   d0ee1a1 Merge branch 'features/start' into develop
|\  
| * 2acc20a Add MIT License and package metadata in DESCRIPTION
|/  
* 7d45d66 First commit

(note that the features/gen_import branch doesn’t appear on this, because it hasn’t been merged back into develop yet.)

Now I can go back over to features/gen_import and work on my first function again:

git checkout features/gen_import

Developing the first function

Recall that I’ve got a couple things in mind: 1) a function that takes a file path, a string matching the feature of interest, and a string matching the attribute of interest as arguments, and returns a data frame containing the rows from the file that match the feature and contain the attribute, and 2) possibly some functions for the sanity checks/summary data printing as well, but likely as another feature.

I don’t want to forget that, so I’ll start by making 2 issues on my GitHub repository to keep track of those ideas. I just do that on the GitHub web page by clicking on the “issues” tab at the top of the repository.

My import function will be called import_gencode(), and the code will go in genetable/R/import-gencode.r.

Perhaps counterintuitively, I will begin working on the first function by writing a test for that function that I expect to fail, then fixing the function so it passes (remember Ned Batchelder’s PyCon2014 talk, and Jason Diamond’s motivating example.)

I’m going to use testthat for testing. testthat provides “expectations” via an expect_that() function that provides the basis of tests. testthat also gives some handy shortcut syntax for specific expect_that() operations, such as expect_error(). (See Hadley’s R Journal article on testthat for more.)

For example, since import_gencode() doesn’t exist yet, we’d expect it to throw an error if we tried to use it. You can confirm that in r with:

r
testthat::expect_error(import_gencode())

And this doesn’t give any output, because there was, in fact, an error. However, suppose we want to return a specific error, “path not defined” if the path to the file isn’t given in the function call. You can (and generally should) give a specific error in the test, like this:

r
testthat::expect_error(import_gencode(), "path not defined")

In this case, the output is

Error: error$message does not match "path not defined".
Actual value: "could not find function "import_gencode""

Which means that the test doesn’t pass - the error message was different than what we want.

Functionally, testthat will look at a tests/testthat/ directory, and run all the tests it finds there. Tests are composed in r scripts using a context and the test_that() function. test_that() takes two arguments: a string describing the test, and a chunk of code containing the expect_that() test.

Remember that we made a test-testthat.R file as a trivial test when we added testing to the package earlier? Looking at that code should be more meaningful to you now:

# tests/testthat/test-testthat.R
context("test_testthat - unit tests")

test_that("test_return_true returns TRUE", {
  expect_true(return_true())
})

In short, this is a test that confirms the function return_true() actually does return TRUE. This test is run when we use devtools::test().

So, since I’m using testthat, I’ll put the new tests for the import_gencode() function in the same directory as test-testthat.R - in genetable/tests/testthat/test-import-gencode.r. I’ll make the function and test files (edit these paths as appropriate) and then start writing them:

touch ~/projects/genetable/R/import-gencode.r
touch ~/projects/genetable/tests/testthat/test-import-gencode.r

Here’s how I’d start test-import-gencode.r in the case of our example in which we want import_gencode() to abort and return an error "path not defined" if the path to the file isn’t given in the function call.

context("test_import_gencode - unit tests")

test_that("import-gencode returns an error if the file path is not defined", {
  expect_error(import_gencode(), "path not defined")
})

And, adding that test, when I run devtools::test(), it fails, as expected:

> devtools::test()
Loading genetable
Testing genetable
test_import_gencode - unit tests: 1
test_testthat - unit tests: .

Failed -------------------------------------------------------------------------
1. Failure: import-gencode returns an error if the file path is not defined (@test_import_gencode.r#4) 
error$message does not match "path not defined".
Actual value: "could not find function "import_gencode""


DONE ===========================================================================

Fix Failing Test(s)

So now we’ve got a development task - genetable::import_gencode() is failing a test! Our goal is to write the function so that it passes the test. Here’s a function that does that:

#genetable/R/import-gencode.r
import_gencode <- function(path){
  if (missing(path)) stop("path not defined")
}

(note that we can get a lot more sophisticated with argument checking, but I won’t do that now.)

For safety, I want to run lint, run the tests, and check the package.

> devtools::lint()
Linting genetable.....
> devtools::test()
Loading genetable
Testing genetable
test_import_gencode - unit tests: .
test_testthat - unit tests: .

DONE ===========================================================================
> devtools::check()
...
Status: OK

R CMD check results
0 errors | 0 warnings | 0 notes

So, everything is looking good so far! We’ve got a functioning R package that a user could install, and it includes a function. But there’s one more piece remaining before we continue to work to finish this feature - documentation.

Document with roxygen

It is painful to go back and document every function in a package if you’re starting from scratch. So it’s better to document as you go. Each time you write a function, you can write a quick description of the expected arguments and what it will return, and give an example of how to use the function. Of course, you’ll iterate and improve as you go. But start now. I’ll use devtools::document() and roxygen2. Of course there’s an Introduction to roxygen2 vignette, but basically I’ll be adding some comments prefaced with the string #'.

For example, we should document genetable::import_gencode(). Perhaps we could start by modifying genetable/R/import-gencode.r to look like this:

#' Load a gencode file. 
#'  
#' @param path Path to the gencode file. 
#' @return A data frame containing the rows from the file that match the specified feature and contain the given attribute. 
#' @examples
#' \dontrun{
#' import_gencode(path = 'path/to/file.tsv')
#' }
#' @export

import_gencode <- function(path){
  if (missing(path)) stop("path not defined")
}

Now, when we use devtools::document() documentation is produced:

> devtools::document()
Updating genetable documentation
Loading genetable
Writing NAMESPACE
Writing import_gencode.Rd

Running lint, the tests, and checking the package all still work. And now, I can learn more about genetable::import_gencode():

r
?genetable::import_gencode

Now that is worth a commit!

git add .
git commit

Iterate

Now it’s a process of iteration: write a test that fails, update the function to pass the test, check documentation, run lint, check package. Once the feature is complete, merge back to develop, then move on to next feature.
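The document, lint, test, check part of that loop can be run as one command from the shell. A rough sketch, assuming Rscript is on the PATH and devtools is installed; run_checks and the RSCRIPT override are hypothetical names, and whether each devtools call exits non-zero on failure depends on the devtools version, so treat the early stop as best-effort.

```shell
# Hypothetical helper: run the four devtools steps in order from the package
# directory and stop at the first one whose command exits non-zero.
run_checks() {
  runner="${RSCRIPT:-Rscript}"   # RSCRIPT is an assumed override for dry runs
  for step in document lint test check; do
    echo "== devtools::${step}() =="
    "$runner" -e "devtools::${step}()" || return 1
  done
}
```

Calling run_checks from ~/projects/genetable before each merge into develop would cover the same ground as typing the four calls into an R session.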

In my case, I’ll use tools from the tidyverse package for this function, so I’ll add that dependency in this feature branch, too (similar to how I added GWASTools before). After some work, I decided that the scope of importing and sub-selecting is too much for this one function, because summarizing might sub-select differently, and I don’t want to load the file two times. Better to just import and tidy the table to make it easier to work with. Sub-selecting will be a different function. So, I did some work with various commits to finish up the import-only function. Here’s the commit that merges R/import-gencode.r and tests/testthat/test_import_gencode.r from the features/gen_import branch back to the develop branch - so you can see where the code is at that point.

Now that there’s a complete feature that may be useful to others, I’d like to increment the version, and push this package to the master branch. That’s up next.

Make Release Branch

My develop branch now has a tested, documented, working feature that might be useful. So I want to release that to the world by pushing it to the master branch. Following the gitflow approach, I’ll do that by making a release branch from develop, then increment the version there, and then merge that with master and with develop. I’ll increment the tag on the master branch as a “release”:

git checkout develop
git branch release/0.0.0.9001
git checkout release/0.0.0.9001

Then I’ll edit DESCRIPTION to change the Version to 0.0.0.9001, and commit the change:

git add DESCRIPTION
git commit -m "incrementing version number for release"
git checkout master
git merge --no-ff release/0.0.0.9001
git tag 0.0.0.9001
git push
git push --tags
git checkout develop
git merge --no-ff release/0.0.0.9001
git branch -d release/0.0.0.9001
git push
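The hand edit to the Version field in the release step above could also be scripted. A minimal sketch with awk, assuming DESCRIPTION has a single line of the form Version: a.b.c.d; bump_dev_version is a hypothetical helper that only increments the last component.

```shell
# Hypothetical helper: print a DESCRIPTION file with the last component of
# its Version field incremented by one (0.0.0.9000 becomes 0.0.0.9001).
bump_dev_version() {
  # $1 = path to DESCRIPTION
  awk -F': *' '
    $1 == "Version" {
      n = split($2, parts, "[.]")        # split the version on literal dots
      parts[n] = parts[n] + 1            # bump only the last component
      version = parts[1]
      for (i = 2; i <= n; i++) version = version "." parts[i]
      print "Version: " version
      next
    }
    { print }                            # every other line passes through
  ' "$1"
}

# Demo on a scratch copy with made-up metadata.
desc=$(mktemp)
printf 'Package: genetable\nVersion: 0.0.0.9000\nLicense: MIT + file LICENSE\n' > "$desc"
bump_dev_version "$desc"
```

The function writes to standard output, so updating the real file would be something like bump_dev_version DESCRIPTION > DESCRIPTION.new && mv DESCRIPTION.new DESCRIPTION, run on the release branch before the commit.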

Now the commit history on my master branch looks like this:

git log --oneline --graph
*   d115f0a Merge branch 'release/0.0.0.9001'
|\  
| * 22a7c4b incrementing version number for release
| * d889c1e removing GWASTools from DESCRIPTION
| *   f085135 fixing merge conflict on DESCRIPTION
| |\  
| | * 7332bd3 Trim the scope of import-gencode to return fields of interest
| | * 2c47537 Add lintr to testthat - Fixes #1
| | * fadb7cb progress on genetable::import_gencode()
| | * 42def0b Add documentation with roxygen2 for import_gencode function
| * |   0af3270 Merge branch 'features/drop_dep' into develop
| |\ \  
| | |/  
| |/|   
| | * 156ac44 remove GWASTools dependancy
| |/  
| *   e6e1eb3 Merge branch 'features/add_readme' into develop
| |\  
| | * ee47c96 Add README.md
| |/  
| *   ebb71d0 Merge branch 'features/deps' into develop
| |\  
| | * 87db2f6 Add GWASTools dependancy to DESCRIPTION
| |/  
| *   50b849e Merge branch 'features/src' into develop
| |\  
| | * 4450f98 Add script to repository and put it in .Rbuildignore
| |/  
| *   9d5baca Merge branch 'features/testing' into develop
| |\  
| | * bd05a06 Add testing to the package
| |/  
| *   d0ee1a1 Merge branch 'features/start' into develop
| |\  
|/ /  
| * 2acc20a Add MIT License and package metadata in DESCRIPTION
|/  
* 7d45d66 First commit

That’s a complete cycle from 0 to package that people can use for something useful.

The repository is available as a versioned release on gitHub.

And the package can be installed from an R session with devtools::install_github("UW-GAC/genetable").

Automate Testing with Travis

An easy next step is to use Travis to automatically run tests when code is pushed to gitHub.

There are some good resources if you’d like to dig deeper than this tutorial:

For this tutorial, I’ll use devtools::use_travis() to automatically add a minimal .travis.yml to the package, and to add .travis.yml to .Rbuildignore.

git checkout develop
git checkout -b features/travis
r
> devtools::use_travis()
* Creating `.travis.yml` from template.
* Adding `.travis.yml` to `.Rbuildignore`.
Next: 
 * Add a travis shield to your README.md:
[![Travis-CI Build Status](https://travis-ci.org/UW-GAC/genetable.svg?branch=master)](https://travis-ci.org/UW-GAC/genetable)
 * Turn on travis for your repo at https://travis-ci.org/UW-GAC/genetable

> q()

Then, follow the instructions from that use_travis() call to add the code for the badge by editing the package README to include the following at the beginning:

[![Travis-CI Build Status](https://travis-ci.org/UW-GAC/genetable.svg?branch=master)](https://travis-ci.org/UW-GAC/genetable)

With that, we can run the normal checks then add our changes to the repository:

R
> devtools::document()
> devtools::lint()
> devtools::test()
> devtools::check()
> q()
git add .
git commit
git checkout develop
git merge --no-ff features/travis
git branch -d features/travis
git push

Include code coverage metrics on github

Eryk Walczak has good instructions for situations where you’re pushing to GitHub and manually uploading coverage reports. Note particularly that you have to copy the token to wherever you’re pushing your commits from.

On the other hand, if you want travis to just run and upload reports automagically, you don’t have to copy the token but can follow the instructions here: https://github.com/codecov/example-r and just add this to your .travis.yml file:

r_packages:
  - covr

after_success:
  - Rscript -e 'library(covr); codecov()'

Start Collaborating

A harder step (that I don’t have experience with yet) is inviting external collaborators to contribute to the package, and successfully integrating their code. Learning about pull requests is on the todo list.