Chapter 19 Debugging and Testing

Writing code is not too hard. Writing correct code, however, can be quite challenging.

It is often the case that you will write code that does not do what you expect, or that does not work at all. Trying to figure out why can be enormously frustrating and time consuming. There are some tools, however, that can mitigate that to some extent.

19.1 Debugging with print()

When you follow modular design, you will often have methods you wrote calling other methods you wrote, which call even more methods you wrote. When you get an error, you might not be sure where to look, or what is happening.

A simple debugging method (a method often reviled by professional programmers, but which is still useful for more ordinary folks) is to use print() statements in your code. For example, consider the following code:

    if ( any( is.na( rs$estimate ) ) ) {
        cat( "There are NAs in the estimates!\n" )
    }

Here we print a message whenever a condition we suspect might be causing trouble actually occurs.

There are a few methods for printing to the console. The first is print(), which takes a single object and prints it out. You can print a dataframe, a variable, or a string:

print( my_var )
print( my_tibble )

You can use cat(), which is designed to print strings:

cat( "My var is ", my_var, "\n" )

You can use the cli package, which gives a bit of a nicer printout, and allows for easier formatting:

cli::cli_alert("My var is {my_var}")

The problem with printing is that it is easy to accumulate a lot of print statements, so that when you run your code you get a wall of text. For simulations, it is easy to print so much that it will meaningfully slow your simulation down! While print() statements can help you figure out what is going on, it is often better to reach for a more interactive debugging tool, such as the browser() function, or to guard your code with stopifnot() statements. We discuss these next.
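One way to tame the wall of text is to route your debugging printouts through a small helper guarded by a flag, so you can silence them all at once. Here is a minimal sketch of this common pattern (the VERBOSE flag and vcat() helper are just conventions, not standard R):

```r
VERBOSE <- TRUE

# Print only when the VERBOSE flag is on.
vcat <- function( ... ) {
  if ( VERBOSE ) {
    cat( ... )
  }
}

vcat( "Starting run with N =", 100, "\n" )

VERBOSE <- FALSE   # flip this off to silence all vcat() calls at once
```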

19.2 Debugging with browser()

Consider the following code taken from a simulation:

    if ( any( is.na( rs$estimate ) ) ) {
        browser()
    }

The browser() command stops your code and puts you in an interactive console where you can look at different objects in your workspace and see what is happening. Having it triggered when something bad happens (in this case when a set of estimates has an unexpected NA) can help untangle what is driving a rare event.

The interactive console allows you to look at the current state of the code, and you can type in commands to see what is going on. It is just like a normal R workspace, but if you look at the Environment, you will only see what the code has available at the time browser() was called. If you are inside a function, for example, you will only see the things passed to the function, and the variables the function has made.

The browser can be very useful for, for example, checking what values were passed to your function–many bugs are due to the wrong thing getting passed to code that would otherwise work.

Once in a browser, you can type Q to quit out, or c to continue running the code normally. You can also type n to go to the next line of code, which allows you to walk through the code step by step, seeing what happens as you move along. Much of the time, RStudio will even jump to the part of your script where you paused, so you can see the code that will be run with each step.
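As a sketch of what this looks like in practice, suppose we pause inside a (hypothetical) analysis function:

```r
calc_ATE <- function( dat ) {
  est <- mean( dat$Y1 - dat$Y0 )
  browser()   # pause here to inspect
  est
}

# At the Browse[1]> prompt you can then type, e.g.:
#   ls()         # list the objects in scope (dat, est)
#   head( dat )  # inspect an argument
#   n            # step to the next line
#   c            # continue running normally
#   Q            # quit the browser and abort
```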

19.3 Debugging with debug()

Another useful debugging tool is the debug() function. This function flags a function of your choosing so that, whenever it is called, R stops at its first line and puts you in the same browser discussed above. You use it like this:

debug( gen_dat )
run_simulation( some_parameters )

Now, when run_simulation() eventually calls gen_dat(), the script will stop, and you can see exactly what was passed to gen_dat() and then walk through it line by line to see what it does.
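One caution: debug() leaves the flag set, so every subsequent call to the function will drop you into the browser. Two companion functions help with that:

```r
undebug( gen_dat )    # turn the debugging flag off again
debugonce( gen_dat )  # enter the browser on the next call only
```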

19.4 Protecting functions with stopifnot()

When writing functions, especially those that take a lot of parameters, it is often wise to include stopifnot() statements at the top to verify the function is getting what it expects. These are sometimes called “assert statements” and are a tool for making errors show up as early as possible. For example, look at this (fake) example of generating data with different means and standard deviations:

make_groups <- function( means, sds ) {
  Y = rnorm( length(means), mean=means, sd = sds )
  round( Y )
}

If we call it, but provide different lengths for our means and standard deviations, no error or warning occurs, because R simply recycles the standard deviation parameter:

make_groups( c(100,200,300,400), 
             c(1,100,10000) )
## [1]   101   204 17426   400

What is nasty about this possible error is that nothing tells you something is wrong! You could build an entire simulation on this, not realizing that your fourth group has the standard deviation of your first, and spend a lot of time wrestling with results that make no sense to you. You could even publish something based on a finding that depends on this error, which would eventually be quite embarrassing. Even if you know something is wrong with your simulation, it might take some time and effort to track down the origin of the problem, as nothing would be alerting you to the error.

To avoid this kind of hardship, we can instead use assert statements to verify that the arguments to our function are as they should be, and, if they are not, stop the function in its tracks. We implement assert statements via the stopifnot() method:

make_groups <- function( means, sds ) {
  stopifnot( length(means) == length(sds) )
  Y = rnorm( length(means), mean=means, sd = sds )
  round( Y )
}

Now, if we call our function incorrectly, we get this:

make_groups( c(100,200,300,400), 
             c(1,100,10000) )
## Error in make_groups(c(100, 200, 300, 400), c(1, 100, 10000)): length(means) == length(sds) is not TRUE

The stopifnot() command ensures your code is getting called as you intended. Assert statements can also serve as a type of documentation as to what you expect. Consider, for example:

make_xy <- function( N, mu_x, mu_y, rho ) {
  stopifnot( -1 <= rho && rho <= 1 )
  X = mu_x + rnorm( N )
  Y = mu_y + rho * X + sqrt(1-rho^2)*rnorm(N)
  tibble(X = X, Y=Y)
}

Here we see that rho should be between -1 and 1. The assert serves as a good reminder of what the parameter is for.
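In recent versions of R (3.5 and up), you can also name the conditions passed to stopifnot(), and the name becomes the error message, which makes failures even easier to read. For example (check_rho() here is just an illustrative helper):

```r
check_rho <- function( rho ) {
  stopifnot( "rho must be between -1 and 1" = ( -1 <= rho && rho <= 1 ) )
  rho
}

check_rho( 2 )   # errors with the message "rho must be between -1 and 1"
```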

Assert statements also protect you from inadvertently misremembering the order of your parameters when you call your function (although it is good practice to name your parameters as you pass them). Consider:

a <- make_xy( 10, 2, 3, 0.75 )
b <- make_xy( 10, 0.75, 2, 3 )
## Error in make_xy(10, 0.75, 2, 3): -1 <= rho && rho <= 1 is not TRUE

That said, we should usually do this:

c <- make_xy( 10, rho = 0.75, mu_x = 2, mu_y = 3 )

19.5 Testing code

Testing your code is a good way to ensure that it does what you expect. We have seen some demonstration of testing code early on, such as when we made plots of our simulated data to see if it looked like we expected. This kind of code could be stored in the script with the functions being tested, so that you can run it again later to see if the code still works as expected, using the FALSE trick discussed in Section ??.

That sort of testing is important, but it can be hard to bring oneself to go and rerun it after making what seems like a trivial change to the core code. It can also be hard to track down the ripple effects of changing a low-level method that is used by many other methods.

This is why people developed “unit testing,” an approach to testing where you write a suite of checks that you can run whenever you want; the suite exercises your code and reports which tests behave as expected, and which do not. In R, the most common tool for this is the testthat package.

There are two general parts to testthat: the expect_*() methods and the test_that() function.
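The expect_*() methods each check a single claim about your code, doing nothing if the claim holds and throwing an error if it does not. You can try them out directly at the console:

```r
library( testthat )

expect_equal( 2 + 2, 4 )                       # passes silently
expect_equal( 0.501, 0.5, tolerance = 0.01 )   # passes: within tolerance
expect_true( is.numeric( rnorm( 5 ) ) )        # passes silently
```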

Consider the following simple DGP to generate an X and Y variable with a given relationship:

my_DGP <- function( N, mu, beta ) {
  stopifnot( N > 0, beta <= 1 )
  tibble( X = rnorm( N, mean = 0, sd = 1 ),
          Y = mu + beta * X + rnorm( N, sd = 1-beta^2 ) )
}

We can write test code as so:

library(testthat)
set.seed(44343)
test_that("my_DGP works as expected", {
  dta <- my_DGP(10, 0, 0.5)
  # Check that the output is a tibble
  expect_s3_class(dta, "tbl_df")
  
  # Check that the output has the right number of rows
  expect_equal(nrow(dta), 10)
  
  # Check that the output has the right columns
  expect_true(all(c("X", "Y") %in% colnames(dta)))
  
  # Check that the mean of Y is close to mu
  dta2 = my_DGP(1000, 2, 0.5)
  expect_equal(mean(dta2$Y), 2, tolerance = 0.1)
  
  # Check we get an error when we should
  expect_error( my_DGP(-10, 0, 0.5) )
})
## Test passed 🥳

This code will run the tests, and if they all pass, it will print out a happy message.

If one or more of our tests fail, we will get a set of error messages that tells us what went wrong, and where things broke:

test_that("my_DGP works as expected (test 2)", {
  dta <- my_DGP(10000, 2, -2)
  expect_equal( sd( dta$X ), 1, tolerance = 0.02 )
  
  dta <- my_DGP(10000, 2, 0.5)
  expect_equal( var(dta$Y), 1, tolerance = 0.02 )
  
  M = lm( Y ~ X, data=dta )
  expect_equal( coef(M)[[2]], 0.5, tolerance = 0.02 )
} )
## ── Warning: my_DGP works as expected (test 2) ────
## NAs produced
## Backtrace:
##     ▆
##  1. ├─global my_DGP(10000, 2, -2)
##  2. │ └─tibble::tibble(...)
##  3. │   └─tibble:::tibble_quos(xs, .rows, .name_repair)
##  4. │     └─rlang::eval_tidy(xs[[j]], mask)
##  5. └─stats::rnorm(N, sd = 1 - beta^2)
## 
## ── Failure: my_DGP works as expected (test 2) ────
## var(dta$Y) not equal to 1.
## 1/1 mismatches
## [1] 0.806 - 1 == -0.194
## Error:
## ! Test failed

With unit testing, you write a bunch of these tests, each targeting some specific aspect of your code. If you put all of these tests in a file, you can run them all at once:

test_file(here::here( "code/demo_test_file.R" ) )
## 
## ══ Testing demo_test_file.R ══════════════════════
## 
## [ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
## [ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
## [ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]
## [ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ]
## [ FAIL 0 | WARN 0 | SKIP 0 | PASS 4 ]
## [ FAIL 0 | WARN 0 | SKIP 0 | PASS 5 ]
## [ FAIL 0 | WARN 1 | SKIP 0 | PASS 5 ]
## [ FAIL 0 | WARN 1 | SKIP 0 | PASS 6 ]
## [ FAIL 1 | WARN 1 | SKIP 0 | PASS 6 ]
## [ FAIL 1 | WARN 1 | SKIP 0 | PASS 7 ]
## 
## ── Warning ('demo_test_file.R:35:3'): my_DGP works as expected (test 2) ──
## NAs produced
## Backtrace:
##     ▆
##  1. ├─my_DGP(10000, 2, -2) at demo_test_file.R:35:3
##  2. │ └─tibble::tibble(...) at demo_test_file.R:7:3
##  3. │   └─tibble:::tibble_quos(xs, .rows, .name_repair)
##  4. │     └─rlang::eval_tidy(xs[[j]], mask)
##  5. └─stats::rnorm(N, sd = 1 - beta^2)
## 
## ── Failure ('demo_test_file.R:39:3'): my_DGP works as expected (test 2) ──
## var(dta$Y) not equal to 1.
## 1/1 mismatches
## [1] 0.792 - 1 == -0.208
## 
## [ FAIL 1 | WARN 1 | SKIP 0 | PASS 7 ]

The test_file() method will then give an overall printout of all the tests made, and list which passed, which gave warnings, and which were skipped. You can also run the test from inside RStudio: when in a unit test file, you should see “Run Tests” at the top-right of the script pane–if you click on it, RStudio will start an entirely new work session, and source your file to test it. This is the best way to use these testing files.

Tests are run “from scratch,” and this stand-alone work session approach means you have to load any needed libraries before running the testing code. If your testing code is testing your core code (stored in the R/ folder of your project; see Chapter 17 for further discussion), you will have to source the relevant files at the top of your test file.
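In other words, a test file typically begins with a preamble like the following (the file name under R/ is hypothetical; use whatever script holds the functions you are testing):

```r
library( testthat )
library( tidyverse )

# Load the code under test:
source( here::here( "R/my_DGP.R" ) )
```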

For a full testing work flow, put all the testing files in a single directory, and run them all at once with test_dir(). The usual place to store testing files is in a tests/testthat/ directory inside your project. You can then have a tests/testthat.R file that runs test_dir() on the tests/testthat/ directory. The testthat package was designed for unit testing new R packages, but we are repurposing it for general projects here.
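With that layout, the tests/testthat.R runner file can be as simple as:

```r
library( testthat )

test_dir( here::here( "tests/testthat" ) )
```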

Once you have your unit testing all set up, you can work on your project, and then easily run all the unit tests to see if you broke anything. Even more important, if you are working with a collaborator, you can both run unit tests to ensure you have not broken something that someone else was counting on! Furthermore, you can use the test code as a reference for how the code should be used, and what the expected output is. For any reasonably complex project, having test code can be of enormous benefit.

In principle, if you are writing code to figure out why something is not working as expected, you should put that code in your testing folder so that you can run it again later, ensuring that any bug you fixed will stay fixed moving forward.