Notes from lecture 4 -- Testing

The most effective means for ensuring software quality is testing. You have all seen testing in earlier programming courses, but today we’re going to review how it works and perhaps push beyond what you’ve seen in the past.

Here’s a (buggy) implementation of merge sort.

This notation might not be familiar. merge-sort checks whether its input is empty. If so, it is done; if not, it splits the list into a front half and a back half (via take and drop) and then merges them.

More interesting is merge, which uses match*. That form matches on several values at once, here l1 and l2. The first case applies when l1 is the empty list (returning l2), and the second is the same but for when l2 is empty. The third case applies when both lists have at least one element; it pattern matches out the first element and the rest of each list, binding them to hd1 and tl1 for the first list and hd2 and tl2 for the second. Finally, the last three lines of merge check whether hd1 or hd2 is smaller (or whether they are equal) and recur accordingly, rebuilding the list as appropriate.
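Since match* may be new, here is a small standalone sketch of the same three-case shape, kept separate from the lecture's code. The function pairwise-sum is a hypothetical helper invented only for illustration; it adds two lists of numbers element by element.

```racket
#lang racket
(require rackunit)

;; pairwise-sum is a hypothetical helper, not part of the lecture code.
;; It adds two lists element by element; once either list runs out,
;; the remainder of the other list is kept unchanged.
(define (pairwise-sum l1 l2)
  (match* (l1 l2)
    [('() _) l2]                      ; l1 is empty: the answer is l2
    [(_ '()) l1]                      ; l2 is empty: the answer is l1
    [((cons hd1 tl1) (cons hd2 tl2))  ; both non-empty: name heads and tails
     (cons (+ hd1 hd2) (pairwise-sum tl1 tl2))]))

(check-equal? (pairwise-sum '(1 2 3) '(10 20)) '(11 22 3))
```

The lecture's implementation, which uses the same pattern structure in merge, follows.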

#lang racket
(require rackunit)

(define (merge-sort l)
  (cond
    [(empty? l) '()]
    [else
     (merge (take l (/ (length l) 2))
            (drop l (/ (length l) 2)))]))

(define (merge l1 l2)
  (match* (l1 l2)
    [('() _) l2]
    [(_ '()) l1]
    [((cons hd1 tl1)
      (cons hd2 tl2))
     (cond
       [(< hd1 hd2) (cons hd1 (merge tl1 l2))]
       [(= hd1 hd2) (cons hd1 (merge tl1 tl2))]
       [(> hd1 hd2) (cons hd2 (merge l1  tl2))])]))

(check-equal? (merge-sort '()) '())
(check-equal? (merge-sort '(1 2)) '(1 2))
(check-equal? (merge-sort '(2 1)) '(1 2))
(check-equal? (merge-sort '(1 3 2 4)) '(1 2 3 4))

Although this code is definitely buggy, all of the test cases shown here pass. Stop at this point. Open up DrRacket, copy and paste this code into it, and see if you can write some additional test cases that uncover bugs in merge-sort.

Unit tests like these are very important and are the first line of defense in software reliability. But today I want to show you another technique, and I want to encourage you to find ways to try it out in your own work for this class.

One thing we can do is call merge-sort with random inputs and check whether it is well-behaved. So there are two aspects to tackle: what being well-behaved means, and how to generate random inputs.

There are several ways to approach well-behaved. Here are some ideas:

Check to make sure that the function never crashes. This has the advantage that it applies to nearly all functions, but the disadvantage that there are many other ways to go wrong than simply crashing, so we’ll miss some bugs.
Compare this function to some other implementation of a sorting function; in this case we might use Racket’s standard library sort function.
Check something specific about the function we’re testing. Sorting has two especially important properties: the output must be a permutation of the input and the output must be in sorted order.

In the interest of time, let’s just go with the second option. If you didn’t already have a second implementation of the function you wanted to test, you’d have to use one of the other options.
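For reference, the first and third options could be sketched roughly as follows. This is only an illustration under stated assumptions: crashes?, sorted?, permutation?, and well-sorted? are hypothetical names that do not appear in the lecture code, and the inputs are assumed to be lists of numbers.

```racket
#lang racket
(require rackunit)

;; All four names below are hypothetical helpers for this sketch.

;; Option 1: does calling f on x raise an exception?
(define (crashes? f x)
  (with-handlers ([exn:fail? (lambda (e) #t)])
    (f x)
    #f))

;; Option 3, first property: every element is <= the one after it.
(define (sorted? l)
  (or (empty? l)
      (empty? (rest l))
      (and (<= (first l) (second l))
           (sorted? (rest l)))))

;; Option 3, second property: l1 and l2 hold the same numbers the same
;; number of times. Each element of l1 is removed from l2 once, and l2
;; must end up empty.
(define (permutation? l1 l2)
  (cond
    [(empty? l1) (empty? l2)]
    [(member (first l1) l2)
     (permutation? (rest l1) (remove (first l1) l2))]
    [else #f]))

;; A sort function is well-behaved on l when both properties hold.
(define (well-sorted? sort-fn l)
  (define out (sort-fn l))
  (and (sorted? out) (permutation? l out)))

(check-true (well-sorted? (lambda (l) (sort l <)) '(3 1 2 1)))
```

Note that the permutation check matters: a function that always returns the empty list produces sorted output, but it fails permutation? on any non-empty input.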

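There are multiple ways to generate random lists of numbers, but this is a way to do it with a nice distribution of list lengths and number sizes. random-natural returns 0 one time in ten and otherwise adds 1 and tries again, so small numbers are the most likely but arbitrarily large ones remain possible: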

(define (random-lon)
  (for/list ([i (in-range (random-natural))])
    (random-natural)))

(define (random-natural)
  (cond
    [(zero? (random 10)) 0]
    [else (+ 1 (random-natural))]))

With that code in hand, we can call it repeatedly, looking for wrong answers:

(for ([x (in-range 100)])
  (define l (random-lon))
  (define sl (sort l <))
  (define ms (merge-sort l))
  (unless (equal? sl ms)
    (error 'bug! "\ninput:   ~s\noutput:  ~s\ncorrect: ~s\n" l ms sl)))

Give this a try and see if you can find and fix all the bugs in the code above. Word to the wise: if you get a counterexample, try running a few more times to see if you get a smaller example.
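One way to act on that advice is to package the loop as a function that returns the failing input instead of raising an error, and then keep the shortest counterexample over several runs. The sketch below assumes that a crash should also count as a failure (a design choice, not something the lecture specifies); find-counterexample and smallest-counterexample are hypothetical names.

```racket
#lang racket
(require rackunit)

;; Generators copied from the notes above.
(define (random-natural)
  (cond
    [(zero? (random 10)) 0]
    [else (+ 1 (random-natural))]))

(define (random-lon)
  (for/list ([i (in-range (random-natural))])
    (random-natural)))

;; Hypothetical helper: try `attempts` random lists and return the first
;; one on which sort-fn disagrees with Racket's sort or raises an
;; exception. Returns #f when no such list is found.
(define (find-counterexample sort-fn attempts)
  (for/or ([i (in-range attempts)])
    (define l (random-lon))
    (define ok?
      (with-handlers ([exn:fail? (lambda (e) #f)])
        (equal? (sort l <) (sort-fn l))))
    (and (not ok?) l)))

;; Hypothetical helper: repeat the search `runs` times and keep the
;; shortest counterexample seen, or #f if none turned up.
(define (smallest-counterexample sort-fn runs attempts)
  (for/fold ([best #f])
            ([r (in-range runs)])
    (define cex (find-counterexample sort-fn attempts))
    (cond
      [(not cex) best]
      [(or (not best) (< (length cex) (length best))) cex]
      [else best])))

;; Racket's own sort never disagrees with itself.
(check-false (find-counterexample (lambda (l) (sort l <)) 200))
```

Calling (smallest-counterexample merge-sort 20 100) with the merge-sort defined earlier would be one way to use it.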

We won’t talk about it in lecture today, but once you’ve got a buggy input, there is an automatic way to shrink it. This is super useful when debugging: smaller inputs are helpful in their own right, and you also learn that no smaller input triggered the bug, which gives you a surprising amount of information.
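The automatic technique itself is not shown in these notes, so what follows is only a naive sketch of the idea, not necessarily the method the lecture has in mind. It deletes one element at a time for as long as the shorter list still fails. It only shortens the list (it never makes the numbers themselves smaller), and the result is minimal only in the sense that no single deletion keeps the failure. The names remove-at, shrink, and bad-sort are hypothetical; bad-sort is a deliberately broken sort that drops duplicates.

```racket
#lang racket
(require rackunit)

;; Hypothetical helper: drop the element at index i from l.
(define (remove-at l i)
  (append (take l i) (drop l (+ i 1))))

;; Hypothetical helper: given a list l for which (fails? l) is true,
;; keep deleting single elements while the shorter list still fails.
(define (shrink l fails?)
  (define smaller
    (for/first ([i (in-range (length l))]
                #:when (fails? (remove-at l i)))
      (remove-at l i)))
  (if smaller
      (shrink smaller fails?)
      l))

;; A deliberately buggy sort for demonstration: it loses duplicates.
(define (bad-sort l)
  (remove-duplicates (sort l <)))

(define (bad-sort-fails? l)
  (not (equal? (sort l <) (bad-sort l))))

;; The failing input shrinks down to just the duplicated pair.
(check-equal? (shrink '(5 3 3 8 1) bad-sort-fails?) '(3 3))
```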