Session I: Testing¶

For the rest of the sessions, you’ll need an account at http://github.com/ as well as an account at https://readthedocs.org/.

The ‘sqer’ Python package¶

We’re going to create a Python utility called ‘sqer’ to give read statistics.

Let’s start by creating a ‘sqer’ library.

Make a directory ‘sqer’.

Inside this make another directory ‘sqer’.

In ‘sqer/sqer’ open a file ‘__init__.py’

From within the top level directory ‘sqer’ you should be able to do

python -c "import sqer; print sqer"

Go into the ‘sqer’ directory and initialize a git repo:

git init

Add the sqer/ package directory:

git add *

git status

Note the .pyc file – this is not a source file, but rather a generated file. Let’s remove it from the commit:

git rm --cached sqer/__init__.pyc

and also ignore it from here on out:

echo '*.pyc' > .gitignore

Now:

git status

will not show it as a file, and ‘git add’ will not add it unless it’s forced.

Next,

git add .gitignore

and commit:

git commit -am "initial commit"

Now ‘git status’ should show you only untracked files, no differences.

Note

You can use ‘git log’ to get a history of commits.

Writing some code (and tests)¶

Let’s start by writing a function that computes the sum of legit DNA bases in a sequence record. The main thing you need to know here is that each sequence record will come from the screed utility, which will give us record objects with a ‘name’, ‘sequence’, optional ‘accuracy’ (for FASTQ), and optional ‘description’ (from the FASTA/FASTQ sequence name).

So, put:

def sum_bp(record):
    return len(record.sequence)

in ‘sqer/__init__.py’.

Now, let’s add a test. Create a directory ‘tests’ and put a file ‘test_basic.py’ in it; this file should contain:

import sqer

class FakeRecord(object):
    def __init__(self, sequence, name=''):
        self.sequence = sequence
        self.name = name

def test_sum_bp():
    r = FakeRecord('ATGC')
    assert sqer.sum_bp(r) == 4

Here, ‘FakeRecord’ is a stub object that lets you test your code by faking an object type solely for testing.

Now, run:

nosetests

You should see:

.
----------------------------------------------------------------------
Ran 1 test in 0.003s

OK

You can also run ‘nosetests -v’ to get more verbose output.

Tests pass? Great, add and commit it with git!

git add tests
git commit -am "initial tests"

Now, let’s add a new function, ‘sum_bp_records’, to sqer/__init__.py.

def sum_bp_records(records):
    total = 0
    for record in records:
        total += sum_bp(record)

    return total

How shall we test this? Well, all it expects is an iterable of records: add this to tests/test_basic.py:

def test_sum_bp_records():
    rl = [ FakeRecord("A"), FakeRecord("G") ]
    assert sqer.sum_bp_records(rl) == 2

Now run ‘nosetests’ again – works? No complaints?

Great, commit it with git:

git status
git commit -am "added sum_bp_records and test"

Exercises¶

Write a test to handle (and ignore) non-ACGT. (Fix the code.)
Write a test to verify that lower-case is handled. (Fix the code.)
Write a function to calculate the average length of records in a file; test it.

Writing a script¶

Let’s write something to let us use this from the command line. Put the following code in count-read-bp.py:

#! /usr/bin/env python
import argparse
import screed
import sqer

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('filenames', nargs='+')

    args = parser.parse_args()

    total = 0
    for filename in args.filenames:
        records = screed.open(filename)
        total += sqer.sum_bp_records(records)

    print '%d bp total' % total

if __name__ == '__main__':
    main()

Next, ‘chmod +x count-read-bp.py’. This makes UNIX aware that it’s an executable file.

Try running it:

./count-read-bp.py

Note the friendly error message! Note that you can use ‘-h’, too.

How do we test this??

Put:

>a
ATCG
>b
GCTA

in a file ‘reads.fa’. Then:

./count-read-bp.py reads.fa

You should see ‘8 bp total’. Great!

Commit:

git add count-read-bp.py reads.fa
git commit -am "command-line script count-read-bp, plus test data"

Check with ‘git status’. Do you have editor remainder files (like ~ files from using emacs)? Add them to .gitignore and commit the changes.

Testing command line scripts¶

Put this in a file ‘tests/test_scripts.py’:

import subprocess
import os
thisdir = os.path.dirname(__file__)
thisdir = os.path.normpath(thisdir)

sqerdir = os.path.join(thisdir, '../')
sqerdir = os.path.normpath(sqerdir)


def test_count_reads():
    scriptpath = os.path.join(sqerdir, 'count-read-bp.py')
    datapath = os.path.join(sqerdir, 'reads.fa')

    p = subprocess.Popen([scriptpath, datapath],
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    (out, err) = p.communicate()

    assert p.returncode == 0
    assert "8 bp total" in out, out

Now run ‘nosetests’ – what does it say?

Add and commit:

git add tests/test_scripts.py
git commit -am "added test for the count-read-bp.py script"

Regression tests with command line scripts¶

Grab some data from somewhere (e.g. 25k.fq.gz from training files) and put it in test-reads.fq. You can subset the 25k.fq.gz file if you want:

gunzip -c 25k.fq.gz | head -400 > test-reads.fq

Add another test to sqer/test_scripts.py:

def test_count_reads_2():
    scriptpath = os.path.join(sqerdir, 'count-read-bp.py')
    datapath = os.path.join(sqerdir, 'test-reads.fq')
    print thisdir, sqerdir, scriptpath, datapath

    p = subprocess.Popen([scriptpath, datapath],
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    (out, err) = p.communicate()

    assert p.returncode == 0
    assert "8 bp total" in out, out

And now run ‘nosetests’.

It should break, right? :)

Fix the last ‘assert’ code, then rerun; when it all passes, do:

git add test-reads.fq
git status

Make sure that only what you think should be there is there; then do:

git commit -am "added regression test"

Reorganize¶

Let’s put the data files under data/:

mkdir data
mv test-reads.fq data
mv reads.fa data/test-reads.fa

...now, fix the tests!

Exercises¶

Add friendly output to the script, e.g. files opened, # records processed.

Add a flag for ‘silence’:

parser.add_argument("-s", dest="silent", type=bool)

and

if args.silent: ...

Testing summary¶

Points to cover:

any functions named ‘test*’ in files named ‘test*’ are executed.
unit tests are for small bits of code;
script tests (the first one) are for testing the script API;
regression tests are for making sure behavior stays the same. (We didn’t actually count the number of bases in that file, right? We just assumed it was counting them right.)
these three types of tests are for different purposes and test different things! Which one is most useful?
‘print’ statements and the like inside the tests are captured, and only output upon error.
assert statements are the way to check things.

Slightly more advanced topics if people are interested:

what do you do about output files? (temp directories)
how do you measure if your tests are “good enough”? (code coverage)

Advanced exercises¶

Write a reservoir sampling algorithm.

An Introduction to the Nose Testing Framework

http://ivory.idyll.org/articles/nose-intro.html

Session I: Testing¶

The ‘sqer’ Python package¶

Writing some code (and tests)¶

Exercises¶

Writing a script¶

Testing command line scripts¶

Regression tests with command line scripts¶

Reorganize¶

Exercises¶

Testing summary¶

Advanced exercises¶

Table Of Contents

Previous topic

Next topic

This Page

Navigation

Session I: Testing¶

The ‘sqer’ Python package¶

Writing some code (and tests)¶

Exercises¶

Writing a script¶

Testing command line scripts¶

Regression tests with command line scripts¶

Reorganize¶

Exercises¶

Testing summary¶

Advanced exercises¶

Table Of Contents

Previous topic

Next topic

This Page

Quick search

Navigation