Session 4: Analysis pipelines and IPython Notebook

A slightly more useful sqer script

Put the following in sqer/

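#! /usr/bin/env python
import argparse
import screed
import sqer

def main():
    # Accept one or more sequence files on the command line.
    parser = argparse.ArgumentParser()
    parser.add_argument('filenames', nargs='+')

    args = parser.parse_args()

    for filename in args.filenames:
        # screed.open() parses FASTA or FASTQ and yields records with a .sequence attribute.
        records = screed.open(filename)
        for record in records:
            # Print one sequence length per line.
            print(len(record.sequence))

if __name__ == '__main__':
    main()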

Then make the script executable and add it to git:

chmod +x
git add
git commit -am "added"
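The script relies on screed to parse its input. To make the per-record logic concrete, here is a minimal pure-Python sketch of the same calculation for FASTA input only. `read_fasta` and `fasta_lengths` are hypothetical names, not part of screed or sqer, and the parser deliberately skips FASTQ and the other formats and edge cases that screed handles:

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle.

    Hypothetical stand-in for screed.open(): FASTA only, no FASTQ support.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may wrap across several lines
        # lines before the first '>' header are silently ignored
    if header is not None:
        yield header, ''.join(chunks)  # emit the final record


def fasta_lengths(handle):
    """Return the length of every sequence in a FASTA handle, in file order."""
    return [len(seq) for _, seq in read_fasta(handle)]


if __name__ == '__main__':
    example = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGGG\n")
    for length in fasta_lengths(example):
        print(length)  # prints 6, then 3
```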


  1. Write a test for!
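One possible shape for such a test, assuming you pull the length calculation out of `main()` into a hypothetical helper `sequence_lengths(records)` that accepts any iterable of objects with a `.sequence` attribute (as screed records have), is to feed it small stand-in records instead of a real file:

```python
from collections import namedtuple

# Minimal stand-in for a screed record; only .sequence is used below.
FakeRecord = namedtuple('FakeRecord', ['name', 'sequence'])


def sequence_lengths(records):
    """Hypothetical helper: the length of each record's sequence, in order."""
    return [len(record.sequence) for record in records]


def test_sequence_lengths():
    records = [FakeRecord('a', 'ACGT'), FakeRecord('b', ''), FakeRecord('c', 'GG')]
    assert sequence_lengths(records) == [4, 0, 2]


def test_sequence_lengths_empty():
    # No records in, no lengths out.
    assert sequence_lengths([]) == []
```

Test runners such as nose or pytest collect functions whose names start with `test_`, so saving this in a file named like `test_lengths.py` (a hypothetical name) should be enough for them to find it.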

Write a little analysis pipeline

Create a directory called pipeline under sqer:

mkdir pipeline

and copy the ‘trinity-nematostella.fa.gz’ file from the training files into this directory (any FASTA/FASTQ file will do here), gunzip it, and then rename it to assembly.fa.
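If you would rather do the decompress-and-rename step from Python than from the shell, the standard library's `gzip` and `shutil` modules cover it. A minimal sketch; `gunzip_to` is a hypothetical helper name, and unlike `gunzip` it leaves the original .gz file in place:

```python
import gzip
import shutil


def gunzip_to(src, dest):
    """Decompress the gzip file `src` into a new file `dest`.

    Same end result as `gunzip` followed by a rename, except that the
    original .gz file is kept.
    """
    with gzip.open(src, 'rb') as fin, open(dest, 'wb') as fout:
        # Stream in chunks rather than reading the whole file into memory.
        shutil.copyfileobj(fin, fout)
```

For example, `gunzip_to('trinity-nematostella.fa.gz', 'assembly.fa')`, run from inside the pipeline directory.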

Now, create pipeline/Makefile containing:

all: lengths.txt

lengths.txt: assembly.fa
     ../ assembly.fa > lengths.txt

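Now, when you type ‘make’, it will run your analysis pipeline. (...pretend that ‘’ takes a long time or something :) If you type ‘make’ again without changing assembly.fa, nothing is rerun, because lengths.txt is already newer than its prerequisite. Note that the command line under ‘lengths.txt: assembly.fa’ must be indented with a TAB character, not spaces; otherwise make stops with a ‘missing separator’ error.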
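Make decides whether to rerun a rule by comparing file modification times. Here is a simplified Python model of that check; it ignores phony targets, rule chaining, and the rest of make's machinery, and `needs_rebuild` is a hypothetical name:

```python
import os


def needs_rebuild(target, prerequisites):
    """Simplified model of make's rule: rebuild if `target` is missing,
    or if any prerequisite was modified more recently than it.

    os.path.getmtime raises OSError if a prerequisite does not exist.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

With the Makefile above, `needs_rebuild('lengths.txt', ['assembly.fa'])` is true on the first run and stays false afterwards until assembly.fa changes.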

Start up IPython Notebook

From within the pipeline directory, run:

ipython notebook --pylab=inline

Click on ‘New Notebook’. In this new notebook, enter:

# --pylab=inline preloads numpy and the plotting functions used here
data = numpy.loadtxt('lengths.txt')
hist(data, bins=100)
xlabel('Sequence lengths')
ylabel('N sequences with that length')
title('Sequence length spectrum')

and hit ‘Shift-ENTER’ to execute.
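Behind the plot, matplotlib's `hist` bins the data with `numpy.histogram`. A sketch of the same data flow outside the notebook, assuming `lengths.txt` holds one number per line as the script prints them; `length_histogram` is a hypothetical name and the sample values are made up for illustration:

```python
import io
import numpy


def length_histogram(handle, bins=100):
    """Load one length per line and bin them; returns (counts, bin_edges)."""
    # ndmin=1 keeps a one-line file as a 1-element array instead of a scalar.
    data = numpy.loadtxt(handle, ndmin=1)
    return numpy.histogram(data, bins=bins)


if __name__ == '__main__':
    sample = io.StringIO(u"300\n450\n450\n1200\n")  # illustrative values only
    counts, edges = length_histogram(sample, bins=3)
    print(counts)  # number of sequences in each bin
    print(edges)   # bin boundaries; there are len(counts) + 1 of them
```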


Save the notebook (File... save...)

Now, do (from within the pipeline directory):

ls -1 assembly.fa lengths.txt > .gitignore
git add Makefile .gitignore *.ipynb
git commit -am "analysis makefile and notebook"

and then:

git push origin master

Now go find the raw URL to your notebook on GitHub, copy it, and then paste it in at:


Additional IPython resources:

Note that you can use ‘%loadpy’ in IPython Notebook to pull code from a URL (or a local .py file) into a notebook cell automagically; newer IPython versions call this magic ‘%load’.