Friday, January 15, 2010

A Newick parser for Python, supporting internal node labels

I just pushed a fork of Thomas Mailund's nice Newick parser for Python to Bitbucket. I added support for labeled internal nodes, but in doing so I probably broke part of the support for bootstrap values.
>>> from newick import parse_tree
>>> t = parse_tree("((Human,Chimp)Primate,(Mouse,Rat)Rodent)Supraprimates;")
>>> print t
(('Human', 'Chimp')Primate, ('Mouse', 'Rat')Rodent)Supraprimates
>>> print t.identifier
Supraprimates


Friday, March 14, 2008

Using Makefiles for jobs that run on a cluster

Makefiles are great. While you work on a project, they make it convenient to run the necessary scripts. When you come back to the project half a year later, you don't have to rack your brain to remember how the scripts fit together: it's all there. (More on make, and related advice.)

However, computational tasks in bioinformatics are often too big for a single CPU, so jobs are submitted to a cluster. Then the Makefile doesn't help you much: it can't detect that jobs are still running on the cluster. There is qmake, but it only works if all your parallel jobs are specified in the Makefile. I usually write my parallel scripts so that they submit as many instances of themselves as necessary to the cluster via qsub.

Therefore, I went ahead and wrote a small Python wrapper script that runs the job submission script and sniffs the job ids from the output of qsub. It then waits and monitors these jobs until they are all done. Then, the execution of the Makefile can continue.
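
The following is a minimal sketch of that idea, not the actual qwrap.py. It assumes that qsub prints lines of the form `Your job 12345 ("name") has been submitted` (or `Your job-array 12345.1-10:1 ...` for array jobs) and that `qstat -j ID` exits with a non-zero status once a job has left the queue. Both assumptions should be checked against your SGE installation. All function names are made up for illustration.

```python
import os
import re
import subprocess
import time

# Assumed SGE qsub output, for example:
#   Your job 12345 ("inchikeys.py") has been submitted
#   Your job-array 12346.1-10:1 ("inchikeys.py") has been submitted
JOB_ID_RE = re.compile(r"Your job(?:-array)? (\d+)")


def parse_job_ids(qsub_output):
    """Return the job ids (as strings) found in captured qsub output."""
    return JOB_ID_RE.findall(qsub_output)


def job_is_known(job_id):
    """Assumption: `qstat -j ID` exits with 0 only while the job is queued or running."""
    with open(os.devnull, "w") as devnull:
        status = subprocess.call(["qstat", "-j", job_id],
                                 stdout=devnull, stderr=devnull)
    return status == 0


def wait_for_jobs(job_ids, is_running=job_is_known, poll_seconds=30,
                  sleep=time.sleep):
    """Block until none of the given jobs is reported as running."""
    pending = set(job_ids)
    while pending:
        pending = set(j for j in pending if is_running(j))
        if pending:
            sleep(poll_seconds)


def run_and_wait(command):
    """Run a submission script (a list of arguments), then wait for its jobs."""
    output = subprocess.check_output(command, universal_newlines=True)
    wait_for_jobs(parse_job_ids(output))
```

Because `run_and_wait` only returns after the last job has disappeared from the queue, make does not move on to the next line of the rule until the cluster jobs are done. Note that this sketch does not look at the exit status of the jobs.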

Here's an example of how to invoke the wrapper script from the Makefile:
pubchem_compound_inchi.tsv.gz:
	~/src/misc/qwrap.py ${SRC_DIR}/inchikeys.py
	cat ../inchikey/* | gzip > pubchem_compound_inchi.tsv.gz
You can download the code (released under a BSD License, adapted to SGE). I hope it's useful!

Addendum: Hunting around in the SGE documentation, I found the "-sync" option, which, together with job arrays, probably provides the same functionality and also checks the exit status of the jobs.
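
A rough sketch of what that alternative could look like from Python, assuming the work can be expressed as one array job: `-sync y` makes qsub block until the job has finished, and `-t 1-N` requests N array tasks (each task can read its index from the `SGE_TASK_ID` environment variable). The helper names and the script name are hypothetical, and I have not verified how qsub combines the exit statuses of the individual tasks.

```python
import subprocess


def sync_array_command(script, n_tasks):
    """Build a qsub call that submits n_tasks array tasks and waits for them."""
    return ["qsub", "-sync", "y", "-t", "1-%d" % n_tasks, script]


def submit_and_wait(script, n_tasks):
    """Run the blocking qsub call and return its exit status."""
    return subprocess.call(sync_array_command(script, n_tasks))


# Only build and show the command here; nothing is submitted.
print(" ".join(sync_array_command("inchikeys.sh", 10)))
```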


Thursday, October 11, 2007

Sending Growl notifications from Python scripts

Working in bioinformatics can be seen as an infinite loop of: think, write a script, run script, analyze data. While I work on a Mac, most of the scripts run on a Linux server, and it would be nice to know when a script is done so that I can look at the data. In order to be notified when a script finishes, I now use Growl (see picture).

I downloaded netgrowl.py and wrote a quick and dirty wrapper package around it:

#!/usr/bin/env python

from netgrowl import *
import sys

def growlNotify(title = "Script Finished", message = ""):

    addr = ("10.1.104.26", GROWL_UDP_PORT)
    s = socket(AF_INET,SOCK_DGRAM)
    #
    # p = GrowlRegistrationPacket(application="Network Demo", password="?")
    # p.addNotification("Script Finished", enabled=True)
    #
    # s.sendto(p.payload(), addr)

    if not message:
        message = sys.argv[0]

    p = GrowlNotificationPacket(application="Network Demo",
                                notification="Script Finished", title=title,
                                description=message, priority=1,
                                sticky=True, password="?")
    s.sendto(p.payload(),addr)
    s.close()

if __name__ == '__main__':
    growlNotify()



The registration is hidden in a comment because you only need to do it the very first time. So, in my scripts, I just insert the following right before the end of the script (using a TextMate snippet to save typing).

import growlnotify
growlnotify.growlNotify()
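
One limitation of calling the notifier on the last line is that nothing is sent when the script dies with an exception. A possible variation, sketched below under the assumption that the script's work lives in a `main()` function, wraps the call so that a notification goes out either way. The helper `run_with_notification` is hypothetical; `notify` stands for any callable taking a title and a message, such as `growlnotify.growlNotify` from above.

```python
import sys
import traceback


def run_with_notification(main, notify):
    """Call main(), then notify(title, message) whether it succeeded or raised."""
    try:
        result = main()
    except Exception:
        # Report the failure, then let the exception propagate as usual.
        notify("Script Failed", traceback.format_exc())
        raise
    notify("Script Finished", sys.argv[0])
    return result


def print_notifier(title, message):
    """Stand-in for growlnotify.growlNotify so the sketch runs anywhere."""
    print("%s: %s" % (title, message))


run_with_notification(lambda: sum(range(10)), print_notifier)
```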


Wednesday, August 29, 2007

Use case for decorators

Intro

Decorators were introduced in Python 2.4 but remain a somewhat obscure feature of the language. A decorator is essentially a function that you apply to another function; the name of the decorated function is then bound to whatever the decorator returns:
def d(f):
    ...

@d
def g():
    ...
This is equivalent to:
def d(f):
    ...

def g():
    ...

g = d(g)
More background here.
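
To make the equivalence concrete, here is a small runnable example. The decorator `shout` and the functions it is applied to are invented for illustration; the point is only that the `@` form and the explicit reassignment behave the same way.

```python
def shout(f):
    """Decorator: return a new function that upper-cases f's result."""
    def wrapper():
        return f().upper()
    return wrapper


@shout
def greet():
    return "hello"


def greet_plain():
    return "hello"

# Explicit form: has the same effect as writing @shout above the definition.
greet_manual = shout(greet_plain)

print(greet())         # HELLO
print(greet_manual())  # HELLO
```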

The use case

As part of my research project, I’m testing several ways to explain side effects of drugs. So I made a class that loads all the relevant data into memory and that contains methods to fit the data. (I thought about creating subclasses replacing the “fit” method of the base class. However, the data that’s being loaded into memory is the same for all instances of the subclasses, so I don’t want to reload it every time. Thus, I put multiple methods into the base class.)

At first, I used introspection to identify the relevant functions by name (using dir) and then called them. This worked fine, but it didn't make it particularly easy to disable some of the functions. Enter decorators: I wrote a decorator that takes a function, appends it to a list (which is a global variable) and returns the function unchanged. To call every fitting function, I can just iterate over the list. If I want to disable a function, I just comment out one line. Yay!

# with the help of decorators, keep track of the functions
# we would like to use for fitting
fit_functions = []

def fit_func(f):
    fit_functions.append(f)
    return f

class C(object):

    def __init__(self):
        ...

    @fit_func
    def fit1(self):
        """This is the first function"""
        ...

    @fit_func
    def fit2(self):
        """This is the second function"""
        ...

    ## This function is disabled by putting the decorator in a comment
    # @fit_func
    def fit3(self):
        """This is the third function"""
        ...


def main():

    c = C()

    for f in fit_functions:
        fit = f(c)
        print fit, f.__doc__

    ...
Another neat trick is that the docstring of each function is accessible as an attribute, so that each fit comes together with a description.
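
For readers who want to try the pattern, here is a self-contained toy version that runs as is. The class `Model`, its data and its three "fits" are placeholders invented for this sketch and have nothing to do with the real project. Note that the decorator runs while the class body is being executed, so the list holds plain functions rather than bound methods, which is why the instance is passed explicitly as `f(model)`.

```python
fit_functions = []


def fit_func(f):
    """Register f as a fitting function and return it unchanged."""
    fit_functions.append(f)
    return f


class Model(object):

    def __init__(self, data):
        # Loaded once and shared by all fitting methods.
        self.data = data

    @fit_func
    def fit_mean(self):
        """Arithmetic mean of the data"""
        return sum(self.data) / float(len(self.data))

    @fit_func
    def fit_max(self):
        """Largest value in the data"""
        return max(self.data)

    # @fit_func   (disabled by commenting out the decorator)
    def fit_min(self):
        """Smallest value in the data"""
        return min(self.data)


model = Model([1, 2, 3, 6])
results = [(f.__name__, f(model), f.__doc__) for f in fit_functions]
for name, value, doc in results:
    print("%s = %s (%s)" % (name, value, doc))
```

Only `fit_mean` and `fit_max` are called; `fit_min` still exists as a method but never enters the list.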
