Wednesday, January 19, 2011

New repo: local NCBI taxonomy database

I forked the taxonomy repository hosted at Google Code and put the fork on BitBucket. The existing Python package already makes it possible to create a local database containing the content of the NCBI taxonomy, which can then be queried for names, ranks, and lineages. My fork adds a function that creates a Newick tree from a list of NCBI taxonomy identifiers.
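
To give a rough idea of what building a Newick tree from taxonomy identifiers involves, here is a minimal sketch. It is not the code from the repository: the function name `lineages_to_newick` is hypothetical, and it assumes the lineages have already been looked up (for example from the local database) as root-to-leaf lists of taxonomy IDs. The example lineages are written by hand and heavily abbreviated.

```python
def lineages_to_newick(lineages):
    """Merge root-to-leaf lineages into one Newick string.

    `lineages` maps each requested taxonomy ID to its lineage, given as a
    list of IDs ordered from the root down to that ID itself.
    (Hypothetical helper for illustration, not the package's API.)
    """
    children = {}  # parent ID -> child IDs, in first-seen order
    roots = []
    for lineage in lineages.values():
        if lineage[0] not in roots:
            roots.append(lineage[0])
        # Record every parent/child edge along the lineage once.
        for parent, child in zip(lineage, lineage[1:]):
            kids = children.setdefault(parent, [])
            if child not in kids:
                kids.append(child)

    if len(roots) != 1:
        raise ValueError("all lineages must start at the same root")

    def render(node):
        kids = children.get(node, [])
        if not kids:
            return str(node)  # leaf: just the ID
        # Internal node: children in parentheses, then the node's own ID.
        return "(" + ",".join(render(k) for k in kids) + ")" + str(node)

    return render(roots[0]) + ";"


# Hand-written, abbreviated lineages (most intermediate ranks omitted):
# 9606 = Homo sapiens, 10090 = Mus musculus, 562 = Escherichia coli.
example = {
    9606: [1, 131567, 2759, 9606],
    10090: [1, 131567, 2759, 10090],
    562: [1, 131567, 2, 562],
}
print(lineages_to_newick(example))
# prints (((9606,10090)2759,(562)2)131567)1;
```

This sketch keeps internal nodes that have only a single child; a real implementation might collapse those, or label the nodes with scientific names instead of IDs.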

2 comments:

dalloliogm said...

Cool! I have worked with SQLAlchemy and databases before, and it is interesting to see them applied to a biological dataset. However, I am not very familiar with the NCBI taxonomy... is it the taxonomy of all known species? Are there other datasets released with this system?
Can I help in any way? Are you planning to implement other functionality besides the Newick tree function?

Michael Kuhn said...

I think the NCBI Taxonomy is coupled with GenBank, so it should have all organisms for which something has been sequenced. It is, however, not an authoritative source for the actual taxonomic tree, which is still changing (e.g. it doesn't have the "new animal phylogeny").

Currently, I have no plans to add other functionality -- I just needed to group species by their (approximate) taxonomy. But you're more than welcome to fork the code. ;-)