Thursday, February 4, 2010

Human Protein Atlas data for download

As I just learned in our lab's journal club, the data from the Human Protein Atlas is available for download, thanks to their recent paper in MSB. Curiously enough, the HPA help page still states that they do not make data available as a matter of "general policy."

Friday, January 15, 2010

A Newick parser for Python, supporting internal node labels

I just pushed a fork of Thomas Mailund's nice Newick parser for Python to Bitbucket. I added support for labeled internal nodes, but in doing so I probably broke part of the support for bootstrap values.
>>> from newick import parse_tree
>>> t = parse_tree("((Human,Chimp)Primate,(Mouse,Rat)Rodent)Supraprimates;")
>>> print t
(('Human', 'Chimp')Primate, ('Mouse', 'Rat')Rodent)Supraprimates
>>> print t.identifier
Supraprimates
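
To illustrate what labeled internal nodes mean in the Newick format, here is a minimal standalone sketch in Python 3. It is not the code of the fork: the names Node and parse_newick are hypothetical, and the sketch assumes a tree without branch lengths, quoted labels, or bootstrap values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node; `label` stays empty for unlabeled internal nodes."""
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simplified Newick string (no branch lengths, no quoting)."""
    text = "".join(text.split())  # assumption: whitespace is never significant
    pos = 0

    def peek() -> str:
        # Return the current character, or "" at the end of the input.
        return text[pos] if pos < len(text) else ""

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":
            pos += 1  # skip "("
            while True:
                node.children.append(parse_node())
                if peek() == ",":
                    pos += 1
                elif peek() == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {peek()!r} at position {pos}")
        # The label either follows the closing parenthesis (internal node)
        # or stands on its own (leaf).
        start = pos
        while peek() and peek() not in "(),;":
            pos += 1
        node.label = text[start:pos]
        return node

    root = parse_node()
    if peek() != ";":
        raise ValueError("expected ';' at the end of the tree")
    return root


tree = parse_newick("((Human,Chimp)Primate,(Mouse,Rat)Rodent)Supraprimates;")
print(tree.label, [child.label for child in tree.children])
```

The point of the sketch is the order of events in parse_node: the children are read first, and whatever text follows the closing parenthesis becomes the label of the internal node.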

Wednesday, October 21, 2009

Public Service Announcement: Evolution of the centrosome

Information on the evolution of the centrosome can be found on Wikipedia and in the scientific literature. Not elsewhere.

(I just incorporated part of a grant proposal I sent in last week into Wikipedia. I hope this doesn't lead to any plagiarism charges. ;-) )

Thursday, October 1, 2009

Rebasing in Mercurial

After using git for my own projects for a while, we switched the development of the STRING and STITCH databases from svn to Mercurial (on Bitbucket). Coming from git, I found two essential things lacking: (1) automatic coloring and paging of diffs and (2) rebasing.

Problem 1 is solved by enabling the pager extension and using its "attend" option to specify which commands should have their output piped to less. You'll also have to set the options "SR" for less globally (e.g. "setenv LESS SR").

To rebase means to take code changes that were developed in parallel and make it look as if they had been developed sequentially, which avoids commits whose only purpose is to merge independent changes. Rebasing before pushing also avoids the problem that you can silently drop previous changes by pushing without pulling beforehand:
That is, the branch name is stored in the changeset. The flaw is that it's quite easy to have more than one branch with the same name, and it's difficult to tell when this has happened. This can cause confusion in a team where one is left wondering what changes, exactly, have made it into the "stable" branch when multiple people have reopened and merged the branch on different timelines.
The problem of the "pointless" merges is solved beautifully by the rebase extension, which is included by default in current versions of hg. I think this extension is under-advertised.
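
To make the idea of rebasing concrete, here is a toy model in Python 3. It is only a sketch of the concept, not of how Mercurial implements it: Changeset, ancestry, and rebase are hypothetical names, a history is just a chain of parent pointers, and the original changesets are kept instead of being stripped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Changeset:
    name: str
    parent: Optional[str]  # None marks the root changeset


def ancestry(repo: Dict[str, Changeset], head: str) -> List[str]:
    """Return the changeset names from the root up to `head`."""
    chain: List[str] = []
    current: Optional[str] = head
    while current is not None:
        chain.append(current)
        current = repo[current].parent
    return chain[::-1]


def rebase(repo: Dict[str, Changeset], local_head: str, remote_head: str) -> str:
    """Replay local-only changesets on top of `remote_head`.

    Returns the name of the new local head. Rewritten changesets get a
    trailing apostrophe; the originals stay in `repo` for simplicity.
    """
    upstream = set(ancestry(repo, remote_head))
    local_only = [n for n in ancestry(repo, local_head) if n not in upstream]
    parent = remote_head
    for name in local_only:
        rewritten = Changeset(name + "'", parent)
        repo[rewritten.name] = rewritten
        parent = rewritten.name
    return parent


# Two lines of development that diverged after changeset A:
# pulled work A-B-C and local work A-X-Y.
repo = {c.name: c for c in [
    Changeset("A", None),
    Changeset("B", "A"), Changeset("C", "B"),
    Changeset("X", "A"), Changeset("Y", "X"),
]}
new_head = rebase(repo, local_head="Y", remote_head="C")
print(" -> ".join(ancestry(repo, new_head)))
```

The resulting history is the single line A -> B -> C -> X' -> Y', with no merge changeset. With the rebase extension enabled, "hg pull --rebase" does the real equivalent in one step.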

To briefly compare mercurial and git: I think git's approach is a more radical break from subversion etc. and therefore more consistent. However, it's also harder to wrap your head around, which is why we chose mercurial in the end.

For reference, here's my .hgrc.

Wednesday, September 16, 2009

Test (also: we built a funicular)

This is mainly a test to see if the image will show up in FriendFeed (through Feed-buster). But the image is interesting as well: my son and I built a funicular with Duplo bricks, modeled after the one in Dresden. :-)

Thursday, September 3, 2009

Learning ggplot2: 2D plot with histograms for each dimension

I have two 2D distributions and want to show on a 2D plot how they are related, but I also want to show the histograms (actually, density plots in this case) for each dimension. Thanks to ggplot2 and a Learning R post, I have more or less managed to get what I want:

There are still two problems: the overlapping labels on the bottom-right density axis, and a slight misalignment between the left edges of the two graphs on the left. I think that the dot in the labels for the density pushes the plot a tiny bit to the right compared with the 2D plot. Any ideas?

Here's the code (strongly based on the afore-linked post on Learning R):


library(ggplot2)
library(grid)  # grid.layout(), viewport(), unit()

# Note: opts() and theme_blank() are the ggplot2 API of the time;
# later releases replaced them with theme() and element_blank().

# Scatter plot of the two dimensions
p <- qplot(data = mtcars, mpg, hp, geom = "point", colour = cyl)

p1 <- p + opts(legend.position = "none")

# Density plot for the x dimension (goes above the scatter plot)
p2 <- ggplot(mtcars, aes(x=mpg, group=cyl, colour=cyl))
p2 <- p2 + stat_density(fill = NA, position="dodge")
p2 <- p2 + opts(legend.position = "none", axis.title.x=theme_blank(),
                axis.text.x=theme_blank())

# Density plot for the y dimension, flipped (goes to the right)
p3 <- ggplot(mtcars, aes(x=hp, group=cyl, colour=cyl))
p3 <- p3 + stat_density(fill = NA, position="dodge") + coord_flip()
p3 <- p3 + opts(legend.position = "none", axis.title.y=theme_blank(),
                axis.text.y=theme_blank())

# Only the legend of the scatter plot
legend <- p + opts(keep= "legend_box")

## Plot Layout Setup
Layout <- grid.layout( nrow = 2, ncol = 2,
                       widths = unit (c(2,1), c("null", "null")),
                       heights = unit (c(1,2), c("null", "null"))
)
vplayout <- function (...) {
  grid.newpage()
  pushViewport(viewport(layout= Layout))
}
subplot <- function(x, y) viewport(layout.pos.row=x, layout.pos.col=y)

# Plotting
vplayout()
print(p1, vp=subplot(2,1))
print(p2, vp=subplot(1,1))
print(p3, vp=subplot(2,2))
print(legend, vp=subplot(1,2))

Monday, July 27, 2009

One step towards writing papers in Google Wave

Google Wave's underlying technology will not only enable collaboration with other people, it will also make it possible for bots to interact with what you've written. I think this is going to change the way we work. For example, all applications that require a significant amount of typing will benefit from the statistical auto-correction provided by the Wave app Spelly. In effect, Spelly goes over the text as you're typing it and corrects the obvious mistakes, just as you would do a bit later.

In a similar vein, the proof-of-concept bot Igor watches for inserted references and automagically converts them to a citation and a reference list. When writing papers, I usually insert reminders: "REF Imming review", "REF PMID 16007907". If I adjust this convention a bit and provide a bit more detail, Igor can figure out by itself which paper is meant and fetch the citation. Google Wave and Igor save me the tiresome back-and-forth between a reference manager and the editor to insert all the citations, and they remove distractions from the process of writing and editing the paper.
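
As a rough illustration of the idea (not of Igor's actual syntax or implementation, which I haven't looked at), here is a small Python 3 sketch that turns my own "REF PMID ..." reminders into numbered citations. The function name number_citations and the exact marker pattern are assumptions, and nothing is fetched from PubMed.

```python
import re
from typing import List, Tuple

# Assumed marker convention: the literal text "REF PMID " followed by digits.
REF_PATTERN = re.compile(r"REF PMID (\d+)")


def number_citations(text: str) -> Tuple[str, List[str]]:
    """Replace each marker by "[n]" and return the text plus the PMID list.

    PMIDs are numbered in order of first appearance; a repeated PMID
    reuses the number it was given the first time.
    """
    pmids: List[str] = []

    def replace(match: "re.Match[str]") -> str:
        pmid = match.group(1)
        if pmid not in pmids:
            pmids.append(pmid)
        return f"[{pmids.index(pmid) + 1}]"

    return REF_PATTERN.sub(replace, text), pmids


draft = "As shown before (REF PMID 16007907), this works. See also REF PMID 123."
converted, references = number_citations(draft)
print(converted)
for number, pmid in enumerate(references, start=1):
    print(f"[{number}] PMID {pmid}")
```

A real bot would then look up each PMID and format the reference list; the sketch only covers the detection and numbering step.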

Of course, this is a proof of concept, so the style can't yet be customized. I further think it would be helpful to quickly see "what's inside" a particular citation. I don't know if Google Wave supports this, but it would be nice to click on a citation ("[23]") and be presented with a pop-up window showing not only information about the article, but also links to PubMed / a DOI resolver.
