F# helps show we're not Neanderthals

Well, sort of :-) One of our most recent scientific users of F# is Darren Platt, head of computational genomics at the DOE Joint Genome Institute (they sequence something like 20% of the world’s DNA - here are the stats). Darren is a co-author of the recent stunning paper on Neanderthal DNA (see screen shots below), where they sequenced DNA extracted from a Neanderthal fossil and built a metagenomic library of 66,250 base pairs. Among other results this allowed the authors to estimate that "the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans".   

I've known Darren for some time - indeed he was the first to show me some serious Python coding back in '96 or '97. After a visit to our lab two months ago, he decided to learn F# programming, using the samples and a draft of Chapter 2 of Expert F# as a guide (new version to be posted soon!). Among his many useful notes on the chapter, Darren had this to say:

Hey just got my viewer up and running …. It's the fastest genome assembly viewer I've ever seen and only 500 lines of F#. It's really an incredible language Don, you've done a really good job... I would actually like to start giving this to people to use (it's already incredibly useful).... it loses a bit in the screenshot, but basically with very fast and smooth zooming/scrolling it's really pleasant to use.

The more I think about our field and current best practice (perl/java ...), rapid prototyping, mathematical problems, the more enthusiastic I get about this path.

My current tool for doing this is a Python CGI script that produces nice images but which is completely non-interactive and very slow....

Darren has sent me some screen shots of his tool in action on the Neanderthal genome assembly from the paper (note the paper was prepared without the use of this tool, since it was written before Darren took up F#). Here are his screen shots and description, which he's kindly allowed me to post on this blog:

Typical Human Genome
Okay, I've attached the Neanderthal (the lower two screenshots below), but first a normal assembly for comparison. ... This is a typical assembly and you see that the reads are spread randomly across the field and this is actually a fairly large piece of DNA (probably 100,000 bases). The reads are sequenced in pairs a certain spacing apart to help ensure accurate assembly.

 

 

Neanderthal Fragment One
After 50,000 years the DNA fragments are understandably much shorter; we are lucky to get a 70 base pair fragment. Using the method described in our paper we actually cloned these fragments and were therefore able to sequence them multiple times.

You can see on the screen shot the reads actually stack up on top of one another and you can see forward and reverse reads on the same fragment, all starting in very similar positions. This is one of several ways we can be sure it's ancient DNA. If it was modern human DNA the fragments would be much longer.

 

Neanderthal Genome Assembly Screen Shot 2
[The third shot] is a different fragment assembled. The different fragments were covered at random depths, so they vary anywhere from a single read to many as you see above.

Many thanks to Darren for all the material above and for the many helpful comments on F#.

Text above courtesy of Darren Platt
Informatics Department Head
DOE Joint Genome Institute,
Walnut Creek