On July 10, in Boston, the External Research division of Microsoft Research will introduce the Microsoft Biology Initiative, a set of resources designed to help biological scientists and programmers conduct research more efficiently and affordably. These include the first post-beta release of the Microsoft Biology Foundation (MBF), a language-neutral bioinformatics tool kit built as an extension to the Microsoft .NET Framework. In addition to offering a new genome assembler, performance enhancements, and other improvements, MBF builds upon the vision and goals that drove the development of the beta versions. Those included a commitment to community involvement, extensibility, cross-platform and interoperable functionality, language neutrality, and support for best practices. While there are other libraries of biological functionality available, MBF supports universally accepted standards of the bioinformatics community and implements a range of unique functionality derived from original Microsoft research. The code for MBF and its supporting documents are available on CodePlex.
Like MBF itself, the audience at the 11th annual Bioinformatics Open Source Conference, held in conjunction with the 18th annual International Conference on Intelligent Systems for Molecular Biology, represents a powerful combination of technology and biology. To harness technology in support of biological discovery, MBF implements parsers for common bioinformatics file formats and algorithms used to manipulate DNA, RNA, and protein sequences. In addition, it provides a set of connectors to biological Web services, such as the Basic Local Alignment Search Tool, as well as a utility that enables scientists to view their data within Excel easily and quickly.
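To give a sense of what parsing a sequence file and manipulating DNA and RNA involves, here is a minimal sketch in Python. It is not MBF code and does not use the MBF API. The function names parse_fasta, reverse_complement, and transcribe are hypothetical, and FASTA is assumed here as a representative file format.

```python
from typing import Dict, Iterable, List

# Illustrative only: hypothetical helpers, not part of MBF or its API.


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the first whitespace-delimited token is the ID.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            # Sequence line: may be wrapped across several lines.
            records[name].append(line.upper())
    return {key: "".join(chunks) for key, chunks in records.items()}


# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """Return the RNA form of a DNA coding strand (T replaced by U)."""
    return dna.replace("T", "U")
```

Calling parse_fasta on the lines of a file, or on any list of strings, yields one sequence per record, which can then be passed to the other two helpers. A production library would also have to handle ambiguity codes, quality scores, and very large files, all of which this sketch ignores.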
From its core technology to the free availability of the code on which it is built, MBF is the result of collaboration between Microsoft Research and industrial and academic partners, with the aim of building the tools scientists need to pursue biological research. With Microsoft .NET as its base, MBF makes it easier for developers to leverage current technologies, with thousands of functions and a common code base that can be accessed and used with great flexibility.
One of the areas in which MBF is particularly valuable is the field of genomics, which has experienced tremendous advances since the human genome first was sequenced a decade ago. A full understanding of the human genome offers great potential for advances in health care. To reduce the computational complexity of reconstructing the whole genome, MBF includes a new whole-genome-assembly algorithm, PaDeNA (Parallel de Novo Assembler). PaDeNA has the potential to reconstruct the DNA sequence of a patient rapidly from huge volumes of experimental data, the first step in using the genome in health care. PaDeNA is provided freely as a part of MBF, and because it is designed to be modular and is fully documented, experimental biologists and software developers can tweak the basic algorithm and add features to meet the needs of their research.
Another example of MBF at work is the research undertaken by David Heckerman, senior director of the eScience group within Microsoft Research. Heckerman, an expert in machine learning, is working on the design of HIV vaccines, which requires an understanding of how the virus evolves in each individual. The next versions of the biological applications Heckerman is developing will use functions built into MBF. Heckerman’s applications will continue to be made available for free download on CodePlex.
In keeping with the bioinformatics community’s strong tradition of sharing expertise in support of ongoing discovery, I invite you to download MBF, use it in your work, and contribute your experience to the global research community.