Scale-out computing on DevLabs


Today we’re launching several new Technical Computing (TC) projects on DevLabs.  These projects give you a chance to learn about some of the technologies being developed as part of the Technical Computing initiative, to gain early access to code, and to provide feedback on several innovative TC-related projects.

Last May, I blogged about the Technical Computing initiative at Microsoft, an initiative that’s leading to technologies which will empower the world’s most important problem solvers to best utilize computing resources. These domain specialists often either develop code themselves as a necessary aspect of their work, or they rely on other developers to build the software that makes their work possible.  The TC initiative gives those developers and domain specialists ground-breaking developer tools and infrastructure to do their best work.

The TC initiative has made some important first steps since its inception.  Visual Studio 2010 includes built-in support for developing, debugging, and tuning multi-core and many-core applications and has seen impressive adoption within a wide variety of industries and domains.  In November, we announced Service Pack 1 for Windows HPC Server 2008 R2, which integrates Windows Azure compute cycles, allowing massively parallel applications to easily scale from the cluster to the cloud.  And this is just the beginning.  The teams involved in the TC initiative are working hard on impressive new solutions to bring all that modern and future computing has to offer to developers, domain specialists, and IT professionals alike.

Today’s new TC projects take the next steps in this journey.

TPL Dataflow – Enabling parallel and concurrent .NET applications

.NET 4 saw the introduction of the Task Parallel Library (TPL), parallel loops, concurrent data structures, Parallel LINQ (PLINQ), and more, all of which were collectively referred to as Parallel Extensions to the .NET Framework.  TPL Dataflow is a new member of that family, layering on top of tasks, concurrent collections, and more to enable the development of powerful and efficient .NET-based concurrent systems built on dataflow concepts.  The technology relies on in-process message passing and asynchronous pipelines and is heavily inspired by the Visual C++ 2010 Asynchronous Agents Library and the DevLabs Axum language.  TPL Dataflow provides solutions for buffering and processing data, for building systems that need high-throughput, low-latency processing of data, and for building agent/actor-based systems.  TPL Dataflow was also designed to integrate smoothly with the new asynchronous language functionality in C# and Visual Basic that I previously blogged about.

Below, you can see an example of an “agent” using dataflow blocks in C# to safely, asynchronously, and efficiently process incoming requests.
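
The exact shape of such an agent will vary; as a minimal sketch (the LoggingAgent class and its member names are illustrative, not part of the library), an agent built on ActionBlock<T> from the System.Threading.Tasks.Dataflow namespace might look like this:

using System;
using System.Threading.Tasks.Dataflow;

public class LoggingAgent
{
    // ActionBlock<T> processes posted messages one at a time on its own task,
    // so callers never need to take a lock around the processing logic.
    private readonly ActionBlock<string> _requests;

    public LoggingAgent()
    {
        _requests = new ActionBlock<string>(request =>
        {
            // Handle the incoming request; here we simply write it out.
            Console.WriteLine("Processing: " + request);
        });
    }

    // Callers on any thread can hand off work without blocking.
    public void Post(string request) { _requests.Post(request); }

    // Signal that no more requests are coming and wait for the backlog to drain.
    public void Shutdown()
    {
        _requests.Complete();
        _requests.Completion.Wait();
    }
}

Because every request funnels through the block, the agent’s internal state is only ever touched by one message at a time, which is what makes this pattern safe as well as efficient.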

Dryad – Supporting data-intensive computing applications

Pioneered in Microsoft Research, Dryad, DSC, and DryadLINQ are a set of technologies that support data-intensive computing applications on Windows HPC Server 2008 R2 Service Pack 1. These technologies enable efficient processing of large volumes of data in many types of applications, including data-mining applications, image and stream processing, and various kinds of intense scientific computations. Dryad and DSC run on the cluster to support data-intensive computing and manage data that is partitioned across the cluster, while DryadLINQ allows developers to build data- and compute-intensive .NET applications using the familiar LINQ programming model.

Here you can see code for loading textual log data using DryadLINQ.  That data is joined with geo-IP data and processed on the cluster, and then the results are streamed back to the client for display.

public static IEnumerable<string> GeoIp(string logStream, string geoStream)
{
    DistributedData<string> logLinesTable = DistributedData.OpenAsText(logStream);
    DistributedData<string> geoIpTable = DistributedData.OpenAsText(geoStream);

    // Join the two tables on the common key (IP address)
    IEnumerable<string> joined = logLinesTable.Join(geoIpTable,
        l1 => l1.Split(' ').First(),
        l2 => l2.Split(' ').First(),
        (l1, l2) => l2).AsEnumerable();

    return joined;
}

public static void Main()
{
    // Load log and geo data into DSC
    Console.WriteLine("Loading data");
    File.ReadLines("log.txt").AsDistributed().ExecuteAsText("hpcdsc://localhost/Samples/log");
    File.ReadLines("geo.txt").AsDistributed().ExecuteAsText("hpcdsc://localhost/Samples/geo");

    // Run the query
    Console.WriteLine("Running query");
    IEnumerable<string> results =
        GeoIp("hpcdsc://localhost/Samples/log", "hpcdsc://localhost/Samples/geo");

    // Print out the results
    Console.WriteLine("Displaying results");
    foreach (var entry in results)
        Console.WriteLine(entry);
}


Sho – Putting the power of data analysis and flexible prototyping in your hands

Also begun in Microsoft Research, Sho provides those working on technical computing workloads with an interactive environment for data analysis and scientific computing.  It lets you seamlessly connect scripts written in IronPython with .NET libraries, enabling fast and flexible prototyping.  The environment includes powerful and efficient libraries for linear algebra and data visualization, both of which can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.  Sho comes with packages for large-scale parallel computing (via Windows HPC Server and Windows Azure), statistics, and optimization, as well as an extensible package mechanism that makes it easy for you to create and share your own packages.

As you can see in the screenshot below, Sho provides an interactive REPL (read/evaluate/print loop) that allows you to write code and immediately see results, both textually and graphically.
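
Sho’s own array and plotting libraries aren’t reproduced here, but as a rough, minimal sketch of the kind of IronPython/.NET bridging the environment builds on (this uses the general IronPython hosting API rather than anything Sho-specific, and the variable names are illustrative), a .NET array can be handed to a Python snippet and the result read back like so:

using System;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

class InteropSketch
{
    static void Main()
    {
        // Create an IronPython engine and a scope for sharing variables with the script
        ScriptEngine engine = Python.CreateEngine();
        ScriptScope scope = engine.CreateScope();

        // Expose a .NET array to the script
        scope.SetVariable("data", new double[] { 1.0, 2.0, 3.0, 4.0 });

        // The script iterates the .NET array like any Python sequence
        engine.Execute("mean = sum(data) / len(data)", scope);

        // Pull the computed value back into C#
        double mean = scope.GetVariable<double>("mean");
        Console.WriteLine(mean); // 2.5
    }
}

Inside Sho itself you typically work in the other direction, calling the .NET linear algebra and visualization libraries directly from the IronPython prompt.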

Try Them Out

Our goal moving forward is to add additional Technical Computing projects in pre-beta states to DevLabs in order to get your early feedback and insight and to help drive these technologies in the right direction.  We look forward to hearing from you.

Namaste! 

Comments (14)

  1. Greg Wilson says:

    This is exciting—and it's great to see that Microsoft is supporting foundational efforts like Software Carpentry (http://software-carpentry.org) to ensure that scientists and engineers have the basic skills they need *before* they try tackling things with multi-core, cloud, and peta-scale in their names.  After all, if someone doesn't know how to drive a family car, giving them keys to a transport truck and telling them to take it out on the highway is just an invitation to crash.

  2. Jim Kennelly says:

    The parallel stuff will be great when the nvidia cuda stuff is fully mainstream.  Finally figured out why Microsoft is moving things to the cloud.  So all the MS personnel can all buy Macs and get out from under all this windows desktop crap that constantly reports "Not Responding" on the Windows 7 desktop.  The Windows server products are great (almost the opposite of microsoft early days when server stuff was junk). The code base must be such crap cause things are going downhill fast.  Can IE get any worse?  Oh yeah, the only thing worse is the desktop help built into windows.  The only thing scaling faster than facebook is windows "Report Error To Microsoft" servers that catch the errors.  Wonder if a little bell goes off at MS every time this happens and they think an angel is getting their wings?

  3. Josh Reuben says:

    Microsoft needs to round out its Technical Computing Initiative with a few more .NET APIs

    1) a managed wrapper for GPGPU DirectCompute  – what happened to Accelerator?

    2) A rich maths library – like NMath or ExtremeOptimization – we are constantly reimplementing Vector and Matrix classes

    3) An extensible managed Data Mining model API – to create SQL Server Analysis Services algorithms requires in-depth COM dev

    4) a machine learning API – like Encog – what happened to MS research Infer.NET?

    5) a numerical analysis API

  4. What says:

    @Jim – shut the hell up

  5. Nathan Brixius says:

    Jim, I encourage you to try out Sho. It includes vector and matrix libraries as well as libraries for statistics and machine learning. It also includes Solver Foundation, a .NET-based library for optimization. I just posted about Sho and Solver Foundation on my blog: blogs.msdn.com/…/optimization-modeling-using-solver-foundation-and-sho.aspx.

    Best regards, Nathan

  6. Nathan Brixius says:

    [Sorry, that last post was in response to Josh, not Jim ;)]

  7. stevetei says:

    Hi Josh,

    Thanks for your feedback! At SuperComputing10 last year we talked to customers about an incubation project we are running here in Technical Computing that adds data parallel extensions to the C++ language in order to execute code on the GPU with DirectCompute serving as the runtime. We haven't yet announced how we will bring this technology to market, but it's safe to say that the TC team is very interested in GPGPU compute! We chose native C++ as a first target for data parallel extensions for a variety of reasons but especially because most of the customers we talked to were already using C++ around these computation kernels because they needed the performance. Data parallel for managed code is something we'll look at in the future.

    Nate provided some info on SolverFoundation, which provides a rich .NET optimization library. Beyond that, we're looking closely at what we can do for TC developers in the math space but don't have any specific plans to announce.

    You can expect the SQL organization to continue to innovate in the structured data mining space, but I encourage you to take a look at DryadLINQ, as Soma mentions above, if you're interested in analysis of very large, unstructured data sets.

    Finally, I'm not familiar with Infer.NET, but it looks like it's still active on MSR's web site: research.microsoft.com/…/infernet

    All the best,

    steve

  8. Achiko Mezvrishvili says:

    Great and very interesting scientific computing with .NET technology, especially with C#. Cool. 🙂

  9. Michael Gautier says:

    Over the last year I have read news reports about the efforts your company has pursued in mobile, cloud, and your other enterprises. According to the news media, your momentum is much less than others. Many doubt you will achieve significant gains in various areas that are the focus of the consumer technology trade press. I would like to offer some ideas.

    You do make substantial investments in technology. Yet, an observer may think that little of this investment translates into actual technologies available to businesses and private persons. As one of the largest technology companies, you possess the resources to make a substantial and high-impact platform shift.

    What do I see as the problems Microsoft has as a company regarding technology?

    1. Old technology. Maturity is very valuable and is in no way a liability except when it diminishes progress. Still, you are successful with businesses due to the familiarity with your technology. The overall market seems to be changing. Familiarity may lose currency for a time.

    Solution: Create a separate operating system that is not backward compatible in any way with Windows but is designed for the 21st century. Goals of the new system would be security, performance, graphics, and real program isolation. Like Hyper-V, offer it for free for a time.

    2. Accumulated complexity. You have built up functionality and capability over a 30-year period. Again, this provides familiarity, but it also creates barriers to building applications, learning a vast API, and managing change within the overall ecosystem. Your competitors are working hard at simplifying and consolidating system functions to benefit their ability to rework their systems to meet changing circumstances and spread improvements more widely.

    Solution: Distill the lessons of the last several decades into tangible, rational insights for building more streamlined systems and tools.

    3. Tool density. You have numerous tools, APIs, frameworks, patterns and practices, and you have numerous management consoles with some overlap between paid suites and built-in functionality. Your commercial competitors have far fewer tools, APIs, and frameworks, nor do they push methodology. Two of your largest competitors each have one main framework, one main toolset, and one main implied approach to creating applications on their platform. Apple, for example, uses Objective-C/C++, Xcode, and native code to have applications run directly, quickly, and efficiently on its system.

    Solution: Streamline your platform so that the tools to build a solution fit like a glove with your system. I suggest either a parallel version of .NET that is native-code C# or a revised native API that completely supersedes earlier versions. It can all be much cleaner and more code-focused than it is.

    4. Pricing and Licensing. Over many years, Microsoft was the low cost way to acquire technology. That has changed substantially over the past few years. You are still more affordable than some of your competitors, but less affordable in certain areas.

    Solution: Divide your products into two segments: Enterprise and Standard versus Workgroup and below, across operating systems, office suites, databases, etc. Small businesses and private individuals would have access to Workgroup and below at zero cost. Everyone would pay for scalability starting at Enterprise and Standard. That would be easy enough to understand.

    5. Style. Substance matters most in all things technology, but there is an edge to be gained in the right presentation. I believe one of the biggest problems faced by Microsoft is a confused design ethos. When I look at the websites and software for Oracle, Apple, IBM, Google, and others, I see great consistency. When I view the Microsoft websites, I see broad variation both in content and presentation. There isn't a theme or common message that visibly permeates your websites and products, save Office. As tangible products that I administer, Exchange looks completely different than SQL Server and they are both at the same level from a marketing standpoint. Visual Studio 2010 looks nothing like Windows 7 and has nothing in common with IIS Management Console from a tailoring and polish standpoint. Visually, the platform and the overall technology presentation could be improved.

    Solution: Consult with a design agency like IDEO or work with companies known for design like IKEA and see what kind of new fresh, consistent design can be applied through the Microsoft Ecosystem. Imagine the heightened level of familiarity and productivity that could be gained.

    Second, innovation (as in the incremental advancement of common features) should be coordinated across the board in a way that lowers information density in learning new versions while delivering real value rather than questionable distraction. Think about all the features in SQL Server and how adopting them affects disaster recovery and migration.

    Third, take the many features you have and achieve them with far fewer individual check boxes and menus, in a more self-contained and streamlined fashion. My favorite question, which always gets me in trouble with developers but nonetheless maintains focus, is to question the actual relevance of a feature. Apply this ethos to your product and software development.

    Fourth, what is the essential goal of a product and how can the essentials be improved? A good User Interface and supporting functional infrastructure would be designed to be spot on. In the current SQL Server, for example, you can lock out the domain administrator if you set up the software under the local machine account and then add the machine to the domain. Should the software naturally assume domain administrators have access unless explicitly revoked?  I suggest designing all software, in general, to the natural processes common to most users, with most procedural errors designed out of the process.

    Fifth, this is related to the previous suggestions, but this is about moving parts. Can the number and types of moving parts be reduced? Reduce the moving parts, such as the various files involved (config files, backup config files, the registry, XML files, internal databases, etc.), and get rid of DLLs that only a single program uses, and so on. The file system then becomes less cluttered and fragmented, and you reduce vectors for malware. Do the various points between Microsoft Outlook and Exchange really need logging facilities (Synch conflicts folders in Outlook, event logs in Exchange)? Has the event log system grown to too high a level of information density?

    Why do I suggest all of this? Simply put, many businesses and individuals still use Microsoft technology and it would be great if the platform were far more productive and compelling than its present form. Under that circumstance, much more could be achieved if the points above were addressed. I believe the areas I've outlined are increasingly natural to the platforms of some of your largest commercial competitors and may be embraced by others advancing Open Source. Those of us responsible for maintaining existing investments in Microsoft technology could benefit greatly from fundamental changes in the platform.

    michaelgautier.wordpress.com

  10. Marcello says:

    @Michael –

    Delphi's design is very similar to C#, and the VCLs are like .NET assemblies. Delphi compiles to native-code executables, so your idea is not new. Compiling applications to native code has side effects such as DLL hell. .NET doesn't have that problem; also, native executables don't have a garbage collector.

  11. Marcello says:

    I always have to hit the post button twice to post a message to the blog. Seems to be a bug.

  12. Marcello says:

    I always have to hit the post button twice to post a message to the blog. Seems to be a bug.

  13. Marcello says:

    It would be great to have the equivalent of GCJ for .NET

  14. Adam Jones says:

    More about cloud computing: http://dcxcloud.blog.com/