.Net Implementation of a Priority Queue (aka Heap)

I thought I would take a break from Hadoop for a while and put together an F# .Net implementation of a Priority Queue, implemented using a heap data structure. Conceptually, we can think of a heap as a balanced binary tree. The tree will have a root, and each node can have up to two…

Framework for Composing and Submitting .Net Hadoop MapReduce Jobs (Archived)

An updated version of this post can be found at: http://blogs.msdn.com/b/carlnol/archive/2012/04/29/generic-based-framework-for-net-hadoop-mapreduce-job-submission.aspx If you have been following my blog, you will see that I have been putting together samples for writing .Net Hadoop MapReduce jobs using Hadoop Streaming. However, one thing that became apparent is that the samples could be reconstructed in a composable framework to…

Hadoop Streaming in F# and MapReduce (summary)

With all my recent posts on Hadoop Streaming, I thought it would be useful to summarize them in a single post. The main objective of these posts was to put together a codebase that enables F# developers to write Map/Reduce libraries through a simple API. The full code posting can be found here: http://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850 The…

FSharpChart for Valentine's Day

As I have not posted anything about FSharpChart for a while, I thought I would do a quick post, one that befits Valentine's Day. Plotting the heart was surprisingly easy to do: [ for x in -1.1 .. 0.001 .. 1.0 do let y1 = abs(x)+sqrt(1.0-x**2.0) let y2 = abs(x)-sqrt(1.0-x**2.0) yield (x, y1) yield (x,…

Hadoop XML Streaming and F# MapReduce

So, to round out the Hadoop Streaming samples, I thought I would put together an XML Streaming sample. As always, the code can be found here: http://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850 XML Streaming Reader So how does one stream in XML? If you read the Hadoop Streaming documentation, you will notice the following FAQ: You can use the record…

A lazy evaluation of F# Seq.groupBy for sorted sequences

In doing some recent work with Hadoop, I needed to process a sequence that was grouped by a projected key. Although Seq.groupBy can perform this operation, it makes no assumption about the ordering of the original sequence. As a consequence, the resulting sequence is not lazily evaluated, and is thus not suitable…

Hadoop Binary Streaming and PDF File Inclusion

In a previous post I talked about Hadoop Binary Streaming for the processing of Microsoft Office Word documents. However, due to their popularity, I thought adding support for Adobe PDF documents would be beneficial. To this end, I have updated the source code to support processing of both “.docx” and “.pdf” documents. iTextSharp To…

Hadoop Binary Streaming and F# MapReduce

As mentioned in my previous post, Hadoop Streaming supports not only text streaming but also Binary Streaming. As such, I wanted to put together a sample that supports processing Office documents; more on support for PDF in a later post. As always, the code can be downloaded from: http://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850 Putting together this sample…

MapReduce Tester: A Quick Word

In my previous post I talked a little about testing the Hadoop Streaming F# MapReduce code, but it is worth saying a few words about the tester application. The complete code for this blog post and the F# MapReduce code can be found at: http://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850 As mentioned, unit testing the individual Map and Reduce functions…

Hadoop Streaming and F# MapReduce

And now for something completely different. As you may know, Microsoft has recently announced plans to adopt Hadoop on both Windows Server and Windows Azure. You can find out more about Hadoop and Windows Azure at Apache Hadoop-based Services for Windows Azure and Availability of Community Technology Preview (CTP) of Hadoop based Service on…
