Co-occurrence Approach to an Item Based Recommender

For a while I have been meaning to tackle the problem of creating an item-based recommender. I will start with a local variant before moving on to a MapReduce version. The current version of the code can be found at: http://code.msdn.microsoft.com/Co-occurrence-Approach-to-57027db7 The approach taken for the item-based recommender will be to define a co-occurrence matrix based…


Framework for .Net Hadoop MapReduce Job Submission Binary Output

To end the week I decided to make a minor change to the “Generics based Framework for .Net Hadoop MapReduce Job Submission”. I have been doing some work on creating a co-occurrence matrix for item recommendations. I was going to map the process to a MapReduce job(s), then came across the issue of how I…


Framework for .Net Hadoop MapReduce Job Submission libjars update

If you have been using the “Generics based Framework for .Net Hadoop MapReduce Job Submission” you may want to download the latest version of the code. The previous version of the code, when processing XML and Binary files, was dependent on a custom streaming JAR that contained the necessary reader classes. This was not an…


Hadoop .Net HDFS File Access (Revisited Archived)

Updated post can be found here: http://blogs.msdn.com/b/carlnol/archive/2013/02/08/hdinsight-net-hdfs-file-access.aspx The Microsoft Distribution of Hadoop includes, in addition to the C library, a Managed C++ solution for HDFS file access. This solution enables one to consume HDFS files from within a .Net environment. The purpose of this post is first to ensure folks know about…


.Net Implementation of a Priority Queue (aka Heap)

I thought I would take a break from Hadoop for a while and put together an F# .Net implementation of a Priority Queue, implemented using a heap data structure. Conceptually we can think of a heap as a balanced binary tree. The tree will have a root, and each node can have up to two…


Generics based Framework for .Net Hadoop MapReduce Job Submission

Over the past month I have been working on a framework to allow composition and submission of MapReduce jobs using .Net. I have put together two previous blog posts on this, so rather than put together a third on the latest change I thought I would create a final composite post. To understand why let's…


.Net Hadoop MapReduce Job Framework – Revisited (Archived)

An updated version of this post can be found at: http://blogs.msdn.com/b/carlnol/archive/2012/04/29/generic-based-framework-for-net-hadoop-mapreduce-job-submission.aspx If you have been using the Framework for Composing and Submitting .Net Hadoop MapReduce Jobs you may want to download an updated version of the code: http://code.msdn.microsoft.com/Framework-for-Composing-af656ef7 The biggest change in the latest code is the modification of the serialization mechanism. Formerly data was…


Framework for Composing and Submitting .Net Hadoop MapReduce Jobs (Archived)

An updated version of this post can be found at: http://blogs.msdn.com/b/carlnol/archive/2012/04/29/generic-based-framework-for-net-hadoop-mapreduce-job-submission.aspx If you have been following my blog you will see that I have been putting together samples for writing .Net Hadoop MapReduce jobs using Hadoop Streaming. However, one thing that became apparent is that the samples could be reconstructed in a composable framework to…


Hadoop .Net HDFS File Access (Archived)

Updated post can be found here: http://blogs.msdn.com/b/carlnol/archive/2013/02/08/hdinsight-net-hdfs-file-access.aspx If you grab the latest installment of the Microsoft Distribution of Hadoop you will notice, in addition to the C library, a Managed C++ solution for HDFS file access. This solution now enables one to consume HDFS files from within a .Net environment. The purpose of this post is…


Hadoop Streaming in F# and MapReduce (summary)

With all my recent posts around Hadoop Streaming I thought it would be useful to summarize them into a single post. The main objective of these posts was to put together a codebase to enable F# developers to write Map/Reduce libraries through a simple API. The full code posting can be found here: http://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850 The…