Implementing LOB Storage in Memory Optimized Tables

Introduction: Memory optimized tables do not have off-row or large object (LOB) storage, and the row size is limited to 8060 bytes. Thus, large binary or character string values can be stored in one of two ways: split the LOB values into multiple rows, or store the LOB values in a regular non-memory optimized (file…
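The first option in the excerpt above, splitting a value across multiple rows, can be sketched outside the database. This is a minimal Python illustration and not the post's T-SQL: the 8000-byte chunk size is an assumed value chosen only to stay under the 8060-byte row limit, and `split_lob` and `join_lob` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple

# Assumed payload size per row; any value that keeps the whole row
# under the 8060-byte limit would do.
CHUNK_SIZE = 8000


def split_lob(value: bytes, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, bytes]]:
    """Split a large value into (sequence_number, chunk) rows."""
    return [
        (seq, value[offset:offset + chunk_size])
        for seq, offset in enumerate(range(0, len(value), chunk_size))
    ]


def join_lob(rows: Iterable[Tuple[int, bytes]]) -> bytes:
    """Reassemble the original value from its rows, in sequence order."""
    ordered = sorted(rows, key=lambda row: row[0])
    return b"".join(chunk for _, chunk in ordered)
```

Each row carries a sequence number so that the value can be reassembled regardless of the order in which the rows are read back.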


Execution Time Based Heuristic Custom Task Scheduler

If you follow the samples for Parallel Programming with the .Net Framework, you may have come across the ParallelExtensionsExtras and the Additional TaskSchedulers. Although these samples cover a broad set of requirements, I recently came across another that could be satisfied by creating a new custom task scheduler. In the current samples there…


Co-occurrence Approach to an Item Based Recommender

For a while I have thought about tackling the problem of creating an item-based recommender. I will start with a local variant before moving on to a MapReduce version. The current version of the code can be found at: http://code.msdn.microsoft.com/Co-occurrence-Approach-to-57027db7 The approach taken for the item-based recommender will be to define a co-occurrence matrix based…


.Net Implementation of a Priority Queue (aka Heap)

I thought I would take a break from Hadoop for a while and put together an F# .Net implementation of a Priority Queue, implemented using a heap data structure. Conceptually we can think of a heap as a balanced binary tree. The tree will have a root, and each node can have up to two…
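To make the heap idea in the excerpt above concrete, here is a minimal Python sketch of an array-backed binary min-heap. It is not the post's F# implementation; the `MinHeap` class and its method names are hypothetical, and it assumes the smallest item has the highest priority.

```python
class MinHeap:
    """Array-backed binary min-heap: the children of index i sit at 2i+1 and 2i+2."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def push(self, item):
        """Append the item, then sift it up until its parent is no larger."""
        items = self._items
        items.append(item)
        i = len(items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if items[parent] <= items[i]:
                break
            items[parent], items[i] = items[i], items[parent]
            i = parent

    def pop(self):
        """Remove and return the smallest item, then sift the new root down."""
        items = self._items
        if not items:
            raise IndexError("pop from empty heap")
        top = items[0]
        last = items.pop()
        if items:
            items[0] = last
            i, n = 0, len(items)
            while True:
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < n and items[left] < items[smallest]:
                    smallest = left
                if right < n and items[right] < items[smallest]:
                    smallest = right
                if smallest == i:
                    break
                items[i], items[smallest] = items[smallest], items[i]
                i = smallest
        return top
```

Storing the tree in a flat list keeps it balanced by construction, so both `push` and `pop` take time proportional to the height of the tree.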


Generic based Framework for .Net Hadoop MapReduce Job Submission

Over the past month I have been working on a framework to allow composition and submission of MapReduce jobs using .Net. I have put together two previous blog posts on this, so rather than put together a third on the latest change, I thought I would create a final composite post. To understand why, let's…


Framework for Composing and Submitting .Net Hadoop MapReduce Jobs

An updated version of this post can be found at: http://blogs.msdn.com/b/carlnol/archive/2012/04/29/generic-based-framework-for-net-hadoop-mapreduce-job-submission.aspx If you have been following my blog, you will have seen that I have been putting together samples for writing .Net Hadoop MapReduce jobs using Hadoop Streaming. However, one thing that became apparent is that the samples could be reconstructed in a composable framework to…


Hadoop Binary Streaming and F# MapReduce

As mentioned in my previous post, Hadoop Streaming supports not only text streaming but also Binary Streaming. As such, I wanted to put together a sample that supports processing Office documents. As before, the code can be downloaded from: http://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850 Putting together this sample involved a bit more work than the text streaming…


Hadoop Streaming and F# MapReduce

As you may know, Microsoft has recently announced plans to adopt Hadoop for both Windows Server and Windows Azure. You can find out more about Hadoop and Windows Azure at Apache Hadoop-based Services for Windows Azure and Availability of Community Technology Preview (CTP) of Hadoop based Service on Windows Azure. If you are not…


Adventures in TSQL: SQL Server Query Performance Analysis using DMVs

From the development perspective I often have to perform an analysis of a database application. More often than not this entails looking at a running system and ensuring that the application queries are behaving as expected. As such, I thought it would be worthwhile sharing some TSQL scripts that I have been using over the…


Creating a Partitioned View in the BAM Archiving Database

When you run the BAM data maintenance package (BAM_DM_<activity name>), BAM copies each partition in the BAM Primary Import database to a separate table in the BAM Archive database. You can create partitioned views in the BAM Archive database to make the data easier to locate. However, you are left to create these partitioned views yourself. A…