In a previous post I talked about a Co-occurrence Approach to an Item Based Recommender that used the Math.Net Numerics library. The library was recently updated to version 2.3.0, and with this version I was able to read the Sparse Matrix entries more efficiently. I have updated the code to reflect these library changes:

http://code.msdn.microsoft.com/Co-occurrence-Approach-to-57027db7

The new Math.Net Numerics library changes were around the storage of the Vector and Matrix elements. I can now access the storage directly and use the Compressed Sparse Row matrix format to access the Sparse Matrix elements more efficiently.

The original code that accessed the elements of the Sparse Matrix was a simple row/column traverse:

// Define the priority queue and lookup table
let queue = PriorityQueue(coMatrix.ColumnCount)
let lookup = HashSet(products)

// Add the items into a priority queue
products
|> Array.iter (fun item ->
    let itemIdx = item - offset
    if itemIdx >= 0 && itemIdx < coMatrix.ColumnCount then
        seq {
            // Visit every column of the row, including the empty ones
            for idx = 0 to (coMatrix.ColumnCount - 1) do
                let productIdx = idx + offset
                let item = coMatrix.[itemIdx, idx]
                if (not (lookup.Contains(productIdx))) && (item > 0.0) then
                    yield KeyValuePair(item, productIdx)
        }
        |> queue.Merge)

// Return the queue
queue

Now that the storage is accessible, I can read just the sparse element values for each row:

products
|> Array.iter (fun item ->
    let itemIdx = item - offset
    let sparse = coMatrix.Storage :?> SparseCompressedRowMatrixStorage<double>
    let last = sparse.RowPointers.Length - 1
    if itemIdx >= 0 && itemIdx <= last then
        let (startI, endI) =
            if itemIdx = last then
                // The last row has no following row pointer, so it runs to the
                // end of the stored values (ValueCount = number of stored values)
                (sparse.RowPointers.[itemIdx], sparse.ValueCount - 1)
            else
                (sparse.RowPointers.[itemIdx], sparse.RowPointers.[itemIdx + 1] - 1)
        seq {
            // Visit only the stored (non-empty) entries of the row
            for idx = startI to endI do
                let productIdx = sparse.ColumnIndices.[idx] + offset
                let item = sparse.Values.[idx]
                if (not (lookup.Contains(productIdx))) && (item > 0.0) then
                    yield KeyValuePair(item, productIdx)
        }
        |> queue.Merge)

// Return the queue
queue

In the new version of the code, the Values array provides access to the underlying non-empty values. The RowPointers array gives the index into Values at which each row starts. Finally, the ColumnIndices array holds the column index of each corresponding value.
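To make the three arrays concrete, here is a minimal sketch that uses plain F# arrays rather than the Math.Net types. The 3x4 matrix, the rowEntries helper and the valueCount name are all made up for illustration, and the layout assumes one RowPointers entry per row, so the last row ends at the number of stored values (some Compressed Sparse Row implementations append a final end marker instead).

```fsharp
// Hypothetical 3x4 matrix, used only for illustration:
//   row 0: 0 2 0 1
//   row 1: 0 0 0 0
//   row 2: 3 0 4 0
let values = [| 2.0; 1.0; 3.0; 4.0 |]   // the non-empty values, row by row
let columnIndices = [| 1; 3; 0; 2 |]    // the column of each value
let rowPointers = [| 0; 2; 2 |]         // index into values where each row starts
let valueCount = values.Length          // number of stored values

// Return the (column, value) pairs stored for a single row.
// Assumption: one row pointer per row, so the last row ends at valueCount.
let rowEntries (row: int) : (int * float) list =
    let start = rowPointers.[row]
    let stop =
        if row = rowPointers.Length - 1 then valueCount
        else rowPointers.[row + 1]
    [ for idx in start .. stop - 1 -> (columnIndices.[idx], values.[idx]) ]

// Print the stored entries of every row
for row in 0 .. rowPointers.Length - 1 do
    printfn "row %d: %A" row (rowEntries row)
```

Row 1 is empty, so its start and end indexes are equal and nothing is read for it. Rows 0 and 2 each touch only their two stored values rather than all four columns.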

Apart from this change, the rest of the library usage is effectively unchanged, including the MapReduce code (postings can be found here), which works with a collection of Vector types. I did, however, update the job submission scripts.