How to upload an R package to Azure Machine Learning

Azure Machine Learning (https://azure.com/ml) comes with a number of R packages installed by default. You can list them with the following sample experiment:

[Screenshot: sample experiment with an Execute R Script module]

The R script is:

# List every package installed in the environment as a data frame
data.set <- data.frame(installed.packages())
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set")

You’ll find a little more than 400 packages.
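If you only want to know whether a given package is among them, a small base R check is enough. This is a minimal sketch; the helper name `missing_packages` is hypothetical, and it behaves the same locally or inside an Execute R Script module:

```r
# Hypothetical helper: return the elements of `pkgs` that are NOT installed
# in any library currently visible to this R session.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# "stats" ships with R; "skmeans" is the package this post uploads.
print(missing_packages(c("stats", "skmeans")))
```

Anything it returns is a package you have to bring yourself.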

Still, you may need a package that is not installed in Azure ML. Here is how to upload it to the environment.

NB: This post uses skmeans (spherical k-means, i.e. k-means with a cosine distance) as an example, but the same approach works for other packages.

Suppose you have the following code running locally in RStudio.

NB: you can find information on how to set up your environment in this post. It’s in French, but Bing Translator is your friend.

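library(skmeans)

# Reproducible random data: 20 rows x 500 columns of integers in 1..1000
set.seed(1234)
sample_data <- matrix(sample.int(1000, size = 20*500, replace = TRUE), nrow = 20, ncol = 500,
                      dimnames = list(1:20, 1:500))

# Spherical k-means (cosine dissimilarity) with 5 clusters
fit <- skmeans(sample_data, 5)

# One row per sample: its row name and the cluster it was assigned to
result <- data.frame(list(rownames(sample_data), fit$cluster), row.names = NULL)
colnames(result) <- c("sample data row", "cluster")

print(result)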

This gives the following kind of result:

[Screenshot: clustering result in RStudio]

If you try this in Azure ML, you’ll get the following result:

[Screenshots: result of the same script in Azure ML]

Here is how to make the script load all the necessary packages in the Azure ML environment:

[Screenshots: experiment loading the packages from skmeans_packages.zip]

So let’s now see how to build skmeans_packages.zip and how to know which lines to write here:

[Screenshot: the package installation lines in the R script]

In the local environment (Windows, in my case), I first remove the R packages installed in My Documents\R, so that install.packages has to download every dependency:

[Screenshot: the My Documents\R folder]

Then, in R, I install the skmeans package:

install.packages("skmeans")

This gives the following output:

[Screenshot: install.packages output]

So I know I have to install the following packages, in this order:

  • slam
  • clue
  • skmeans
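The order matters: each package must be installed after the packages it depends on. Here it was read from the install.packages output, but it can also be computed from a dependency map with a depth-first traversal (a topological sort). A minimal sketch; `install_order` is a hypothetical helper, and the `deps` map is illustrative, encoding only the order shown above rather than the packages’ full DESCRIPTION files:

```r
# Hypothetical helper: order packages so that each one comes after the
# packages it depends on (depth-first traversal, i.e. a topological sort).
# `deps` is a named list: package name -> character vector of dependencies.
install_order <- function(deps) {
  ordered <- character(0)   # packages already placed, in install order
  visiting <- character(0)  # packages on the current DFS path (cycle check)
  visit <- function(pkg) {
    if (pkg %in% ordered) return(invisible(NULL))
    if (pkg %in% visiting) stop("circular dependency involving ", pkg)
    visiting <<- c(visiting, pkg)
    # Packages absent from `deps` are treated as having no dependencies
    for (dep in deps[[pkg]]) visit(dep)
    visiting <<- setdiff(visiting, pkg)
    ordered <<- c(ordered, pkg)
    invisible(NULL)
  }
  for (pkg in names(deps)) visit(pkg)
  ordered
}

# Illustrative map only: real DESCRIPTION files may list more dependencies.
deps <- list(
  skmeans = c("slam", "clue"),
  clue    = character(0),
  slam    = character(0)
)
print(install_order(deps))
```

With this map it prints "slam" "clue" "skmeans". With network access, `tools::package_dependencies("skmeans", recursive = TRUE)` should list the real dependencies from the repository metadata; packages that ship with R, or that Azure ML already provides, can then be left out of the map.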

Then I go to the temporary folder where install.packages saved the downloaded zip files:

[Screenshot: the temporary folder with the downloaded packages]
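When install.packages() is called with its default `destdir = NULL`, the downloaded files go to a `downloaded_packages` subfolder of the session temporary directory, and they are deleted when the R session ends, so copy them before closing R. From that same session, a quick way to list them (assuming Windows, where binary packages are .zip files):

```r
# Folder that install.packages() uses when `destdir` is left at NULL (the default)
download_dir <- file.path(tempdir(), "downloaded_packages")

# list.files() returns character(0) if the folder does not exist yet,
# i.e. nothing has been downloaded in this session.
zips <- list.files(download_dir, pattern = "\\.zip$", full.names = TRUE)
print(zips)
```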

I zip these zip files into a single archive:

[Screenshot: zipping the package zip files]

and rename the new zip file to skmeans_packages.zip:

[Screenshot: skmeans_packages.zip]
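The zipping can also be scripted from R. This is a sketch under two assumptions: `utils::zip()` shells out to an external zip program (on Windows that usually means Rtools or another zip.exe on the PATH), and the flags follow Info-ZIP conventions, where `-j` stores each file without its folder path so the package zips sit at the root of the bundle. `bundle_packages` is a hypothetical helper:

```r
# Hypothetical helper: put the given package zips, without their folder paths,
# into a single archive suitable for upload to Azure ML.
bundle_packages <- function(zip_files, bundle = "skmeans_packages.zip") {
  stopifnot(length(zip_files) > 0, all(file.exists(zip_files)))
  # Start from a fresh archive; zip would otherwise add to an existing one.
  if (file.exists(bundle)) file.remove(bundle)
  # -j: junk paths, -9: best compression, -X: no extra file attributes
  status <- utils::zip(bundle, files = zip_files, flags = "-j9X")
  if (status != 0) stop("zip exited with status ", status)
  invisible(bundle)
}
```

For example, `bundle_packages(list.files(file.path(tempdir(), "downloaded_packages"), pattern = "\\.zip$", full.names = TRUE))` would bundle everything downloaded during the current session.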

I can then upload it to Azure ML:

NEW, DATASET, FROM LOCAL FILE

[Screenshots: uploading skmeans_packages.zip as a new dataset]

Then you’ll be able to find it as a saved dataset in your workspace:

[Screenshot: skmeans_packages.zip among the saved datasets]

Once it is connected to the third input port (the third dot) of the Execute R Script module, you’ll find its content in the src/ folder:

[Screenshot: content of the src/ folder]

So, to install skmeans and its two dependencies and then load the skmeans library, you just have to enter the following lines:

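# Install the dependencies first, then skmeans, into the working directory
install.packages("src/slam_0.1-32.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/clue_0.3-48.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/skmeans_0.2-6.zip", lib = ".", repos = NULL, verbose = TRUE)
# Load skmeans from the library folder the packages were just installed into
library(skmeans, lib.loc = ".", verbose = TRUE)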
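Hard-coding version numbers means editing the script whenever you refresh a zip. A hedged alternative, assuming (as in this post) Windows binary zips named `<package>_<version>.zip`; `find_package_zips` and `install_from_bundle` are hypothetical helpers:

```r
# Hypothetical helper: for each package name (in install order), return the
# path of the single matching "<package>_<version>.zip" file in `dir`.
find_package_zips <- function(pkgs, dir = "src") {
  all_zips <- list.files(dir, pattern = "\\.zip$")
  paths <- vapply(pkgs, function(pkg) {
    # Escape dots so a name like "data.table" is matched literally
    prefix <- gsub("\\.", "\\\\.", pkg)
    hit <- grep(paste0("^", prefix, "_.*\\.zip$"), all_zips, value = TRUE)
    if (length(hit) != 1) stop("expected exactly one zip for ", pkg, " in ", dir)
    file.path(dir, hit)
  }, character(1))
  unname(paths)
}

# Hypothetical helper: install the zips in order into `lib`, then load the
# last package (the one the script actually uses) from that library.
install_from_bundle <- function(pkgs, dir = "src", lib = ".") {
  for (zip_path in find_package_zips(pkgs, dir)) {
    install.packages(zip_path, lib = lib, repos = NULL)
  }
  library(pkgs[length(pkgs)], lib.loc = lib, character.only = TRUE)
}
```

In the Execute R Script module, `install_from_bundle(c("slam", "clue", "skmeans"))` would then stand in for the four lines above; the package order still has to be given explicitly.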

Azure ML runs experiments on a pool of VMs hosting Docker-like containers (actual Windows containers, named Drawbridge). Each time the script runs, it therefore starts from a blank, standard Azure ML environment, so packages installed during a previous run are gone. Bringing a zip adds its files to that fresh environment.
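A practical consequence is that the installation lines run on every execution. If you also want the same script to keep working locally, where skmeans is already installed, one option is to install from the bundle only when the package cannot be loaded. A minimal sketch; `ensure_package` is a hypothetical helper and the zip paths in the example call are the ones used above:

```r
# Hypothetical helper: install `zip_paths` (dependencies first) into `lib` only
# when `pkg` cannot already be loaded, then attach `pkg`.
# Returns TRUE (invisibly) if an installation was performed.
ensure_package <- function(pkg, zip_paths, lib = ".") {
  lib_paths <- c(lib, .libPaths())  # search `lib` first, then the usual libraries
  installed_now <- FALSE
  if (!requireNamespace(pkg, lib.loc = lib_paths, quietly = TRUE)) {
    for (zip_path in zip_paths) {
      install.packages(zip_path, lib = lib, repos = NULL)
    }
    installed_now <- TRUE
  }
  library(pkg, lib.loc = lib_paths, character.only = TRUE)
  invisible(installed_now)
}

# Example call inside the Execute R Script module (not run here):
# ensure_package("skmeans", c("src/slam_0.1-32.zip",
#                             "src/clue_0.3-48.zip",
#                             "src/skmeans_0.2-6.zip"))
```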

I hope this blog post helps if you need R packages that are not among the 400+ preloaded in Azure Machine Learning!

Benjamin (@benjguin)