Announcing preview of Machine Learning Services with R support in Azure SQL Database

Today we are announcing the preview of Machine Learning Services with R support in Azure SQL Database! Azure SQL Databases is the first DBaaS to offer in-database machine learning. You can evaluate this preview functionality in any server/database created in the West Central US Region only right now. Support for more regions will be enabled in the coming months.

In-database Machine Learning support was added in SQL Server 2016 and we are now bringing the same functionality to Azure SQL Database. You can now train and score machine learning models in Azure SQL Database and the predictions can be exposed to any application using your database, easily and seamlessly.

The preview functionality allows you to train and score machine learning models using data that fits in memory (in R data frame). Please note that the amount of memory available for R scripts execution depends on the edition of the Azure SQL database and cannot be modified.

The in-database R functionality can be used on the following database service tiers & performance levels:

Premium Service Tier – P1, P2, P4, P6, P11, P15
Premium RS Service Tier – PRS1, PRS2, PRS4, PRS6
Premium Elastic Pool – 125 eDTUs or greater
Premium RS Elastic Pool – 125 eDTUs or greater

What works

The following functionality is supported in the R scripts:

  • Microsoft R Open with R version 3.3.3
    • Base R Packages & Functions
  • Microsoft R Server 9.2
    • RevoScaleR package
  • Training of models with data that fits in memory
  • Scoring of models with data that fits in memory
  • Trivial parallelism for R Script execution (using the @parallel parameter in sp_execute_external_script)
  • Streaming execution for R Script execution (using @r_RowsPerRead parameter in sp_execute_external_script)
  • One concurrent execution of R Script

What doesn’t work yet

The following functionality is not supported in Machine Learning Services:

  • Configuration & Monitoring
    1. Option to enable R/Python scripts execution on specific databases
    2. DMV to monitor CPU & Memory usage of R / Python Scripts
    3. Package installation using CREATE EXTERNAL LIBRARY DDL
  • R scripts
    1. MicrosoftML Package
    2. SQL Compute Context support from client or R script executing on the server
    3. Parallel execution for R Server models – Using the numtasks parameter in execution of Rx functions
    4. Loopback connection from R script to SQL Server – Connecting back to the server from R script inside sp_execute_external_script
  • Python support

In future updates, we will expand support of Machine Learning Services to all supported regions in Azure SQL Database and enable the full set of features that are available for Machine Learning Services in SQL Server 2017.

Train and Score Models in Azure SQL Database

Now, let us look at a simple end-to-end example using the built-in iris dataset in R to see how you can train & score models using Machine Learning Services in Azure SQL Database.

/* Step 1: Setup schema */
drop table if exists iris_data, iris_models;

create table iris_data (
              id int not null identity primary key
              , "Sepal.Length" float not null, "Sepal.Width" float not null
              , "Petal.Length" float not null, "Petal.Width" float not null
              , "Species" varchar(100) null

create table iris_models (
       model_name varchar(30) not null primary key,
       model varbinary(max) not null,
       native_model varbinary(max) not null

/* Step 2: Populate test data from iris dataset in R */
insert into iris_data
("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")
execute   sp_execute_external_script
                     @language = N'R'
                     , @script = N'iris_data <- iris;'
                     , @input_data_1 = N''
                     , @output_data_1_name = N'iris_data';

/* Step 3: Create procedure for training model */
create or alter procedure generate_iris_model
(@trained_model varbinary(max) OUTPUT, @native_trained_model varbinary(max) OUTPUT)
       execute sp_execute_external_script
         @language = N'R'
       , @script = N'
# Build decision tree model to predict species based on sepal/petal attributes
iris_model <- rxDTree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris_rx_data);

# Serialize model to binary format for storage in SQL Server
trained_model <- as.raw(serialize(iris_model, connection=NULL));

# Serialize model to native binary format for scoring using PREDICT function in SQL Server
native_trained_model <- rxSerializeModel(iris_model, realtimeScoringOnly = TRUE)
       , @input_data_1 = N'
select "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"
from iris_data'
       , @input_data_1_name = N'iris_rx_data'
       , @params = N'
@trained_model varbinary(max) OUTPUT, @native_trained_model varbinary(max) OUTPUT'
       , @trained_model = @trained_model OUTPUT
       , @native_trained_model = @native_trained_model OUTPUT;

/* Step 4: Train & store a decision tree model that will predict species of flowers */
declare @model varbinary(max), @native_model varbinary(max);
exec generate_iris_model @model OUTPUT, @native_model OUTPUT;

delete from iris_models where model_name = 'iris.dtree';
insert into iris_models (model_name, model, native_model)
values('iris.dtree', @model, @native_model);

select model_name
     , datalength(model)/1024. as model_size_kb
     , datalength(native_model)/1024. as native_model_size_kb
  from iris_models;
/* Step 5: Create procedure for scoring using the decision tree model */
create or alter procedure predict_iris_species (@model varchar(100))
              declare @rx_model varbinary(max) = (select model from iris_models
                                                   where model_name = @model);
              -- Predict based on the specified model:
              exec sp_execute_external_script
                                  @language = N'R'
                                  , @script = N'
# Unserialize model from SQL Server

# Predict species for new data using rxDTree model
OutputDataSet <-rxPredict(irismodel, iris_rx_data, extraVarsToWrite = c("Species", "id"));
              , @input_data_1 = N'
select id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"
from iris_data'
              , @input_data_1_name = N'iris_rx_data'
              , @params = N'@rx_model varbinary(max)'
              , @rx_model = @rx_model
              with result sets ( ("setosa_Pred" float, "versicolor_Pred" float, "virginica_Pred" float, "Species.Actual" varchar(100), "id" int));

/* Step 6: Test scoring of model */
exec predict_iris_species 'iris.dtree';