Using Microsoft R Server Operationalization on HDInsight


R Server on an HDInsight cluster lets R scripts use Spark and MapReduce to run distributed computations. You can develop a model and then operationalize it to make predictions by configuring the edge node as a one-box. One-click deployment ARM templates are available that create an R Server on HDInsight cluster with the edge node already configured as a one-box. If you already have an R Server on HDInsight cluster, you can configure operationalization using the Admin Utility.

Once you have configured operationalization, you can use the mrsdeploy package in Microsoft R Client on your local machine to connect to operationalization on the edge node. You can then use its features, such as remote execution and web services. This article shows how to connect to the operationalization feature, depending on whether or not your cluster is set up on a virtual network.

R Server cluster on a virtual network

  • Create a virtual network (VNET) and a subnet before creating the HDInsight cluster. The VNET and the HDInsight cluster must be created in the same resource group.
  • When defining the HDInsight cluster in the Azure portal, go to “Advanced Settings” and specify the VNET and subnet that you created.
  • Once the cluster is created, select the resource group you used, and then select the virtual network you created.
  • A list of the attached devices is shown. Select the edge node from the list, and then select IP configurations.
  • Select the default IP configuration assigned to the edge node, and change “Public IP address” from Disabled to Enabled.
  • Create a new public IP address and select it. Click “Save” to save the changes.
  • Go back to the edge node and select “Network security group”. It should show “None”. Select the network security group, and add an inbound security rule that opens port 12800.
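
If you prefer scripting to the portal, the public IP and inbound rule steps above can be sketched with the Azure CLI. This is a minimal sketch, not the procedure the steps describe: it assumes Azure CLI 2.x is installed and logged in, and the resource group, NIC, IP configuration, public IP, NSG, and rule names are hypothetical placeholders you would look up in your own resource group. Setting AZ_CMD=echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: give the edge node a public IP and open TCP port 12800 with the Azure CLI.
# ASSUMPTIONS: Azure CLI 2.x is installed and logged in; every name passed in
# (resource group, NIC, IP configuration, public IP, NSG) is a hypothetical
# placeholder. Set AZ_CMD=echo to print the commands instead of running them.

# Create a public IP address and attach it to the edge node's IP configuration.
#   $1 = resource group, $2 = edge node NIC name,
#   $3 = IP configuration name, $4 = name for the new public IP
attach_public_ip() {
  local rg="$1" nic="$2" ipconfig="$3" pip="$4"
  ${AZ_CMD:-az} network public-ip create \
    --resource-group "$rg" --name "$pip" || return 1
  ${AZ_CMD:-az} network nic ip-config update \
    --resource-group "$rg" --nic-name "$nic" --name "$ipconfig" \
    --public-ip-address "$pip"
}

# Add an inbound rule to an existing NSG that allows TCP traffic on port 12800.
#   $1 = resource group, $2 = network security group name
open_port_12800() {
  local rg="$1" nsg="$2"
  ${AZ_CMD:-az} network nsg rule create \
    --resource-group "$rg" --nsg-name "$nsg" \
    --name allow-mrs-o16n-12800 --priority 1000 \
    --direction Inbound --access Allow --protocol Tcp \
    --destination-port-ranges 12800
}
```

For example, attach_public_ip my-rg edgenode-nic ipconfig1 edgenode-pip followed by open_port_12800 my-rg edgenode-nsg, with the placeholders replaced by the names shown in your resource group. The priority value 1000 is an arbitrary choice; any unused priority between 100 and 4096 works.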

The edge node now has a public IP address, and port 12800 is open. You can connect to it using remoteLogin() from the mrsdeploy package:

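library(mrsdeploy)
# Replace the placeholder with the edge node's public IP address, and use the
# admin password set when operationalization was configured.
remoteLogin(
    deployr_endpoint = "http://<EDGE NODE PUBLIC IP>:12800",
    username = "admin",
    password = "xxxxxxx"
)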
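
Before calling remoteLogin(), it can help to confirm that port 12800 on the edge node is reachable from your machine. A minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; port_is_open is a hypothetical helper name and the 5-second limit is an arbitrary choice:

```shell
#!/usr/bin/env bash
# Sketch: check whether a TCP port accepts connections.
# ASSUMPTIONS: bash (for /dev/tcp) and coreutils `timeout` are available;
# port_is_open is a hypothetical helper, not part of mrsdeploy.

#   $1 = host, $2 = port; returns 0 if a TCP connection succeeds within 5 seconds.
port_is_open() {
  local host="$1" port="$2"
  # Redirecting to /dev/tcp/<host>/<port> makes bash attempt a TCP connection;
  # the inner shell exits non-zero if the connection fails.
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, port_is_open <EDGE NODE PUBLIC IP> 12800 && echo reachable. A failure usually points to the public IP not being attached or the inbound rule for port 12800 being missing.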

R Server cluster not set up on a virtual network

If your cluster is not set up on a VNET, you can use SSH port-forward tunneling to connect to port 12800 on the edge node.
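
The SSH tunnel can be sketched as follows. This assumes the edge node is reachable at CLUSTERNAME-ed-ssh.azurehdinsight.net, the host name pattern used in Azure's R Server on HDInsight documentation (confirm it in your cluster's SSH settings). The cluster and user names are placeholders, and edge_node_tunnel_cmd is a hypothetical helper that only prints the command so you can inspect it before running it:

```shell
#!/usr/bin/env bash
# Sketch: build an SSH local port-forward command for the R Server edge node.
# ASSUMPTIONS: the edge node SSH host is <cluster>-ed-ssh.azurehdinsight.net;
# the cluster and user names passed in are placeholders.

#   $1 = cluster name, $2 = SSH user name
edge_node_tunnel_cmd() {
  local cluster="$1" user="$2"
  # -N: keep the connection open for forwarding only (no remote command).
  # -L 12800:localhost:12800: connections to port 12800 on this machine are
  #   forwarded to port 12800 on the edge node ("localhost" is resolved there).
  printf 'ssh -N -L 12800:localhost:12800 %s@%s-ed-ssh.azurehdinsight.net\n' \
    "$user" "$cluster"
}
```

Run the printed command and leave it open. While the tunnel is up, remoteLogin() can use deployr_endpoint = "http://localhost:12800", because local port 12800 is forwarded to the edge node.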

SSH port-forward tunneling may not be workable if you are invoking the operationalization feature from Azure Data Factory, Azure Functions, Azure API Management, and similar services. In this scenario, you can use ngrok to expose port 12800 on the edge node to the internet.

SSH into the edge node and run the following commands to install ngrok and expose port 12800:

  • Download ngrok
wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
  • Unzip it
unzip ngrok-stable-linux-amd64.zip
  • Explore the ngrok commands and check the version (this article used version 2.2.8)
./ngrok help
./ngrok version
  • Expose port 12800 over HTTP. The following command starts a background process that exposes port 12800 to the internet through a public URL. Logs are written to the ngrok.log file, and every incoming HTTP request is logged there.
./ngrok http 12800 --log=stdout > ngrok.log &
  • Find the public URL by querying ngrok's local API
curl http://localhost:4040/api/tunnels | json_pp
  • The command prints JSON. Find the value of the “public_url” property, which looks like this:
"public_url" : "https://fea660ec.ngrok.io"
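
Instead of reading the JSON by eye, you can extract the HTTPS public_url with a small filter. This sketch assumes the response contains "public_url" fields like the one above. It uses only grep, head, and sed rather than a real JSON parser, so it may break if the format changes. extract_https_url is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: print the first https:// public_url found in ngrok's /api/tunnels JSON.
# ASSUMPTION: the JSON contains "public_url" : "https://..." entries; this is a
# text match, not a full JSON parser. extract_https_url is a hypothetical name.
extract_https_url() {
  # Keep only "public_url" : "https://..." fragments, take the first one,
  # then strip everything except the URL itself.
  grep -o '"public_url" *: *"https://[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\(https:[^"]*\)"$/\1/'
}
```

For example: curl -s http://localhost:4040/api/tunnels | extract_https_url. Matching on https:// skips the plain http:// tunnel that ngrok may also report for the same port.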

Now, from your application code, you can connect to the operationalization feature using this public_url. Sample code to connect using the remoteLogin() function in the mrsdeploy package:

library(mrsdeploy)
remoteLogin(
    deployr_endpoint = "https://fea660ec.ngrok.io",
    username = "admin",
    password = "xxxxxxx"
)
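
Clients that cannot run R, such as an Azure Function, would call the operationalization REST API directly instead of using remoteLogin(). As a sketch, assume the R Server 9.x REST API exposes a POST /login endpoint that accepts a JSON body with username and password and returns an access token. Under that assumption, a login request through the ngrok URL could look like the following. o16n_login is a hypothetical helper, and setting CURL_CMD=echo prints the request instead of sending it:

```shell
#!/usr/bin/env bash
# Sketch: request an access token from the operationalization REST API.
# ASSUMPTIONS: the server exposes POST /login taking {"username","password"} JSON
# (R Server 9.x operationalization API); o16n_login is a hypothetical helper.
# The JSON body is built by plain string interpolation, so a password that
# contains double quotes or backslashes would need escaping first.

#   $1 = base URL (for example the ngrok public_url), $2 = user name, $3 = password
o16n_login() {
  local base_url="${1%/}" user="$2" password="$3"   # drop any trailing slash
  ${CURL_CMD:-curl} -s -X POST "${base_url}/login" \
    -H "Content-Type: application/json" \
    -d "{\"username\":\"${user}\",\"password\":\"${password}\"}"
}
```

For example, o16n_login https://fea660ec.ngrok.io admin xxxxxxx. Under the same assumption, the JSON response carries an access_token that later requests would send in an "Authorization: Bearer" header.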