This post is authored by Le Zhang, Data Scientist, and Graham Williams, Director of Data Science at Microsoft.
The Azure Data Science Virtual Machine (DSVM) is a curated Azure VM image, preinstalled and configured with popular tools for data analytics and machine learning, including Microsoft R Server Developer Edition, the Anaconda Python distribution, and Jupyter Notebook (with R and Python kernels). The DSVM is a convenient workplace for experimental analytics on a single low-end VM, collaborative prototyping of machine learning proof-of-concepts, or operationalizing an end-to-end data science or AI workflow.
For R-based data scientists, data engineers, and architects, it is beneficial to use and operate DSVMs for various application scenarios with minimal effort. AzureDSVM (version 0.2.0) is an R package that helps R users to directly manage the DSVM from within an R session. It provides a comprehensive set of operational functions to:
AzureDSVM is an open source R package hosted on GitHub. To install it, run the following code:
devtools::install_github("Azure/AzureDSVM")
AzureDSVM relies on AzureSMR, which offers methods to authenticate against an authorized application on Azure Active Directory and to manage a selected set of Azure components. The same preliminary setup steps for AzureSMR also apply to AzureDSVM. Detailed instructions can be found here.
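As a rough illustration of that preliminary step, the sketch below builds an authentication context with AzureSMR's createAzureContext(). The environment variable names (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_AUTH_KEY) are assumptions made for this example, not something the packages require; any secure way of supplying the tenant ID, client ID, and authentication key of the registered application should work.

```r
# Minimal sketch: create the AzureSMR authentication context used by AzureDSVM.
# Assumption: the credentials of the Azure Active Directory application are
# stored in these hypothetical environment variables.
tenant_id <- Sys.getenv("AZURE_TENANT_ID")
client_id <- Sys.getenv("AZURE_CLIENT_ID")
auth_key  <- Sys.getenv("AZURE_AUTH_KEY")

# TRUE only when all three values are non-empty.
have_credentials <- all(nzchar(c(tenant_id, client_id, auth_key)))

if (have_credentials && requireNamespace("AzureSMR", quietly = TRUE)) {
  # The resulting object is what later calls refer to as `context`.
  context <- AzureSMR::createAzureContext(tenantID = tenant_id,
                                          clientID = client_id,
                                          authKey  = auth_key)
} else {
  message("Credentials or AzureSMR not available; skipping context creation.")
}
```

Keeping the secrets out of the script itself means the same code can be shared in a notebook or vignette without exposing the application key.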
After installation, one can try out sample code from the tutorials in the provided vignettes. For example, the following deploys an Ubuntu Linux DSVM, given specifications such as the authentication method and VM size. The context argument is the active authentication context used in AzureSMR. The other arguments are specifications of the DSVM itself.
deployDSVM(context, resource.group="<resource_group>", location="<location>", hostname="<dsvm_name>", username="<user_name>", size="<dsvm_size>", pubkey="<public_key>")
We can stop or delete a DSVM when it is no longer required. The stop operation in AzureDSVM both shuts down the DSVM and deallocates it, which means that the DSVM no longer incurs any charge.
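A hedged sketch of how stopping might look in practice, assuming the package's operateDSVM() function with its "Check", "Stop", and "Delete" operations as used in the package vignettes. The wrapper name stop_dsvm and its argument names are hypothetical and introduced only for this example; check the installed version's documentation for the exact signature.

```r
# Minimal sketch: check the status of a DSVM, then stop (deallocate) it.
# Assumptions: `context` is an AzureSMR authentication context, and
# operateDSVM() accepts the resource group and host name positionally.
# `stop_dsvm` is a hypothetical helper, not part of AzureDSVM.
stop_dsvm <- function(context, resource_group, hostname) {
  if (!requireNamespace("AzureDSVM", quietly = TRUE)) {
    stop("The AzureDSVM package is required for this sketch.")
  }
  # Report the current state before changing it.
  AzureDSVM::operateDSVM(context, resource_group, hostname, operation = "Check")
  # Shut down and deallocate the DSVM.
  AzureDSVM::operateDSVM(context, resource_group, hostname, operation = "Stop")
}

# Usage (placeholders must be replaced with real values):
# stop_dsvm(context, "<resource_group>", "<dsvm_name>")
# To remove the DSVM entirely, the same call with operation = "Delete"
# would be used instead of "Stop".
```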
Many application scenarios can benefit from the methods provided by AzureDSVM which allow functional and elastic operation of the DSVMs from within an R session. To illustrate, the following are some representative design patterns we have identified in customer project engagements, which target specific scenarios regardless of business domain or context.
In a real-world use case, it is common for the four patterns to be mixed to form a more sophisticated architecture. For more information, see the illustrative examples in two use cases: Flight Delay Prediction and Solar Panel Power Forecasting. The former exhibits an end-to-end development pipeline built on top of a set of heterogeneous DSVMs, and the latter shows how to configure a DSVM for deep learning with Microsoft Cognitive Toolkit in R. Note that the DSVM is designed more for prototyping and experimental work, so once an established pipeline is transformed into a production one, DSVM instances can be replaced with other suitable Azure components (e.g., HDInsight if a Hadoop/Spark cluster is needed). The replacement can be conveniently achieved with AzureSMR.
Le Zhang, Graham Williams