Microsoft R Server (MRS) for Analysts is designed to help analysts familiar with other environments transfer their knowledge of data preparation and analysis to Microsoft R Server. This 3-day training course takes a use-case-based approach, walking through the knowledge discovery and data mining process using MRS in a local compute context (on a single server). The course assumes a working knowledge of fundamental R concepts (as laid out in the course prerequisites), and it enables an experienced analyst who is an intermediate R user to transition to MRS's tools and capabilities for scalable big data processing and analytics.
Level: Technical 400
Prerequisites
- Solid understanding of R data structures (vectors, matrices, lists, data frames, environments): for example, students should be able to confidently tell the difference between a list and a data frame, know what kind of data each object is best suited to represent, and know how to subset it.
- Understanding of how to write R functions: for example, students should be able to write functions that process data in bulk (across multiple columns), debug functions, explain how R handles variables that are out of scope, and use the ellipsis (...) to pass arguments.
- Good understanding of data manipulation and data processing in R: students should be familiar with functions such as merge, transform, subset, cbind, rbind, lapply and apply, and with how these functions can be used to work with a data.frame. Familiarity with third-party packages such as dplyr is also helpful.
- Good understanding of control flow and other basic programming concepts: for example, students should know what loops are and how to use the apply family of functions to rewrite them, and be familiar with functions such as do.call and assign.
Modules Covered in the Course
Business Case discussion
Load large datasets for analysis with MRS
Choosing between the CSV and XDF formats
Importing data into MRS
Cleaning and preparing data for analysis using MRS
Basic data transformations (handling missing values, normalising, rescaling)
Passing custom transformation functions to MRS to leverage existing R code
Visualise, explore, and summarise data using MRS
Summarising numeric data (five-number summaries, correlations, histograms and line plots) and categorical data (cross-tabulations and bar plots)
Benchmarking performance on different data formats (XDF vs CSV)
Estimate models using MRS
Linear and Generalised Linear models
Model tuning and cross-validation
By the end of the course, students will be able to:
- Read from and write to large files using MRS (both flat files such as CSV and MRS's XDF distributed data format)
- Prepare data for analysis
- Visualise, explore, and summarise data
- Estimate and tune basic statistical models
- Deploy models through scoring functions
December 15 – 16
Cliftons Level 13, 60 Margaret Street, Sydney
December 12 – 14
Cliftons Level 1, 440 Collins Street, Melbourne