Importing data with Microsoft R Server

In this post, we explore a new package, RevoScaleR, which takes a different approach to handling data. Although it can work directly with flat files, it is designed primarily for a special file format, XDF, which improves processing of datasets. The RevoScaleR package stores the dataset on disk (hard drive) and…


rxExecBy – Productivity and scale with partitioned data

There is often a need to train “many small models” instead of a “single big model”. Specifically, users may want to train separate models, such as logistic regressions or boosted trees, within groups (partitions) like “states”, “countries”, or “device id”, or they may want to compute summary statistics such as mean, min, max,…


Running Pleasingly Parallel workloads using rxExecBy on Spark, SQL, Local and Localpar compute contexts

The RevoScaleR function rxExec() allows you to run arbitrary R functions in a distributed fashion, using available nodes (computers) or available cores (the maximum of which is the sum over all available nodes of the processing cores on each node). rxExec exemplifies the traditional high-performance computing approach: when using rxExec, you largely control how…


Microsoft R Server – Using Hive data source in Spark compute context

Before the Microsoft R Server 9.0 release, if you needed to perform analytics on your Hive or Parquet data, you first had to export it manually to a supported format (e.g., CSV) and then use something like RxTextData to perform the analytics, potentially after uploading the text data to HDFS. With the Microsoft R Server 9.0 release, Spark compute…


Microsoft R Server Operationalization Examples

Today, more and more businesses are adopting advanced analytics for mission-critical decision making in areas such as fraud detection, healthcare and manufacturing. Typically, data scientists first build the predictive models, and only then can businesses deploy those models in a production environment and consume them for predictive actions. Here are a few examples…


Pattern Matching on xdf files in Microsoft R Server

Pattern Matching: R uses regular expressions for pattern matching. Finding patterns in non-XDF data is straightforward with R’s grep function. Example output: [1] “Microsoft01” “Microsoft03” “Microsoft05” [1] “Microsoft01” When using XDF files (the compressed binary file format used by RevoScaleR, which is part of Microsoft R Server), we can…


Data Wrangling in XDF files using ScaleR Functions

The RevoScaleR package provides a set of over one hundred portable, scalable, and distributable data analysis functions. In this article, we will look at some examples of using ScaleR functions for data wrangling in XDF files. For all the following examples, we will be using input XDF files from the SampleData directory in Microsoft R…