Importing data with Microsoft R Server

In this post, we play around with a new package, RevoScaleR, which takes a different approach to handling data. Though it works directly with flat files, it is primarily made for a special file format, XDF, which enhances processing of datasets. The RevoScaleR package stores the dataset on disk (hard drive) and…


Stratified Splitting using rxExecBy

Microsoft R Server 9.1 introduces a new function, rxExecBy(), which can be used to partition an input data source by keys and apply a user-defined function to each partition. You can read more about rxExecBy() here: Pleasingly Parallel using rxExecBy. In this article, we will look at how to use rxExecBy to…

Data Exploration in XDF Files using ScaleR Functions

In this article, we look at a few examples of exploring data in XDF files using ScaleR functions. All of the following examples use the input XDF file AirlineDemoSmall.xdf from the SampleData directory in Microsoft R Server. 1. Obtain the different types of variables present in the dataset. OUTPUT: 2.…


Pattern Matching on xdf files in Microsoft R Server

Pattern Matching: R uses regular expressions for pattern matching. Finding patterns in non-XDF data is straightforward with R's grep function. Example: Output: [1] "Microsoft01" "Microsoft03" "Microsoft05" [1] "Microsoft01" When using XDF files (the compressed binary file format used by RevoScaleR, which is part of Microsoft R Server), we can…


Data Wrangling in XDF files using ScaleR Functions

The RevoScaleR package provides a set of over one hundred portable, scalable, and distributable data analysis functions. In this article, we will see some examples of using ScaleR functions for data wrangling in XDF files. All of the following examples use input XDF files from the SampleData directory in Microsoft R…


MRS Capability Extension: Importing and Exporting Large In-Memory Data Frames

Introduction: Microsoft R Server is an enterprise-ready advanced analytics platform that scales and accelerates R. R, an open-source statistical programming language, is a great tool for building intelligent applications and realizing value in predictive analytics. While powerful, R is single-threaded and memory-bound. In order to handle Big Data,…