Scaling up Scikit-Learn's Random Projection using Apache Spark

By Sashi Dareddy, Lead Data Scientist. What is Random Projection (RP)? Random Projection is a mathematical technique for reducing the dimensionality of a problem, much like Singular Value Decomposition (SVD) or Principal Component Analysis (PCA), but simpler and computationally faster. [Throughout this article, I will use Random Projection and Sparse Random Projection interchangeably.] It…

Scaling a recommender system across large data volumes

By Michele Usuelli, Data Scientist Consultant. Building a recommendation engine in the presence of large data volumes: E-commerce businesses can suggest new products to their customers. How do they choose the products to recommend? The companies collect data about their customers' purchases. Starting from the purchase history, they can identify items that have been…

Analysing data in SQL Server 16, combining R and SQL

By Michele Usuelli, Data Scientist Consultant. Overview: R is the most popular programming language for statistics and machine learning, and SQL is the lingua franca for data manipulation. In an advanced analytics scenario, we need to pre-process the data and build machine learning models. A good solution consists of using each tool…

Validating a model in R Services using the k-fold

By Michele Usuelli, Data Scientist Consultant. Why k-fold: Predictive modelling consists of predicting a future outcome based on data. Starting from data whose outcome is already known, predictive models detect patterns that had an impact on the outcome. Then, given data whose outcome is unknown, the model looks for the same…

Using R Services in SQL Server 2016 Release Candidate 2 (RC2)

Author: Benjamin Wright-Jones. Contributors: Sander Timmer, Derek Norton. Reviewers: Anderson Chan. The results of the recent IEEE Survey (2015) clearly show the rising interest in R (the lingua franca of data scientists). In SQL Server 2016, R Services will be available, leveraging the highly scalable and parallel algorithms from the Revolution Analytics engine. SQL Server…

Evaluating Machine Learning models when dealing with imbalanced classes

By Sander Timmer, PhD. In real-world Machine Learning scenarios, especially those driven by IoT that are constantly generating data, a common problem is having an imbalanced dataset. This means we have far more data representing one outcome class than the other. For example, when doing predictive maintenance, there is (far) more data available about the healthy…

PowerShell Script To Invoke ML Scoring Part I

By Earle Sinnatamby, Consultant. Objective: The purpose of this blog post is to provide a PowerShell alternative to using Azure Data Factory to perform Machine Learning (ML) scoring. The pilot engagement required daily on-premises data to be uploaded into Azure Blob Storage. Each uploaded data file required daily rescoring with an ML script provided by the client…

Why Public Cloud beats Private Cloud for Analytics: A Data Warrior’s Perspective

By Bill Eldredge, Associate Architect. As the former head of the Big Data Management and Governance team at Nokia, I was responsible for managing our internal business customers’ needs and expectations for the use of the private Hadoop cloud and related Big Data Asset we spent five years building and maintaining. Unfortunately, several of those years amounted…

Hacking Web Service Parameters for R Modules in Azure Machine Learning

By Bob Savard, Senior Consultant. Purpose: The purpose of this blog post is to use a particular use case to explain how web service parameters are used in Azure ML, and how you can enhance their existing capabilities. Introduction to URIs in Azure ML: Using Azure ML Studio, users publish REST web services. Interaction with those…

Basket analysis with SQL Data Warehouse

Stefan Cronjaeger, Technical Solution Professional; Michael Hlobil, Architect, Data Insights Global Practice. Shopping basket analysis typically asks questions like: “If a customer bought product A, what might she also be interested in?” This is typically used for recommendation engines or for arranging products on the shelves of a shop. In order to answer these questions,…