How to deliver a data science project

Author: Michele Usuelli, Lead Data Scientist, Microsoft Enterprise Services
What is a data science project? How does it get delivered? What are its key aspects? There is a lot of hype around data science. The purpose of this article is to provide clarity, and the content comes from the successful delivery of projects with Microsoft…

Measuring Model Goodness – Part 2

Author: Dr. Ajay Thampi, Lead Data Scientist
Measurability is an important aspect of the Team Data Science Process (TDSP) as it quantifies how good the machine learning model is for the business and helps gain acceptance from the key stakeholders. In part 1 of this series, we defined a template for measuring model goodness specifically…

Measuring Model Goodness – Part 1

Author: Dr. Ajay Thampi, Lead Data Scientist
Data and AI are transforming businesses worldwide, from finance, manufacturing and retail to healthcare, telecommunications and education. At the core of this transformation is the ability to convert raw data into information and useful, actionable insights. This is where data science and machine learning come in. Building a…

Demystifying decision forests

By Michele Usuelli, Lead Data Scientist
This article doesn’t require a data science background, only a basic understanding of predictive analytics. Beyond that, all the concepts are explained from scratch, including a popular algorithm called the “decision forest”. Throughout the article you won’t see any fancy or advanced machine learning algorithms, but by the…

What is the role of a data scientist?

By Michele Usuelli, Lead Data Scientist
Data science has been around for decades, but it has recently become more popular among companies. Although the tools and techniques already existed, some things have changed. Digital technologies generate more data that can drive new advanced analytics use cases. Also, there are more success stories showcasing the value in data, making…

Scaling up Scikit-Learn's Random Projection using Apache Spark

By Sashi Dareddy, Lead Data Scientist
What is Random Projection (RP)? Random Projection is a mathematical technique to reduce the dimensionality of a problem, much like Singular Value Decomposition (SVD) or Principal Component Analysis (PCA), but simpler and computationally faster. [Throughout this article, I will use Random Projection and Sparse Random Projection interchangeably.] It…
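To make the idea concrete, here is a minimal pure-Python sketch of a sparse random projection. It is not scikit-learn's `SparseRandomProjection` and not the article's Spark implementation; the function names are hypothetical. Each matrix entry is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each and 0 otherwise, which preserves squared norms in expectation. The choice s = 3 is an assumption made for illustration; scikit-learn's default density is 1/sqrt(n_features) instead.

```python
import math
import random


def sparse_projection_matrix(n_features, n_components, s=3.0, seed=0):
    """Sample a sparse random projection matrix, stored row by row.

    Each entry is +sqrt(s / n_components) with probability 1 / (2 s),
    -sqrt(s / n_components) with probability 1 / (2 s), and 0 otherwise.
    Only the non-zero entries are kept, as (column, value) pairs.
    """
    rng = random.Random(seed)
    scale = math.sqrt(s / n_components)
    rows = []
    for _ in range(n_features):
        entries = []
        for j in range(n_components):
            u = rng.random()
            if u < 1 / (2 * s):
                entries.append((j, scale))
            elif u < 1 / s:
                entries.append((j, -scale))
        rows.append(entries)
    return rows


def project(x, rows, n_components):
    """Map a dense vector x (length n_features) to n_components dimensions."""
    y = [0.0] * n_components
    for xi, entries in zip(x, rows):
        if xi == 0.0:
            continue  # zero inputs contribute nothing
        for j, value in entries:
            y[j] += xi * value
    return y


# Usage: squared norms (and, since the map is linear, pairwise distances)
# are preserved in expectation.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
rows = sparse_projection_matrix(n_features=1000, n_components=200)
y = project(x, rows, n_components=200)
ratio = sum(v * v for v in y) / sum(v * v for v in x)
print(len(y), round(ratio, 2))
```

Storing only the non-zero entries is what makes the sparse variant cheap: with s = 3, roughly two thirds of the multiplications are skipped.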

Scaling a recommender system across large data volumes

By Michele Usuelli, Data Scientist Consultant
Building a recommendation engine in the presence of large data volumes
E-commerce businesses can suggest new products to their customers. How do they choose the products to recommend? The companies collect data about their customers' purchases. Starting from the purchase history, they can identify items that have been…
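The excerpt stops before naming the article's method, so the following is only an assumed illustration: item co-occurrence counting, one common way to turn purchase histories into recommendations. The function names and the toy baskets are hypothetical, and everything runs in memory, so it ignores the large-volume scaling that the article is about.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence(baskets):
    """Count, for every pair of items, how many baskets contain both."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores repeated items; sorted() makes pair order deterministic
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, purchased, top_n=3):
    """Score unseen items by how often they co-occur with purchased ones."""
    scores = Counter()
    for item in purchased:
        scores.update(counts.get(item, Counter()))
    for item in purchased:
        scores.pop(item, None)  # don't recommend what the customer already has
    return [item for item, _ in scores.most_common(top_n)]


baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
counts = co_occurrence(baskets)
print(recommend(counts, ["bread"]))
```

Raw counts favour popular items; normalising them (for example by item frequency) is a common refinement, but that is a design choice beyond this sketch.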

Analysing data in SQL Server 2016, combining R and SQL

By Michele Usuelli, Data Scientist Consultant
Overview
R is the most popular programming language for statistics and machine learning, and SQL is the lingua franca for data manipulation. In an advanced analytics scenario, we need to pre-process the data and to build machine learning models. A good solution is to use each tool…

Validating a model in R Services using k-fold cross-validation

By Michele Usuelli, Data Scientist Consultant
Why k-fold?
Predictive modelling consists of predicting a future outcome from data. Starting from data whose outcome is already known, predictive models detect the patterns that had an impact on the outcome. Then, given data whose outcome is unknown, the model looks for the same…
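The article works in R Services; the following is a language-neutral sketch in Python of the k-fold idea, not the R Services API. The data are split into k folds, each fold is held out once as the "unknown outcome" set while the model is fitted on the rest, and the k error scores are averaged. The model here is a hypothetical baseline that predicts the training mean, used only so the loop has something to fit.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    The indices are shuffled once, then dealt into k folds of
    near-equal size; each fold serves as the test set exactly once.
    Assumes n_samples >= k so that no fold is empty.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(y, k=5):
    """Per-fold mean absolute error of a baseline that predicts the training mean."""
    errors = []
    for train, test in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)  # "fit" on train only
        errors.append(sum(abs(y[i] - prediction) for i in test) / len(test))
    return errors


y = [3.1, 2.9, 4.0, 3.5, 2.2, 3.8, 4.1, 2.7, 3.3, 3.0]
fold_errors = cross_validate(y, k=5)
print(sum(fold_errors) / len(fold_errors))
```

Because every observation is tested exactly once by a model that never saw it, the averaged score is a less optimistic estimate than evaluating on the training data itself.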

Using R Services in SQL Server 2016 Release Candidate 2 (RC2)

Author: Benjamin Wright-Jones
Contributors: Sander Timmer, Derek Norton
Reviewers: Anderson Chan
The results of the recent IEEE Survey (2015) clearly show the rising interest in R (the lingua franca of data scientists). In SQL Server 2016, R Services will be available, leveraging the highly scalable and parallel algorithms from the Revolution Analytics engine. SQL Server…