The Data Science Workloads in Visual Studio 2017 RC

From automatic photo tags on Facebook, to product recommendations online, to keyword search over your photos, to fraud alerts on your credit card, Machine Learning and Data Science are all around us in one form or another.

Today we’re delighted to announce that Visual Studio 2017 RC now has dedicated workloads for Data Storage and Data Science. These two stacks provide all the backend and tooling you need to build your next-generation intelligent apps and services.

Let’s take a look at the workloads in a bit more detail:

  1.  Data storage and processing – Big Data Storage and Advanced Analytics
  2.  Data Science – All the tooling you need to analyze, build models, and create smart apps
    • Python Tools for Visual Studio – Desktop, Web, Scientific, Data Science/ML
    • R Tools for Visual Studio – Primarily Stats and Data Science/ML
    • F# – A functional-first .NET language suited for a variety of data processing tasks

Why R and Python?

While Python has been available for a while, R is the new entry in the VS family of languages. R is the most popular Data Science / Stats-focused language and comes with a rich ecosystem of ready-to-use packages.

There are many “language popularity” rankings out there and all of them should be taken with a grain of salt, but it’s safe to say that if you’re doing Analytics, R and Python should be in your toolbox.

Most of the Microsoft Storage and Analytics technologies either already have R/Python support (direct or via SDKs), or will have it soon. Let’s look at the tooling next.

Python Tools for Visual Studio

VS 2017 RC provides rich integration for Python, covering various scenarios from machine learning to desktop to IoT to the web. It supports most interpreters, such as CPython (2.x, 3.x), IronPython, Jython, and PyPy, along with the Anaconda distro and access to thousands of packages on PyPI. For the list of new features for Python, please see the product release notes.
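
With several interpreters installed side by side, it can help to confirm which one a script is actually running under. Here is a minimal sketch using only the standard library; the `describe_interpreter` helper name is made up for illustration and is not part of Python Tools for Visual Studio.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the interpreter running this script."""
    return {
        # Implementation name, e.g. "CPython", "PyPy", "IronPython" or "Jython"
        "implementation": platform.python_implementation(),
        # Version string in major.minor.patch form
        "version": platform.python_version(),
        # Full path to the interpreter executable (may be empty if unknown)
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("{}: {}".format(key, value))
```

Running the same script under two different environments is a quick way to check that the environment you selected is the one being used.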

R Tools for Visual Studio

RTVS turns VS into a powerful R IDE that includes the usual features you’d expect, such as IntelliSense, debugging, REPL, and History, as well as advanced ones, from Stored Procedures with R that run in a SQL database, to multiple independent plots, to Remoting. Remoting is very powerful in that it allows all the features of RTVS to be run on a remote machine (as if you had used Terminal Server). It is perfect for when you want to work on a subset of data locally on your laptop, then connect to a large server, continue to use the full IDE features, and finally deploy your code.

Visual Studio supports both the standard CRAN R version and the enhanced Microsoft R, which provides various performance- and enterprise-focused features.

F#

F# is a programming language that provides support for functional programming in addition to traditional object-oriented and imperative (procedural) programming. It is a great language for data processing and has a strong third-party ecosystem for accessing, manipulating, and processing data. The Visual F# tools in Visual Studio provide support for developing F# applications and extending other .NET applications by using F# code. F# is a first-class member of .NET, and retains a strong resemblance to the ML family of functional languages.

There’s a package for that!

Beyond Visual Studio integration, the Data Science workload comes preinstalled with hundreds of packages that cover just about any Advanced Analytics scenario, from image processing to bioinformatics to astronomy. The Data Science workload by default includes:

  • The Microsoft R Client – a Microsoft-enhanced version of R that provides multi-core, package versioning and distributed memory support
  • The Anaconda Python distro – a cross-platform collection of curated Python packages from Continuum.io for machine learning, scientific computing and web scenarios

Azure Python SDK

Azure now has SDKs covering just about every service and language, including Python. The Azure Python SDK has full support for core compute, storage, networking, Key Vault and monitoring services, on par with .NET. Management coverage includes services such as Data Lake Store and Data Lake Analytics, SQL Database, DocumentDB, etc. Data support examples include SQL Database, SQL Server, DocumentDB and Data Lake Store File System.

Join our team (virtually)!

The entire Data Science stack, from tools to libraries, is open source and hosted on GitHub. We’d like to invite you to check out the code base, fork it, file a bug, or if you’d like, add a feature! You can find the repos here:

One more thing: Free interactive Python & R Notebooks!

While Visual Studio is a highly productive desktop IDE, sometimes you just need a “REPL on steroids” to do some slicing, dicing and plotting of your data right in the browser, and possibly to share the results.

Azure Notebooks is a free, hosted Jupyter notebook service.

Jupyter is like OneNote if it supported running code: a notebook can hold text (as Markdown), code, inline graphics, etc. Azure Notebooks currently supports R, Python 2, and Python 3 (with Anaconda distros). F# is coming soon. The best way to learn about Azure Notebooks is to try one of the samples.
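
To make the mix of text and code concrete, here is a minimal sketch that writes a two-cell notebook file using only Python's standard library. It assumes the version 4 notebook JSON layout; the `build_notebook` helper, the `hello.ipynb` file name, and the cell contents are all made up for illustration.

```python
import json


def build_notebook():
    """Build a minimal notebook (assumed nbformat 4 layout) as a plain dict."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["# Hello\n", "Some *Markdown* text."],
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": None,  # the cell has not been run yet
        "metadata": {},
        "outputs": [],            # populated when the cell is executed
        "source": ["print(1 + 1)"],
    }
    return {
        "cells": [markdown_cell, code_cell],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 0,
    }


if __name__ == "__main__":
    # Write the notebook to disk as JSON
    with open("hello.ipynb", "w") as f:
        json.dump(build_notebook(), f, indent=1)
```

Opening the resulting file in a Jupyter-based service should show one rendered text cell followed by one runnable code cell.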

The free service is particularly useful for faculty and students, and for giving webinars, product demos, sharing live reports, etc. Check out some of the thousands of high-quality notebooks out there.

Conclusion

Data Science helps transform your data into intelligent action. Watch this Connect(); video on Data Science and Web Development to learn more. The Visual Studio Data Science workload is our first foray into providing you with everything needed to build the next generation of intelligent apps, whether on the desktop, cloud, IoT or mobile. Take it for a spin, check out the built-in libraries and packages, peruse CRAN and PyPI for even more, and let us know what you think!

If you run into problems, let us know via the Report a Problem option in the upper right corner of either the installer or the Visual Studio IDE itself, or by filing an issue in the GitHub repositories for PTVS or RTVS. You may also leave suggestions on User Voice.

Shahrokh Mortazavi, Partner PM, Visual Studio Cloud Platform Tools

Shahrokh Mortazavi runs the Data Science Developer Tools teams at Microsoft, focused on Python, R, and Jupyter Notebooks. Previously, he was in the High Performance Computing group at Microsoft. He worked on the Phoenix Compiler tool chain (code gen, analysis, JIT) at Microsoft Research and, prior to that, over a 10-year period led Sun Microsystems’ Code Generation & Optimization compiler backend teams.