Last week at Microsoft's Connect 2016 conference, we announced the General Availability of Azure Data Lake Analytics. As part of the announcement we revealed that U-SQL now includes built-in support for Advanced Analytics scenarios. This includes:
- The ability to perform massively distributed analytics using Python
- The ability to perform massively distributed analytics using R
- Built-in Cognitive capabilities (such as image object detection, sentiment analysis, etc.)
This post gives a brief overview of the Python support. We'll publish additional blog posts covering R and the Cognitive scenarios later this week.
The "Hello World" sample linked below illustrates how easy we've made it to use Python with U-SQL. It is the simplest script that demonstrates how to run Python on vertices using a special built-in Python reducer. The script shows the key steps:
- REFERENCE ASSEMBLY brings in the needed Python support
- REDUCE partitions the input data on a key
- A built-in reducer (Extension.Python.Reducer) runs Python code on each vertex assigned to the reducer
- Python code embedded in the U-SQL script accepts a pandas DataFrame as input and returns a pandas DataFrame as output
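The Python side of these steps can be sketched locally, without a Data Lake Analytics account. This is a minimal sketch under stated assumptions, not the sample itself: the entry-point name `usqlml_main` is assumed here (this post does not state it), the columns `region`, `name`, and `greeting` are hypothetical, and `run_locally` is a hypothetical helper that only imitates the way REDUCE hands each key's rows to the reducer as one DataFrame.

```python
import pandas as pd


def usqlml_main(df: pd.DataFrame) -> pd.DataFrame:
    """Take one partition as a DataFrame and return a DataFrame.

    The name `usqlml_main` is an assumption; the `name` and `greeting`
    columns are hypothetical and exist only for this illustration.
    """
    out = df.copy()
    out["greeting"] = "Hello, " + out["name"].astype(str)
    return out


def run_locally(rows: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical local stand-in for REDUCE ... ON key.

    Splits the rows by key, calls the function once per partition,
    and concatenates the results. It does not use any U-SQL runtime.
    """
    parts = [
        usqlml_main(part.reset_index(drop=True))
        for _, part in rows.groupby(key, sort=True)
    ]
    return pd.concat(parts, ignore_index=True)
```

Calling `run_locally(rows, "region")` on a small DataFrame exercises the same DataFrame-in, DataFrame-out contract that the embedded script has to satisfy, which makes it a convenient way to debug the Python code before submitting a job.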
To see a simple Hello World sample, go here: https://github.com/Azure-Samples/usql-python-helloworld
To learn more about our support for U-SQL Advanced Analytics and how to enable it in your Data Lake Analytics Accounts, see our Getting Started guide.
Hi,
How do I use numpy in this context, in terms of loading modules?
Do I have to load numpy with an import statement?
Yes, your script has to explicitly “import numpy”
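As a minimal sketch of that answer, the import goes at the top of the embedded Python code, like any other Python module. The entry-point name `usqlml_main` and the `value` and `log_value` columns are assumptions made for illustration; which packages are actually available on the vertices is not stated in this post.

```python
import numpy as np   # imported explicitly, as the reply above says
import pandas as pd


def usqlml_main(df: pd.DataFrame) -> pd.DataFrame:
    # `value` and `log_value` are hypothetical column names.
    out = df.copy()
    out["log_value"] = np.log1p(out["value"].astype(float))
    return out
```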
I guess I would be correct in assuming that “gensim” and “nltk” would have to be similarly imported.
Hi,
I only get the message “Assembly master.ExtPython does not exist.”
regards,
Uli
Hi,
Just found that the Python assemblies are copied as part of the U-SQL extensions. See the “Getting Started guide” above.
Uli
The “Getting Started Guide” seems to no longer be available as the OneDrive for Business link has expired. Can someone repost this guide?