The Python extensions for Azure Data Lake Analytics ship with the standard Python libraries and include pandas and numpy. We’ve been getting a lot of questions about how to use custom libraries with the Python extensions.
The good news is that this is simple.
First, let’s talk about “zipimport”. Thanks to the adoption of PEP 273, Python has been able to import modules from ZIP files since Python 2.3. This ability is called “zipimport” and is a built-in feature of Python’s existing import statement. Read the zipimport documentation now.
To review the basics:
- You create a module (a .py file, etc.)
- ZIP up the module into a .zip file
- Add the path to the .zip file to sys.path
- Then import the module
Ultimately the .zip file behaves just like any normal folder does.
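The four steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not a prescribed workflow: it writes a throwaway module into a temporary directory, and the names demo_zipped_module, demo_modules.zip, and greeting are hypothetical ones chosen only for this example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Hypothetical names used only for this illustration.
MODULE_NAME = "demo_zipped_module"
MODULE_SOURCE = 'greeting = "Hello from inside a ZIP file"\n'

workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "demo_modules.zip")

# Steps 1 and 2: create the module source and store it at the root of a .zip file.
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr(MODULE_NAME + ".py", MODULE_SOURCE)

# Step 3: add the path to the .zip file to sys.path.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()

# Step 4: import the module as if the archive were an ordinary folder.
demo_module = importlib.import_module(MODULE_NAME)
print(demo_module.greeting)
```

The import statement itself is unchanged; only the entry added to sys.path tells Python to look inside the archive.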
Build and test a simple zipped package
Before you try to use a custom module with U-SQL, make sure you have mastered the mechanics of zipimport.
Create a file called mymodule.py:
# demo module
hello_world = "Hello World! This is code from a custom module"
As you can see, all it does is define a single variable.
Create a zip file called mycustommodules.zip that contains mymodule.py at its root.
- In Windows, you can right-click mymodule.py and select Send to, then Compressed (zipped) folder
- This will create a file called mymodule.zip
- Rename mymodule.zip to mycustommodules.zip
- This renaming step isn’t strictly speaking necessary, but will help highlight how the process will work
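If you are not on Windows, or would rather script this step, the archive can also be built with Python’s standard zipfile module. The sketch below is one possible approach, not the only one: the helper name build_module_zip is hypothetical, and the demonstration works in a scratch directory so that it does not overwrite any files you already created.

```python
import os
import tempfile
import zipfile


def build_module_zip(module_path, zip_path):
    """Write one .py file into zip_path, stored at the root of the archive."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        # arcname drops any directory prefix so the module sits at the root.
        archive.write(module_path, arcname=os.path.basename(module_path))
    # Re-open the archive and report what it contains.
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


# Demonstration in a scratch directory (an assumption made for this example).
scratch = tempfile.mkdtemp()
module_path = os.path.join(scratch, "mymodule.py")
with open(module_path, "w") as handle:
    handle.write('hello_world = "Hello World! This is code from a custom module"\n')

entries = build_module_zip(module_path, os.path.join(scratch, "mycustommodules.zip"))
print(entries)
```

Listing the archive contents is a quick way to confirm that mymodule.py is at the root and not nested inside a subfolder, which would prevent a plain import mymodule from finding it.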
Now create a test.py Python file in the same folder as mycustommodules.zip:
import sys
sys.path.insert(0, 'mycustommodules.zip')
import mymodule
print(mymodule.hello_world)
Now you should have a folder that contains at least mycustommodules.zip and test.py.
Then run the program, for example with python test.py from that folder.
It should show this as output:
Hello World! This is code from a custom module
Before you proceed, make sure this works.
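One way to confirm that an import really came from the archive, and not from a stray .py file elsewhere on sys.path, is to inspect the module’s loader and __file__. The helper below, loaded_from_zip, is a hypothetical name, and the demonstration builds its own small archive (zip_check_demo.zip containing zip_check_module.py, both made-up names) so that it runs on its own.

```python
import importlib
import os
import sys
import tempfile
import zipfile
import zipimport


def loaded_from_zip(module, zip_path):
    """Return True if module was imported by zipimport from zip_path."""
    loader = getattr(module, "__loader__", None)
    module_file = os.path.abspath(getattr(module, "__file__", "") or "")
    return (
        isinstance(loader, zipimport.zipimporter)
        and module_file.startswith(os.path.abspath(zip_path) + os.sep)
    )


# Build a throwaway archive so the check can be demonstrated in isolation.
scratch = tempfile.mkdtemp()
zip_path = os.path.join(scratch, "zip_check_demo.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("zip_check_module.py", "value = 1\n")

sys.path.insert(0, zip_path)
importlib.invalidate_caches()
zip_check_module = importlib.import_module("zip_check_module")

print(loaded_from_zip(zip_check_module, zip_path))  # module imported from the archive
print(loaded_from_zip(os, zip_path))                # ordinary standard-library module
```

In test.py, the equivalent check would be loaded_from_zip(mymodule, 'mycustommodules.zip').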
Deploying Custom Python Modules with U-SQL
First, upload the mycustommodules.zip file to your ADLS store. In this case we will upload it to the root of the default ADLS account for the ADLA account we are using, so its path is “/mycustommodules.zip”.
Then run this U-SQL script:
REFERENCE ASSEMBLY [ExtPython];
DEPLOY RESOURCE @"/mycustommodules.zip"; // mymodule.py is inside the mycustommodules.zip file

DECLARE @myScript = @"
import sys
sys.path.insert(0, 'mycustommodules.zip')
import mymodule

def usqlml_main(df):
    del df['number']
    df['hello_world'] = str(mymodule.hello_world)
    return df
";

@rows =
    SELECT * FROM (VALUES (1)) AS D(number);

@rows =
    REDUCE @rows ON number
    PRODUCE hello_world string
    USING new Extension.Python.Reducer(pyScript:@myScript);

OUTPUT @rows
    TO "/demo_python_custom_module.csv"
    USING Outputters.Csv(outputHeader: true);
It will produce a simple CSV file with “Hello World! This is code from a custom module” as a row.