Using Custom Python Libraries with U-SQL


The Python extensions for Azure Data Lake Analytics ship with the standard Python libraries and also include pandas and numpy. We’ve been getting a lot of questions about how to use custom libraries with the Python extensions.

The good news is that this is simple.

Introducing zipimport

First, let’s talk about “zipimport”. Thanks to the adoption of PEP 273, Python has been able to import modules from ZIP files since Python 2.3. This ability is called “zipimport” and is a built-in feature of Python’s existing import statement. Read the zipimport documentation now.

To review the basics:

  • You create a module (a .py file, etc.)
  • ZIP up the module into a .zip file
  • Add the path to the .zip file to sys.path
  • Then import the module

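Ultimately, for import purposes the .zip file behaves much like a normal folder, with one caveat: zipimport can load only Python source and bytecode files (.py and .pyc), not compiled extension modules.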
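The four steps above can be sketched in a few lines of Python. This is only an illustration: the archive name demo_modules.zip and the module name demo_zipped_module are made up for this example, and everything is written to a temporary folder.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Work in a throwaway folder so nothing is left in your project.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "demo_modules.zip")  # hypothetical archive name

# Steps 1 and 2: write a tiny module straight into the archive, at its root.
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("demo_zipped_module.py",
                     "greeting = 'imported from a zip file'\n")

# Step 3: put the archive itself on sys.path, just as you would a folder.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()

# Step 4: import as usual; zipimport does the rest behind the scenes.
import demo_zipped_module

print(demo_zipped_module.greeting)
```

Note that nothing in the import statement mentions the archive. The only zip-specific line is the one that adds the .zip path to sys.path.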

Build and test a simple zipped package

Before you try to use a custom module with U-SQL, make sure you have mastered the mechanics of zipimport.

Create a file called mymodule.py:

# demo module
hello_world = "Hello World! This is code from a custom module"

As you can see, all it does is define a single variable.

Create a zip file called mycustommodules.zip that contains mymodule.py at the root.

  • In Windows, you can right-click mymodule.py and select Send to > Compressed (zipped) folder
    • This will create a file called mymodule.zip
  • Rename mymodule.zip to mycustommodules.zip
    • This renaming step isn’t strictly necessary, but it helps highlight that the name of the .zip file does not have to match the name of the module inside it
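If you are not on Windows, or want to script this step, Python’s own zipfile module can build the archive. The sketch below is one possible approach rather than a required one. It writes mymodule.py to a temporary folder first so that it runs on its own; in practice you would point module_path at your real file.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "mymodule.py")
zip_path = os.path.join(workdir, "mycustommodules.zip")

# Recreate the demo module so this sketch is self-contained.
with open(module_path, "w") as f:
    f.write('# demo module\n'
            'hello_world = "Hello World! This is code from a custom module"\n')

with zipfile.ZipFile(zip_path, "w") as archive:
    # arcname stores the file at the root of the archive,
    # instead of under the folder path it was read from.
    archive.write(module_path, arcname="mymodule.py")

# List the archive contents to confirm the module sits at the root.
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
print(names)
```

If the listing shows a folder prefix in front of mymodule.py, the module is not at the root and a plain import mymodule will not find it.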

Now create a test.py Python file in the same folder as mycustommodules.zip

import sys
sys.path.insert(0, 'mycustommodules.zip')
import mymodule
print(mymodule.hello_world)

Now you should have a folder that contains

  • test.py
  • mycustommodules.zip

Then just run the program

python test.py

And it should show this as output

Hello World! This is code from a custom module

Before you proceed, make sure this works.
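One thing that can go wrong here is that Python picks up a loose mymodule.py sitting next to test.py instead of the copy inside the archive. A quick way to check is to look at where the module was loaded from. The sketch below rebuilds the archive in a temporary folder so that it runs on its own. In your own test.py, only the last few lines would be needed.

```python
import os
import sys
import tempfile
import zipfile
import zipimport

# Rebuild the demo archive in a temporary folder.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "mycustommodules.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr(
        "mymodule.py",
        'hello_world = "Hello World! This is code from a custom module"\n')

sys.path.insert(0, zip_path)
import mymodule

# The module's import spec records which loader produced it.
loaded_from_zip = isinstance(mymodule.__spec__.loader, zipimport.zipimporter)
print(mymodule.__file__)
print("loaded from zip:", loaded_from_zip)
```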

Deploying Custom Python Modules with U-SQL

First, upload the mycustommodules.zip file to your ADLS store. In this case we will upload it to the root of the default ADLS account for the ADLA account we are using, so its path is “/mycustommodules.zip”.

Then run this U-SQL script

REFERENCE ASSEMBLY [ExtPython];
DEPLOY RESOURCE @"/mycustommodules.zip";

// mymodule.py is inside the mycustommodules.zip file

DECLARE @myScript = @"
import sys
sys.path.insert(0, 'mycustommodules.zip')
import mymodule

def usqlml_main(df):
 del df['number']
 df['hello_world'] = str(mymodule.hello_world)
 return df
";

@rows = 
 SELECT * FROM (VALUES (1)) AS D(number);

@rows =
 REDUCE @rows ON number
 PRODUCE hello_world string
 USING new Extension.Python.Reducer(pyScript:@myScript);

OUTPUT @rows
 TO "/demo_python_custom_module.csv"
 USING Outputters.Csv(outputHeader: true);

It will produce a simple CSV file with a hello_world header and “Hello World! This is code from a custom module” as its single data row.
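Because usqlml_main is an ordinary Python function that takes and returns a pandas DataFrame, one way to catch mistakes before submitting a job is to call it locally. The sketch below assumes pandas is installed on your machine. It uses a stand-in constant, HELLO_WORLD (a name introduced here for illustration), in place of mymodule.hello_world so that it runs without the zip file, and it only approximates what the reducer passes in.

```python
import pandas as pd

# Stand-in for mymodule.hello_world (an assumption for this sketch),
# so the example runs without mycustommodules.zip.
HELLO_WORLD = "Hello World! This is code from a custom module"


def usqlml_main(df):
    # Same body as the function embedded in the U-SQL script.
    del df['number']
    df['hello_world'] = str(HELLO_WORLD)
    return df


# One-row input mirroring: SELECT * FROM (VALUES (1)) AS D(number)
rows = pd.DataFrame({'number': [1]})
result = usqlml_main(rows)
print(result)
```

The returned frame should have exactly the columns named in the PRODUCE clause, here a single hello_world column.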

 

