Using Azure SDK for Python

 

Python is a great scripting tool with a large user base. In a recent support case I needed a way to continuously generate files with random data in Windows Azure storage (wasb) so that I could process them with Spark on HDInsight. Python, the Azure SDK for Python, and a few lines of code did the trick. You can install the SDK from the Azure SDK for Python page. There is also a helpful article on using Azure blob storage from Python: Azure Blob Storage from Python. You could even incorporate this into a bigger pipeline, for example by scheduling the Python script with Windows Task Scheduler or Linux cron to collect data from a server and upload it to Azure storage for analysis with HDInsight.

The script has only a few basic sections.

Function Name: Description
(module level): Imports the packages, declares a blob_service object, and sets the container and subfolder variables.
run(): The main driver function. Sets the configuration parameters: how many iterations (files) to create, the interval to wait between files, the number of records per file, and the local temporary directory where each file is created before it is uploaded to Azure storage. Contains the main loop that calls createLocalFile(), uploadFileToWasb() and cleanupWasbFolder().
createLocalFile(): Creates the local file that will be uploaded to Azure storage. Each record is written as a new line of comma-separated data. You could modify this to create any file format you want.
uploadFileToWasb(): Uploads a file to a container and subfolder in Azure storage using the BlobService.
cleanupWasbFolder(): Deletes older files from Azure storage so that only the 10 most recent files are kept in the folder.
uniform(): A simple random number generator. Used to generate a random value from 0 to 100, which is placed in each record of the uploaded file.
main: Entry point for the script.
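
To make the record layout concrete before reading the full script, here is a minimal sketch of the comma-separated lines that createLocalFile() produces. It builds the lines in memory instead of writing a file. The helper names make_record and make_records are my own for illustration and are not part of the script or the SDK.

```python
import datetime
import random


def make_record(index, timestamp, low=0, high=100):
    """Build one comma-separated record: id, device name, timestamp, reading."""
    # Same rounding as the script's uniform() helper: four decimal places.
    reading = round(random.uniform(low, high), 4)
    return ",".join([str(index), "Device" + str(index), str(timestamp), str(reading)])


def make_records(number_of_records, timestamp=None):
    """Return the lines a generated file would contain, without touching disk."""
    if timestamp is None:
        timestamp = datetime.datetime.now()
    # Records are numbered from 1, as in the script's while loop.
    return [make_record(i, timestamp) for i in range(1, number_of_records + 1)]


if __name__ == "__main__":
    for line in make_records(3):
        print(line)
```

Every record in one file shares the same timestamp, because the script reads the clock once per file rather than once per record.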

The script.

import datetime
import time
import random

from azure.storage.blob import BlobService

# replace with your storage account name and key
blob_service = BlobService(account_name='mystorage_account', account_key='mystorage_account_key')

# container and folder must already exist
container = "data"
subfolder = "/test"

def run():
    "Driver function"
    print("Running GenWasbData.py")
    now = datetime.datetime.now()
    print("")
    print("Starting Application: " + now.isoformat())

    # default configuration parameters
    iterations = 100          # number of files to create
    interval = 5              # seconds to wait between files
    number_of_records = 200   # records written to each file
    local_temp_dir = "c:\\Applications\\temp\\"

    # loop
    count = 1
    while (count <= iterations):
        print("Processing: " + str(count) + " of " + str(iterations))
        count = count + 1
        createLocalFile(local_temp_dir + "file.txt", number_of_records)
        uploadFileToWasb(local_temp_dir + "file.txt", count-1)
        cleanupWasbFolder(count-1)
        time.sleep(interval)

def createLocalFile(fn, number_of_records):
    "Create a local file to upload to wasb"
    now = datetime.datetime.now()
    # 'w' overwrites an existing file; 'a' would append
    with open(fn, 'w') as target:
        count = 1
        while (count <= number_of_records):
            # record layout: id, device name, timestamp, random value
            line = str(count) + "," + "Device" + str(count) + "," + str(now) + "," + str(uniform(0,100))
            target.write(line)
            target.write("\n")
            count = count + 1
    # the with block closes the file, so no explicit close() is needed

def uploadFileToWasb(fn, count):
    "Upload file to wasb"
    # each upload gets a numbered name: file1.txt, file2.txt, ...
    new_filename = "file" + str(count) + ".txt"
    blob_service.put_block_blob_from_path(container + subfolder, new_filename, fn)
    return

def cleanupWasbFolder(count):
    "Remove files from wasb"
    # from the 11th upload on, delete the file uploaded 10 iterations ago
    if (count >= 11):
        blob_service.delete_blob(container + subfolder, "file" + str(count-10) + ".txt")
    return

def uniform(a, b):
    "Get a random number in the range (a, b)"
    return round(random.uniform(a, b), 4) ## four decimal places

if __name__ == '__main__':
    run()
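
If you want to check the cleanup behavior without a storage account, the upload and delete steps can be replayed against an in-memory set. This is only a sketch of the rolling-window logic in cleanupWasbFolder(): the function names blob_to_delete and simulate are hypothetical, and the set stands in for the blob container, so no Azure calls are made.

```python
def blob_to_delete(count, keep=10):
    """Return the blob name the cleanup step would delete at this iteration, or None."""
    # Mirrors the script: nothing is deleted until the (keep + 1)th upload.
    if count > keep:
        return "file" + str(count - keep) + ".txt"
    return None


def simulate(iterations, keep=10):
    """Replay the upload/cleanup loop against an in-memory set instead of Azure storage."""
    blobs = set()
    for count in range(1, iterations + 1):
        blobs.add("file" + str(count) + ".txt")  # stands in for the upload
        stale = blob_to_delete(count, keep)
        if stale is not None:
            blobs.discard(stale)                 # stands in for delete_blob
    return blobs


if __name__ == "__main__":
    remaining = simulate(25)
    # Sort by the number embedded in the name, e.g. "file16.txt" -> 16.
    print(sorted(remaining, key=lambda name: int(name[4:-4])))
```

After 25 iterations the set holds file16.txt through file25.txt, which matches the "keep only 10 files" behavior described above. Changing keep is an easy way to try a different retention window before touching real storage.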

Hope this helps you incorporate Python and the Azure SDK for Python in your next project.

Bill