Using External data with Azure Jupyter Notebooks


One of the vital requirements for academics is to provide a single data set that all their students can use for experiments.

By hosting data in a Blob Storage account, you can let students connect and run experiments from Azure Jupyter Notebooks (https://azure.notebooks.com) in a fairly straightforward manner.

Data can be uploaded to an Azure blob using the Azure Storage Explorer tool.

Creating a storage account on Azure

  1. Sign in to the Azure portal.

  2. On the Hub menu, select New -> Storage -> Storage account.

  3. Enter a name for your storage account. See Storage account endpoints for details about how the storage account name will be used to address your objects in Azure Storage.

    Note

    Storage account names must be between 3 and 24 characters in length and may contain numbers and lowercase letters only.

    Your storage account name must be unique within Azure. The Azure portal will indicate if the storage account name you select is already in use.

  4. Specify the deployment model to be used: Resource Manager or Classic. Resource Manager is the recommended deployment model. For more information, see Understanding Resource Manager deployment and classic deployment.

    Note

    Blob storage accounts can only be created using the Resource Manager deployment model.

  5. Select the type of storage account: General purpose or Blob storage. General purpose is the default.

    If General purpose was selected, then specify the performance tier: Standard or Premium. The default is Standard. For more details on standard and premium storage accounts, see Introduction to Microsoft Azure Storage and Premium Storage: High-Performance Storage for Azure Virtual Machine Workloads.

    If Blob Storage was selected, then specify the access tier: Hot or Cool. The default is Hot. See Azure Blob Storage: Cool and Hot tiers for more details.

  6. Select the replication option for the storage account: LRS, GRS, RA-GRS, or ZRS. The default is RA-GRS. For more details on Azure Storage replication options, see Azure Storage replication.

  7. Select the subscription in which you want to create the new storage account.

  8. Specify a new resource group or select an existing resource group. For more information on resource groups, see Azure Resource Manager overview.

  9. Select the geographic location for your storage account. See Azure Regions for more information about what services are available in which region.

  10. Click Create to create the storage account.
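The naming rule from step 3 is easy to check before you try a name in the portal. The following is a minimal sketch, assuming only the two constraints quoted in the note (3 to 24 characters, digits and lowercase letters); the function name is hypothetical, and the check cannot tell you whether a name is already taken within Azure.

```python
import re

# Rule from the note in step 3: 3 to 24 characters, digits and lowercase letters only.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` satisfies the length and character rules.

    Hypothetical helper for illustration; uniqueness within Azure is
    only checked by the portal itself.
    """
    return _ACCOUNT_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["classdata2017", "Class-Data", "ab"]:
        print(candidate, is_valid_storage_account_name(candidate))
```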

Manage your storage account

Change your account configuration

After you create your storage account, you can modify its configuration, such as changing the replication option used for the account or changing the access tier for a Blob storage account. In the Azure portal, navigate to your storage account, then find and click Configuration under SETTINGS to view or change the account configuration.

Note

Depending on the performance tier you chose when creating the storage account, some replication options may not be available.

Changing the replication option will change your pricing. For more details, see Azure Storage Pricing page.

For Blob storage accounts, changing the access tier may incur charges for the change in addition to changing your pricing. Please see the Blob storage accounts - Pricing and Billing for more details.

Manage your storage access keys

When you create a storage account, Azure generates two 512-bit storage access keys, which are used for authentication when the storage account is accessed. By providing two storage access keys, Azure enables you to regenerate the keys with no interruption to your storage service or access to that service.

Note

We recommend that you avoid sharing your storage access keys with anyone else. To permit access to storage resources without giving out your access keys, you can use a shared access signature. A shared access signature provides access to a resource in your account for an interval that you define and with the permissions that you specify. See Using Shared Access Signatures (SAS) for more information.

View and copy storage access keys

In the Azure portal, navigate to your storage account, click All settings and then click Access keys to view, copy, and regenerate your account access keys. The Access Keys blade also includes pre-configured connection strings using your primary and secondary keys that you can copy to use in your applications.
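For reference, the pre-configured connection strings are semicolon-separated key=value pairs. The sketch below shows how such a string could be assembled and taken apart, assuming the common `DefaultEndpointsProtocol` / `AccountName` / `AccountKey` / `EndpointSuffix` layout; the helper names and the sample values are hypothetical, and in practice copying the string from the Access Keys blade is simpler.

```python
def build_connection_string(account_name: str, account_key: str,
                            endpoint_suffix: str = "core.windows.net") -> str:
    """Assemble a storage connection string from its parts (illustrative helper)."""
    parts = {
        "DefaultEndpointsProtocol": "https",
        "AccountName": account_name,
        "AccountKey": account_key,
        "EndpointSuffix": endpoint_suffix,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def parse_connection_string(conn_str: str) -> dict:
    """Split a connection string back into a dict.

    Each pair is split on the first "=" only, because base64-encoded
    account keys usually end in "=" padding characters.
    """
    return dict(part.split("=", 1) for part in conn_str.split(";") if part)


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    conn = build_connection_string("classdata", "abc123==")
    print(conn)
    print(parse_connection_string(conn)["AccountName"])
```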

Regenerate storage access keys

We recommend that you change the access keys to your storage account periodically to help keep your storage connections secure. Two access keys are assigned so that you can maintain connections to the storage account by using one access key while you regenerate the other access key.

Warning

Regenerating your access keys can affect services in Azure as well as your own applications that are dependent on the storage account. All clients that use the access key to access the storage account must be updated to use the new key.

Storage Explorers - If you are using any storage explorer applications, you will probably need to update the storage key used by those applications.

Here is the process for rotating your storage access keys:

  1. Update the connection strings in your application code to reference the secondary access key of the storage account.
  2. Regenerate the primary access key for your storage account. On the Access Keys blade, click Regenerate Key1, and then click Yes to confirm that you want to generate a new key.
  3. Update the connection strings in your code to reference the new primary access key.
  4. Regenerate the secondary access key in the same manner.
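The ordering of these four steps is what keeps clients connected throughout. The sketch below is a purely local simulation of that ordering and makes no Azure calls: `new_key` is a hypothetical stand-in for the portal's Regenerate button, and it assumes a key is 512 random bits shown as base64 text.

```python
import base64
import secrets


def new_key() -> str:
    """Stand-in for the portal's Regenerate button: 512 random bits as base64 text."""
    return base64.b64encode(secrets.token_bytes(64)).decode("ascii")


def rotate_keys(account: dict, app: dict) -> None:
    """Apply the four rotation steps to an in-memory account and app config.

    `account` holds "key1" and "key2"; `app` holds the single "key" it uses.
    After every step the app's key must still be one of the account's keys.
    """
    def app_can_connect() -> bool:
        return app["key"] in (account["key1"], account["key2"])

    app["key"] = account["key2"]   # 1. point the app at the secondary key
    account["key1"] = new_key()    # 2. regenerate the primary key
    if not app_can_connect():
        raise RuntimeError("app lost access after regenerating key1")
    app["key"] = account["key1"]   # 3. point the app at the new primary key
    account["key2"] = new_key()    # 4. regenerate the secondary key
    if not app_can_connect():
        raise RuntimeError("app lost access after regenerating key2")


if __name__ == "__main__":
    account = {"key1": new_key(), "key2": new_key()}
    app = {"key": account["key1"]}
    rotate_keys(account, app)
    print("app uses the new primary key:", app["key"] == account["key1"])
```

Swapping steps 1 and 2 in this simulation raises the first error, which mirrors the interruption a real client would see if the primary key were regenerated while still in use.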

Once you have set up your storage account, you can use Azure Storage Explorer to connect to it, create a new blob container, and upload the data.

image

Azure Storage Explorer showing the uploaded blob that contains the SampleData for our experiment

Within your Jupyter Notebook, you now need to define the connection parameters. In a code block, create the following variables and fill in the details from your Azure account.

image

Example of Notebooks setup

The first code block is where we define the connection parameters:

blob_account_name = ""  # fill in your blob account name
blob_account_key = ""   # fill in your blob account key
mycontainer = ""        # fill in the container name
myblobname = ""         # fill in the blob name
mydatafile = ""         # fill in the output file name
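Because a notebook is often shared with the whole class, one option is to keep the account key out of the notebook itself. The sketch below reads the same five settings from environment variables instead; the variable names (`AZURE_BLOB_ACCOUNT_NAME` and so on) and the helper are hypothetical, and whether your notebook environment lets you set environment variables is an assumption you would need to check.

```python
import os

# Hypothetical environment variable names, mapped to the notebook's variable names.
_SETTINGS = {
    "blob_account_name": "AZURE_BLOB_ACCOUNT_NAME",
    "blob_account_key": "AZURE_BLOB_ACCOUNT_KEY",
    "mycontainer": "AZURE_BLOB_CONTAINER",
    "myblobname": "AZURE_BLOB_NAME",
    "mydatafile": "AZURE_BLOB_OUTPUT_FILE",
}


def load_blob_settings(env=None) -> dict:
    """Read the connection parameters from `env` (defaults to os.environ).

    Raises KeyError naming every missing or empty variable, so a
    misconfigured notebook fails early with a clear message.
    """
    env = os.environ if env is None else env
    missing = [var for var in _SETTINGS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {name: env[var] for name, var in _SETTINGS.items()}


if __name__ == "__main__":
    try:
        settings = load_blob_settings()
        print("loaded settings for container:", settings["mycontainer"])
    except KeyError as exc:
        print(exc)
```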

The Azure storage account provides a unique namespace to store and access your Azure Storage data objects. All objects in a storage account are billed together as a group. By default, the data in your account is available only to you, the account owner.

There are two types of storage accounts: general-purpose storage accounts and Blob storage accounts.

In a new code block, create the connection, download the blob, and load it into a pandas DataFrame:

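import os            # operating-system-dependent functionality (paths, file removal)
import pandas as pd  # data analysis library

# BlobService is the class exposed by the legacy azure-storage SDK that this
# example was written against; later SDK releases renamed it.
from azure.storage.blob import BlobService

dirname = os.getcwd()

# Connect to the storage account with the account name and key defined above.
blob_service = BlobService(account_name=blob_account_name,
                           account_key=blob_account_key)

# Download the blob to a local file in the working directory.
blob_service.get_blob_to_path(mycontainer, myblobname, mydatafile)

# Load the CSV into a DataFrame, treating the first row as the header.
mydata = pd.read_csv(mydatafile, header=0)

# Remove the local copy now that the data is in memory.
os.remove(os.path.join(dirname, mydatafile))

print(mydata.shape)  # (rows, columns)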
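The `BlobService` class above belongs to an older release of the Azure Storage SDK. If your notebook environment has the newer `azure-storage-blob` package (version 12 or later) installed instead, the same download could look roughly like the sketch below. This is an assumption about your environment rather than the setup used in this post; the helper names are hypothetical, and the blob is read straight into memory instead of being written to a temporary file.

```python
import io


def account_url(account_name: str) -> str:
    """Blob endpoint for a storage account in the public Azure cloud."""
    return f"https://{account_name}.blob.core.windows.net"


def load_blob_csv(account_name: str, account_key: str,
                  container: str, blob_name: str):
    """Download a CSV blob and return it as a pandas DataFrame.

    Assumes azure-storage-blob >= 12 and pandas are installed; they are
    imported here so the helper above works without them.
    """
    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(account_url=account_url(account_name),
                                credential=account_key)
    blob_client = service.get_blob_client(container=container, blob=blob_name)
    data = blob_client.download_blob().readall()  # raw bytes of the blob
    return pd.read_csv(io.BytesIO(data), header=0)


if __name__ == "__main__":
    # "classdata" is a placeholder account name.
    print(account_url("classdata"))
```

In the notebook, you would call `load_blob_csv(blob_account_name, blob_account_key, mycontainer, myblobname)` and then inspect `.shape` on the result as before.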


Before you can create a storage account, you must have an Azure subscription, which is a plan that gives you access to a variety of Azure services. You can get started with Azure with a free account. If you have Imagine Access premium, then you are a Visual Studio Dev Essentials member and get free monthly credits that you can use with Azure services, including Azure Storage. See Azure Storage Pricing for information on volume pricing.

To learn how to create a storage account, see Create a storage account. You can create up to 200 uniquely named storage accounts with a single subscription. See Azure Storage Scalability and Performance Targets for details about storage account limits.