Presto with Azure Cosmos DB, Azure SQL Database, MySQL, PostgreSQL, and Azure Blob Storage

In this example, I show how to configure Presto connectors to query and join data from Azure Cosmos DB (using the MongoDB API), Azure SQL Database, Azure Database for MySQL, and Azure Database for PostgreSQL, and then store the joined results in Azure Blob Storage. Take a look at the video walkthrough below and the source code…

Presto querying data in Azure Blob Storage and Azure Data Lake Store

Recently, I created a simple proof of concept (POC) in which a single-node Presto instance queries data in Azure Blob Storage (WASB) and Azure Data Lake Store (ADLS). In my example, Presto (version 0.167 or 0.178) accesses these data stores via Presto’s hive-hadoop2 connector (with a few additional JARs) and needs the Hive metastore service to store the metadata about…

Accessing Azure Data Lake Store using WebHDFS with OAuth2 from Spark 2.0 that is running locally

Update from March 2017: Since this article was posted in August 2016, the Azure Data Lake product team has published three new and highly recommended blog posts: “Connecting your own Hadoop or Spark to Azure Data Lake Store”, “Making Azure Data Lake Store the default file system in Hadoop”, and “Wiring your older Hadoop clusters to access Azure Data…

Compile and build a specific Hadoop source code branch using an Azure VM

Sometimes you may want to test a Hadoop feature that exists in a specific source branch but is not yet available as a binary release. For example, I want to try accessing Azure Data Lake Store (ADLS) via its WebHDFS endpoint. Access to ADLS requires OAuth2, support for which was added in Hadoop…

Accessing Azure Storage Blobs from Spark 1.6 that is running locally

When you use HDInsight Hadoop or Spark clusters in Azure, they are automatically pre-configured to access Azure Storage Blobs via the hadoop-azure module, which implements the standard Hadoop FileSystem interface. For details on how HDInsight uses blob storage, see https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/. In this article, I will show how we can configure a local…

How to create and test Azure Service Principal using Azure CLI

Many automation tools (e.g., HashiCorp’s Packer.io and Terraform) and SDKs (e.g., the Azure SDKs for Java, Ruby, Python, Go, and Node) that communicate with Azure through the Azure Resource Manager (ARM) APIs need a “Service Principal” credential for authentication via Azure Active Directory. This credential is usually defined by specifying the following properties somewhere within the…

Resolving Spark 1.6.0 "java.lang.NullPointerException, not found: value sqlContext" error when running spark-shell on Windows 10 (64-bit)

It is easy to follow the instructions at http://spark.apache.org/docs/latest/ and download Spark 1.6.0 (released January 4, 2016) with the “Pre-built for Hadoop 2.6 and later” package type from http://spark.apache.org/downloads.html. However, when you try to run spark-shell on your Windows 10 (64-bit) machine, you may receive a java.lang.RuntimeException: java.lang.NullPointerException (not found: value sqlContext):

    java.lang.RuntimeException: java.lang.NullPointerException
            at…

Linux Azure VM Scale Sets with shared storage using Lustre

An Azure Virtual Machine Scale Set is a compute resource you can use to deploy and manage an elastic collection of identical, usually stateless VMs. Mark Russinovich announced and demoed the public preview of Azure VM Scale Sets (VMSS) in his November 11, 2015, blog post. As of December 2015, Azure VM Scale…

Deploying Intel Cloud Edition for Lustre* on Microsoft Azure

Arsen Vladimirskiy | Updated April 27, 2016

About Lustre

Lustre is the most widely used parallel filesystem in high-performance computing (HPC) environments. This is because Lustre provides POSIX compliance, offers extreme performance when used with hundreds of clients, and can scale up in both speed and storage volume as nodes are added to the…

Certificate-based auth with Azure Service Principals from Linux command line

In his comprehensive article, “Developer’s guide to auth with Azure Resource Manager API”, Dushyant Gill describes multiple authentication flows for obtaining an access token from Azure Active Directory and using it to invoke the Azure Resource Manager (ARM) REST APIs. There are also multiple open-source Azure SDKs (Java, Node.js, Go, Ruby, Python, .NET) that encapsulate the…
