Running Apache Mahout at Hadoop on Windows Azure (www.hadooponazure.com)

Once you have access to Hadoop on Windows Azure, you can run any Mahout sample on the head node. Here I run an original Apache Mahout (http://mahout.apache.org/) sample, derived from the clustering sample on Mahout's website (https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data). Step 1: RDP to your head node and open the Hadoop command line…

Enabling gzip compression with Windows Azure CDN through Web Role

The CDN picks up compression from the origin, and Windows Azure Storage does not support compression directly, so content that the CDN fetches from an Azure Storage origin will not be compressed. To have compressed content, you…
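To illustrate what "the CDN picks up compression from the origin" means in practice, here is a minimal Python sketch of an origin that gzips a response only when the request advertises gzip support. It shows generic HTTP behavior, not a Windows Azure API; the function name `build_origin_response` and the returned header dictionary are hypothetical choices made for this example.

```python
import gzip


def build_origin_response(body: bytes, accept_encoding: str):
    """Return (headers, payload) for an origin that may gzip its responses.

    Hypothetical helper for illustration only. It checks whether the
    Accept-Encoding request header lists gzip and compresses if so.
    Quality values such as "gzip;q=0" are ignored in this sketch.
    """
    encodings = [
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    ]
    if "gzip" in encodings:
        compressed = gzip.compress(body)
        headers = {
            "Content-Encoding": "gzip",
            # Vary tells caches that the response depends on Accept-Encoding.
            "Vary": "Accept-Encoding",
            "Content-Length": str(len(compressed)),
        }
        return headers, compressed
    # No gzip support advertised: send the body unchanged.
    return {"Content-Length": str(len(body))}, body
```

A cache in front of such an origin stores whatever encoding the origin returns, which is why an origin that never compresses yields uncompressed cached content.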

Windows Azure CDN and Referrer Header

The Windows Azure CDN, like any other CDN, attempts to be a transparent caching layer and does not care what the referring site might be. So it is correct to say that Windows Azure…

Primary Namenode and Secondary Namenode configuration in Apache Hadoop

The Apache Hadoop primary Namenode and secondary Namenode architecture is designed as below: Namenode Master: The conf/masters file defines the master nodes of any single-node or multinode cluster. On the master, conf/masters looks like this: ———————- localhost ——————— The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (datanodes and tasktrackers)…

Master Slave architecture in Hadoop

Apache Hadoop is designed to have a master-slave architecture. Master: NameNode, JobTracker Slave: {DataNode, TaskTracker}, ….. {DataNode, TaskTracker} HDFS is one of the primary components of a Hadoop cluster, and HDFS is also designed to have a master-slave architecture. Master: NameNode Slave: {DataNode}…..{DataNode} – The master (NameNode) manages file system namespace operations like opening, closing, and renaming files and…

What is ignoreRoleInstanceStatus setting in Windows Azure?

ignoreRoleInstanceStatus is described in the WebRole and WorkerRole schema as below: http://msdn.microsoft.com/en-us/library/windowsazure/gg557553.aspx Web Role: ignoreRoleInstanceStatus boolean Optional. When the value of this attribute is set to true, the status of a service is ignored and the endpoint will not be removed by the load balancer. The default value is false. Setting this value to true is useful for debugging busy instances…

Windows Azure Blob Upload Scenarios

The Windows Azure Blob storage API provides the following scenarios for uploading a blob: Scenario [1]: You can upload a single blob in N parallel threads. In your code, if you set CloudBlobClient.ParallelOperationThreadCount = N; then N parallel threads will be used to upload a single blob. Scenario [2]: You can upload multiple M…
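The idea behind Scenario [1], uploading one blob with N parallel threads, can be sketched in Python as follows. This is not the Windows Azure storage client: `split_into_blocks`, `upload_blob_in_parallel`, and the `put_block` callback are hypothetical names, and the "service" in the usage example is just an in-memory dictionary. The sketch assumes the blob is cut into fixed-size blocks that are uploaded concurrently and then reassembled in block order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_into_blocks(data: bytes, block_size: int) -> List[Tuple[int, bytes]]:
    """Cut data into (block_index, chunk) pairs of at most block_size bytes."""
    return [
        (offset // block_size, data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def upload_blob_in_parallel(
    data: bytes,
    block_size: int,
    thread_count: int,
    put_block: Callable[[int, bytes], None],
) -> List[int]:
    """Upload every block using thread_count worker threads.

    put_block is a stand-in for whatever call sends one block to storage.
    Returns the ordered list of block indexes, which a real client would
    commit so the service can assemble the blob in the right order.
    """
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # list() forces completion and re-raises any upload error.
        list(pool.map(lambda block: put_block(block[0], block[1]), blocks))
    return [index for index, _ in blocks]


# Usage with an in-memory stand-in for the storage service.
store = {}
payload = bytes(range(256)) * 40
block_ids = upload_blob_in_parallel(
    payload, block_size=1024, thread_count=4,
    put_block=lambda index, chunk: store.__setitem__(index, chunk),
)
reassembled = b"".join(store[index] for index in block_ids)
```

Blocks may finish in any order, so the ordered index list, not the completion order, determines how the blob is put back together.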

Keys to understanding the relationship between MapReduce and HDFS

Map Task (HDFS data localization): The unit of input for a map task is an HDFS data block of the input file. The map task functions most efficiently if the data block it has to process is available locally on the node on which the task is scheduled. This approach is called HDFS data localization….
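The data localization idea can be sketched as a simple placement rule: prefer a free node that already holds the block, and fall back to any free node otherwise. The Python below is a simplified illustration, not Hadoop's actual scheduler; `pick_node_for_block`, the block ID, and the node names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def pick_node_for_block(
    block_id: str,
    block_locations: Dict[str, List[str]],
    free_nodes: Set[str],
) -> Tuple[Optional[str], bool]:
    """Choose a node for the map task that reads block_id.

    Returns (node, is_local). is_local is True when the chosen node already
    stores a replica of the block, so the task reads from local disk.
    """
    # First choice: a free node that holds a replica of the block.
    for node in block_locations.get(block_id, []):
        if node in free_nodes:
            return node, True
    # Fallback: any free node; the block must then travel over the network.
    if free_nodes:
        return sorted(free_nodes)[0], False
    # No capacity anywhere right now.
    return None, False


# Hypothetical cluster state: blk_1 has replicas on node2 and node3.
locations = {"blk_1": ["node2", "node3"]}
local_choice = pick_node_for_block("blk_1", locations, {"node1", "node3"})
remote_choice = pick_node_for_block("blk_1", locations, {"node1"})
```

In the first call the task lands on node3, which stores a replica; in the second, only node1 is free, so the task runs there and the block has to be read remotely.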

Solving SSL Certificate expiration problem with an existing Windows Azure Application

Recently I was working on an issue where the SSL certificate had expired and, because of it, users were warned not to use the site. The certificate expiration was visible as below: In this situation, the following steps should be taken to resolve the problem: Get the new SSL certificate from…
