Azure Startup Tasks and Powershell: Lessons Learned

02/06/2011

Last weekend, I sat down to write the next blog post in my Azure@home series, covering the use of startup tasks to replace some tedious file copying code in the Worker Role. Well, it turned out to be an adventure, and while segment 15 of the series is forthcoming, I thought I’d enumerate some of the not-so-obvious things I discovered here in this stand-alone blog post.

Before you read further, I want to thank both Steve Marx and Adam Sampson for helping me understand some of the nuances of startup tasks. Steve’s blog articles, Windows Azure Startup Tasks: Tips, Tricks, and Gotchas as well as Introduction to Windows Azure Startup Tasks should be required reading and were my starting points – some of his pointers are repeated or expanded in my article as well. Adam, one of the developers on the Azure team and the man behind the new 1.3 Diagnostics, RemoteAccess, and RemoteForwarder modules, helped clear up the primary ‘inexplicable’ behavior I was noticing.

If you’re looking for a walkthrough on startup tasks, this post isn’t it; I’ll take a more didactic approach in the next Azure@home article (or check out this Cloud Cover episode). For the sake of example, here’s a rather simple scenario that I’ll use to illustrate my own private gotchas below! The result of this startup task is to write a new file, sometext.txt, in the E:\approot directory of the VM housing the deployed web role – whether that’s useful or not, I won’t comment!

Setup task files in WebRole project

setup.cmd

@echo off
powershell -command "Set-ExecutionPolicy Unrestricted" 2>> err.out
powershell .\script.ps1 2>> err.out

script.ps1

Start-Transcript -Path transcript.txt
New-Item -path .. -name sometext.txt -type "file" `
    -value "I was written by a Startup Task!"
Stop-Transcript

Here’s a quick link list to the rest of the article, covering some of the distinct points to be aware of when using Azure startup tasks and/or Powershell:

Dude, where’s my script?
Copy always
Two words, ExecutionPolicy
Logging is your friend
Remote Desktop is your BEST friend
“Simple” isn’t always easy
Your task is not alone!
It’s a race (condition)
Snap-in at your own risk

Dude, Where’s My Script

When you specify the path to your batch file in the Task element keep in mind that the path will be relative to the approot directory in a worker role and the approot/bin directory in a web role. It’s pretty easy to get mixed up with the relative paths, so consider using the rather cryptic %~dp0, which expands to the full path of the location where your batch file is running, whenever you need to reference additional files. Obviously, I didn’t take my own advice here, but this sample is pretty simple.
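As a minimal sketch of that advice (not the version used in the sample above), setup.cmd could make its own folder the working directory via %~dp0 before doing anything else, so that relative references such as .\script.ps1 and err.out resolve the same way in web and worker roles. The file names are the ones from the sample; the rest is illustrative.

```batch
@echo off
rem %~dp0 expands to the drive and path of this batch file (with a trailing backslash).
rem Make it the working directory so the relative paths below do not depend on
rem where the startup task happened to be launched from.
cd /d "%~dp0"

powershell -command "Set-ExecutionPolicy Unrestricted" 2>> err.out
powershell .\script.ps1 2>> err.out
```

Because script.ps1 writes to the parent of the current directory, pinning the working directory this way also pins where sometext.txt ends up.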

Copy Always

I can’t tell you how many times I’ve stumbled over this one! In the sample above, script.ps1 and setup.cmd aren’t really part of the project; they’re just tagging along with the deployment so they’ll be available in the VM in the cloud for the Azure Fabric to kick into gear. When you add an existing external file (or create a new one) in Visual Studio, its properties default to not copying the file to the output directory. As a result, such files won’t get packaged into the .cspkg file or delivered to Azure. Make sure you visit the Properties dialog of each script file you add, and set Copy to Output Directory to “Copy Always”.

Two words, ExecutionPolicy

By default, Powershell will not run untrusted scripts, which is precisely what script.ps1 is. The first Powershell command in setup.cmd is there to set the execution policy to allow the next command to successfully run script.ps1. If the script runs under PowerShell 2.0 (that is, you’re deploying your role with an OSFamily setting of 2), you can get away with the following single command (which sets the policy for that one command versus globally):

powershell -ExecutionPolicy Unrestricted .\script.ps1

“But Steve’s blog post says to use reg add HKLM\…” Ultimately, it’s the same result; I like the fact that I’m not poking directly into the registry, and I can use the same script for both OSFamily values in Windows Azure.

“Setting that policy makes me nervous; is there no other way?” Scott Hanselman wrote an extensive post on how to sign a Powershell script so it can run in a remote environment (like in your Windows Azure Web Role). That’s a bit out of the scope of what I want to cover here, so read it at your leisure. [Disclaimer: that post was written over four years ago, and I presume it’s still accurate, but I’ve not tried it in the context of Windows Azure.]

Logging is Your Friend

It may seem like a throwback, but logging each line of your scripts to figure out what went wrong and when it went wrong is about all you can do once you’re running in the cloud. In the setup.cmd file, you’ll notice I’ve (nod to Steve) used the stderr redirection 2>> to capture any errors when running the Powershell command itself. And in the Powershell script I’m using the Start-Transcript cmdlet to capture the output of each command in that script. The location of these log files is relative to the directory in which the script is run, which in the case above is /approot (worker role) or /approot/bin (web role). The next question you’ll ask is “how do I get to them?” Read on!
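One way to take the logging a step further, sketched here on the assumption that a single combined log is acceptable (the file name startup.log is hypothetical), is to timestamp each step and capture standard output as well as standard error:

```batch
@echo off
rem Hypothetical log file name; it is created next to this batch file.
set LOG=%~dp0startup.log

echo [%date% %time%] setup.cmd starting >> "%LOG%"

rem ">> file 2>&1" appends both standard output and standard error to the log.
powershell -command "Set-ExecutionPolicy Unrestricted" >> "%LOG%" 2>&1
echo [%date% %time%] exit code %errorlevel% from Set-ExecutionPolicy >> "%LOG%"

powershell .\script.ps1 >> "%LOG%" 2>&1
rem Note: powershell.exe can still return 0 when a script hits a non-terminating
rem error, so keep the Start-Transcript output as well.
echo [%date% %time%] exit code %errorlevel% from script.ps1 >> "%LOG%"
```

The timestamps make it easier to line up what your own task did against whatever else was running on the instance at the same moment.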

Remote Desktop is Your BEST Friend

While you probably could figure out a way to push your log files to Azure storage, make them accessible via a WCF service call, or find some other clever way, I say go straight to the source – the live VM running in Azure. Remote Desktop access is simple to set up, and you’ll thank yourself for the visibility it gives you when diagnosing issues. Once you’re in the VM you can poke around at the various log files you’ve left behind and get a better idea of where things went awry.

“Simple” isn’t always easy

By “simple”, I mean the taskType of your Task element: simple, background, or foreground, as shown in the excerpt below:

<ServiceDefinition name="WindowsAzureProject2" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebRole1">
    <Startup priority="1">
      <Task commandLine="setup.cmd"
            executionContext="elevated"
            taskType="simple"/>
    </Startup>
...

Now, chances are you’ll want to use “simple”; it executes your task synchronously and doesn’t put your role in the ready state until the startup task completes. But what happens if you have a logic error, say your task raises an exception or has an infinite loop? Since the role never reaches the ready state, you won’t be able to remote into your instance to debug it.

I’d recommend running your tasks in “background” while you’re testing and debugging, and then switching back to “simple” when you’re confident the code is rock-solid. As for “foreground”, that also runs your code asynchronously, but it prevents the role from being recycled until the task completes, so if you do have a runaway task, you’ll have to hunt around in Task Manager and kill it before you can restart your role and deploy your updated implementation.
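For the debugging phase, the Startup element from the excerpt above might look like the following fragment. It mirrors the sample’s attributes and changes only taskType; treat it as a sketch rather than a complete service definition.

```xml
<Startup priority="1">
  <!-- While debugging: "background" lets the role reach the ready state even if
       setup.cmd hangs or fails, so Remote Desktop stays reachable. -->
  <Task commandLine="setup.cmd"
        executionContext="elevated"
        taskType="background" />
  <!-- Once the script is trusted, switch taskType back to "simple". -->
</Startup>
```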

Your Task is Not Alone!

If you’ve read up on some of the changes in the 1.3 SDK, you may be aware of a new plugin architecture, which makes enhanced capabilities – such as Diagnostics and RemoteAccess – easy to add or remove from your roles. You can see these module references in your ServiceDefinition.csdef file, and they are typically added by choices you’ve made in the properties of your roles, like selecting the Enable Diagnostics checkbox on the Role property sheet or clicking the link to configure Remote Desktop connections on the Cloud project publish dialog.

The module references that then appear in the Service Definition document refer to plugins that are installed locally as part of the Azure 1.3 SDK, in the bin/plugins folder. [Figure: RemoteAccess snap-in directory]

If you open one of the csplugin files, you’ll notice it has a familiar look to it, essentially encapsulating behavior of an ancillary service or process you’re going to spin up in the cloud. It’s a separate process from your web and worker roles, but runs in the same VM and has many of the same parameters. Below is the code for the RemoteAccess module, which is required to be part of every web and worker role for which you want to enable Remote Desktop Access.

<?xml version="1.0" ?>
<RoleModule
  xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition"
  namespace="Microsoft.WindowsAzure.Plugins.RemoteAccess">
  <Startup priority="-1">
    <Task commandLine="installRuntimeSnapIn.cmd" executionContext="elevated" taskType="background" />
    <Task commandLine="RemoteAccessAgent.exe" executionContext="elevated" taskType="background" />
    <Task commandLine="RemoteAccessAgent.exe /blockStartup" executionContext="elevated" taskType="simple" />
  </Startup>
  <ConfigurationSettings>
    <Setting name="Enabled" />
    <Setting name="AccountUsername" />
    <Setting name="AccountEncryptedPassword" />
    <Setting name="AccountExpiration" />
  </ConfigurationSettings>
  <Endpoints>
    <InternalEndpoint name="Rdp" protocol="tcp" port="3389" />
  </Endpoints>
  <Certificates>
    <Certificate name="PasswordEncryption" storeLocation="LocalMachine" storeName="My" permissionLevel="elevated" />
  </Certificates>
</RoleModule>

Note there’s a series of three tasks that are run on startup, two of which run asynchronously (taskType=background), and one of which is synchronous (taskType=simple). These tasks, along with the tasks that you specify in your ServiceDefinition.csdef document, are all thrown at the Azure Fabric to start up as it works to bring up your web or worker role. The priority in the Startup element here is -1, which means these tasks will start before your own tasks (since we left priority off and it defaults to 0, or perhaps 1?).

Now here’s where things get VERY interesting. The inclusion of the RemoteAccess module in the web role example above means that these three tasks will start before our own setup.cmd, but there is no guarantee the first two will complete before setup.cmd because they are marked as “background” (asynchronous).

Let’s take a look at what’s in that first installRuntimeSnapIn.cmd file now:

rem Run both the 32-bit and 64-bit InstallUtil
IF EXIST %SystemRoot%\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe %SystemRoot%\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe Microsoft.WindowsAzure.ServiceRuntime.Commands.dll
IF EXIST %SystemRoot%\Microsoft.NET\Framework64\v2.0.50727\InstallUtil.exe %SystemRoot%\Microsoft.NET\Framework64\v2.0.50727\InstallUtil.exe Microsoft.WindowsAzure.ServiceRuntime.Commands.dll

rem Add the snapin to the default profile
echo Add-PSSnapIn Microsoft.WindowsAzure.ServiceRuntime >> %SystemRoot%\system32\WindowsPowerShell\Profile.ps1

powershell -command set-executionpolicy allsigned

What this command file does is install a Powershell snap-in that gives some access to the runtime environment of the running role – more on that later. The very last action it takes looks kind of like the second line in our setup.cmd file, only it sets the policy to allsigned versus unrestricted… and

It’s a Race (Condition)

Our startup task (setup.cmd) ends up setting the policy to unrestricted to run the unsigned script (script.ps1) while the RemoteAccess script above may still be running in the background! Both of these Set-ExecutionPolicy commands are ultimately updating a global registry entry, so the last one wins!

As a result, you can see a considerable variation in behavior. In my testing, I saw it work fine; I saw the Powershell script not even invoked (because the execution policy got reset between lines 2 and 3 of setup.cmd); and I saw my Powershell script start, only to choke in the middle because the execution policy had changed midstream, and one of the Powershell commands was requesting me to confirm – interactively - that it was ok to run!

The “fix” is easy – eliminate the race condition by setting the taskType of the installRuntimeSnapIn.cmd to “simple” rather than “background”. I suppose deleting that last line from the .cmd file would work as well, but someone put it there for a reason, and I didn’t feel confident questioning that. In terms of switching to “simple,” I’m fine with it. Maybe it takes a wee bit longer for my role to start up, but that’s nothing compared to the loss of two days and a bit of sanity I otherwise incurred.
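Concretely, the change amounts to editing one attribute in the Startup element of the RemoteAccess csplugin file shown earlier. The fragment below is a sketch of that edit; only the first Task differs from the original. Since the plugin lives in the local SDK folder, the change presumably affects every project built on that machine that imports the module.

```xml
<Startup priority="-1">
  <!-- Changed from taskType="background" so the snap-in install (and its
       set-executionpolicy call) finishes before later tasks start. -->
  <Task commandLine="installRuntimeSnapIn.cmd" executionContext="elevated" taskType="simple" />
  <Task commandLine="RemoteAccessAgent.exe" executionContext="elevated" taskType="background" />
  <Task commandLine="RemoteAccessAgent.exe /blockStartup" executionContext="elevated" taskType="simple" />
</Startup>
```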

Powershell Guru: “Hey, you know there’s an easier way?” If you read Steve’s post, you’ll note he calls out the following for invoking a Powershell script from his .cmd file:

powershell -ExecutionPolicy Unrestricted ./myscript.ps1

but he also adds that this works for osFamily=“2”, which is an Azure OS based off of Windows Server 2008 R2 and so comes with Powershell 2.0. The default osFamily is “1”, which provides an image based off of Windows Server 2008 SP2 and comes with Powershell 1.0; you guessed it, the -ExecutionPolicy switch wasn’t introduced until Powershell 2.0. That switch also affects only the session and not the local machine setting, so there is no race condition to fix!

Snap-in At Your Own Risk

One of the tips Steve provides is leveraging the Azure Service Runtime from Powershell, using the nifty snap-in that’s installed from the Remote Access module. That’s a great idea in theory, but after some back and forth, I’d recommend against it. Here’s my rationale:

The snap-in is only installed as part of the RemoteAccess module; it’s not part of the VM image in Azure. Unless you’re planning to always deploy your roles with RemoteAccess enabled (which I wouldn’t advise given the additional attack vector it may provide), you wouldn’t have the snap-in available.
As explained to me, the primary scenario for the snap-in is to be able to peer into (and perhaps fix) misbehaving instances, making its correlation with RemoteAccess clear. Its use with your own Powershell startup scripts isn’t currently supported.
You can still access many of the Azure service runtime methods directly (after all, the snap-in really just provides syntactic sugar). For example, the following sets of Powershell commands are equivalent:

Add-PSSnapin Microsoft.WindowsAzure.ServiceRuntime
Get-LocalResource -Name FoldingClientStorage

[Reflection.Assembly]::LoadWithPartialName("Microsoft.WindowsAzure.ServiceRuntime")
[Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment]::GetLocalResource("FoldingClientStorage")