Slow AppFabric demo part two

In the previous article, we began analyzing the first portion of the timeline, which is where IIS starts up the process. In this, I pointed out that 16% of the sampled time was spent in page faults. Sampling does not always give us the whole story though. If the process is switched out, then samples will not capture where that time went. Switch out time can include waiting for I/O or waiting on another thread. If we examine the time switched out on page faults, that comes to a total of 2.5 seconds during this time period.

Another large portion of time is going to a GlobalNotify function. The butterfly view looks like this:

 
100%        |iiscore.dll!W3_SERVER::GlobalNotify Callers
100%        | iiscore.dll!W3_APPLICATION::ResolveModules
100%        |  iiscore.dll!W3_APPLICATION::SetupNotificationContext
100%        |  => iiscore.dll!W3_CONTEXT::SetupStateMachinePhase2
100%        |  => iiscore.dll!W3_CONTEXT::SetupStateMachine
100%        |  => iiscore.dll!W3_MAIN_CONTEXT::OnNewRequest
100%        |  => w3dt.dll!UL_NATIVE_REQUEST::DoWork
100%        |iiscore.dll!W3_SERVER::GlobalNotify Calls
 18%         |webengine4.dll!CMgdEngGlobalModule::OnGlobalApplicationResolveModules
 18%         | webengine4.dll!InitClrHost
 18%         |  => webengine4.dll!GetIsapiProcessHost
 14%         |     |clr.dll!ClrCreateManagedInstance
 81%         |webengine4.dll!W3_MGD_APP_CONTEXT::OnApplicationStart
 59%          |webengine4.dll!W3_MGD_APP_CONTEXT::CreateAppDomain
 59%          | clr.dll!GenericComCallStub
 59%          |  clr.dll!COMToCLRWorker
 58%          |  => System.Web.ni.dll!System.Web.Hosting.ApplicationManager::CreateAppDomainWithHostingEnvironment
 21%          |webengine4.dll!W3_MGD_APP_CONTEXT::InitializeMgdApplication
 21%            clr.dll!GenericComCallStub
 21%             clr.dll!COMToCLRWorker
 20%             => System.Web.ni.dll!System.Web.HttpApplication::RegisterEventSubscriptionsWithIIS
 17%                |System.Web.ni.dll!System.Web.HttpApplication::GetModuleCollection

Notice that most of the time is going to creating the AppDomain and initializing the managed application. There is nothing abnormal going on here, it is just taking a long time because of the slow disk access.

The next section to focus on is highlighted here:

14.38s marks the time that the web host request starts. At 19.92s, the XamlServices.Load operation starts. The WCF/WF ETW events show the duration of the times for HostedTransportConfigurationManager initialization and ServiceHostFactory creation. They take 0.65s and 0.00001s respectively, so there doesn't seem to be a problem in the WCF/WF initialization. Most of the rest of the time goes into loading assemblies and page faulting as before.

The XamlServices.Load operation happens next and takes 0.82 seconds. 33% of that time goes to page faulting, the rest goes to the System.Xaml.XamlServices.Transform method. Here is a stack trace of that method.

 
63%      | ...System.Xaml.ni.dll!System.Xaml.XamlServices::Transform
 9%      |    |System.Xaml.ni.dll!System.Xaml.XamlObjectWriter::WriteStartMember
 5%      |    ||System.Xaml.ni.dll!System.Xaml.XamlObjectWriter::Logic_CreateAndAssignToParentStart
43%      |    |System.Xaml.ni.dll!System.Xaml.XamlObjectWriter::WriteEndMember
41%      |    ||System.Xaml.ni.dll!System.Xaml.XamlObjectWriter::Logic_CreatePropertyValueFromValue
39%      |    |||System.Xaml.ni.dll!System.Xaml.XamlObjectWriter::Logic_CreateFromValue
39%      |    ||| ...System.Xaml.ni.dll!MS.Internal.Xaml.Runtime.ClrObjectRuntime::CreateObjectWithTypeConverter
35%      |    |||    |System.Activities.ni.dll!Microsoft.VisualBasic.Activities.XamlIntegration.VisualBasicSettingsConverter::ConvertFrom
35%      |    |||    | System.Activities.ni.dll!Microsoft.VisualBasic.Activities.XamlIntegration.VisualBasicExpressionConverter::CollectXmlNamespacesAndAssemblies
 4%      |    |||    | |System.Activities.ni.dll!Microsoft.VisualBasic.Activities.XamlIntegration.VisualBasicExpressionConverter+AssemblyCache::EnsureInitialized
 3%      |    |||    | | System.Activities.ni.dll!Microsoft.VisualBasic.Activities.XamlIntegration.VisualBasicExpressionConverter+AssemblyCache::CacheLoadedAssembly
 3%      |    |||    | |  Other
 2%      |    |||    | |clr.dll!JIT_GetSharedGCStaticBase_Helper
 2%      |    |||    | | ...System.Activities.ni.dll!Microsoft.VisualBasic.Activities.XamlIntegration.VisualBasicExpressionConverter::.cctor
 2%      |    |||    | |     Other
26%      |    |||    | |System.ni.dll!System.Text.RegularExpressions.Regex::Match
26%      |    |||    | | System.ni.dll!System.Text.RegularExpressions.Regex::Run
26%      |    |||    | |  System.ni.dll!System.Text.RegularExpressions.RegexRunner::Scan
21%      |    |||    | |  |clr.dll!ThePreStubAMD64
21%      |    |||    | |  | ...clr.dll!MethodDesc::MakeJitWorker
21%      |    |||    | |  |     clr.dll!UnsafeJitFunction
21%      |    |||    | |  |      ...clrjit.dll!THX_dop2
 4%      |    |||    | |  |         |clrjit.dll!ReaderBase::MSILToIR
 8%      |    |System.Xaml.ni.dll!System.Xaml.XamlXmlReader::Read
 8%      |      System.Xaml.ni.dll!MS.Internal.Xaml.NodeStreamSorter::MoveNext
 4%      |      |System.Xaml.ni.dll!MS.Internal.Xaml.NodeStreamSorter::ReadAheadAndSortCtorProperties
 2%      |      | System.Xaml.ni.dll!MS.Internal.Xaml.NodeStreamSorter::ReadAheadToEndOfAttributes
 2%      |      |  ...System.Xaml.ni.dll!MS.Internal.Xaml.Parser.XamlPullParser+<p_elementbody>d__23::MoveNext
 2%      |      |      Other
 4%      |      |System.Xaml.ni.dll!MS.Internal.Xaml.Parser.XamlPullParser+<parse>d__0::MoveNext

Notice the cost for JITting the regular expression. This is why it's important to compile regular expressions staticly. You don't want to incur this penalty on every request.

The next significant chunk of time is when the workflow service host is created. The time period is highlighted below:

26% of this time is spent in page faults. The rest of the time is shown here:

 
70%      | ...System.ServiceModel.ni.dll!System.ServiceModel.ServiceHostBase::InitializeDescription
53%      |    |System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.WorkflowServiceHost::CreateDescription
53%      |    | System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.WorkflowService::GetContractDescriptions
52%      |    |  System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.WorkflowService::WalkActivityTree
52%      |    |  => System.Activities.ni.dll!System.Activities.WorkflowInspectionServices::CacheMetadata
52%      |    |  => ...System.Activities.ni.dll!System.Activities.ActivityUtilities::ProcessActivity
52%      |    |  => System.Activities.ni.dll!System.Activities.Activity::InternalCacheMetadata
43%      |    |     |System.Activities.ni.dll!System.Activities.Activity`1[System.__Canon]::OnInternalCacheMetadata
43%      |    |     | System.Activities.ni.dll!System.Activities.CodeActivity`1[System.__Canon]::OnInternalCacheMetadataExceptResult
43%      |    |     |  System.Activities.ni.dll!Microsoft.VisualBasic.Activities.VisualBasicValue`1[System.__Canon]::CacheMetadata
43%      |    |     |   System.Activities.ni.dll!Microsoft.VisualBasic.Activities.VisualBasicHelper::Compile
40%      |    |     |   |System.Activities.ni.dll!Microsoft.VisualBasic.Activities.VisualBasicHelper::Compile
 5%      |    |     |   ||clr.dll!ThePreStubAMD64
 5%      |    |     |   || ...clr.dll!Module::FixupNativeEntry
 5%      |    |     |   ||     clr.dll!CEECompileInfo::DecodeModuleFromIndexes
 5%      |    |     |   ||     => clr.dll!Module::LoadAssembly
35%      |    |     |   ||System.Activities.ni.dll!Microsoft.VisualBasic.Activities.VisualBasicHelper::Compile
33%      |    |     |   |  Microsoft.VisualBasic.Activities.Compiler.ni.dll!Microsoft.Compiler.VisualBasic.HostedCompiler::CompileExpression
33%      |    |     |   |   ...Microsoft.VisualBasic.Activities.Compiler.ni.dll!<module>::VbHostedCompiler.CompileExpression
29%      |    |     |   |      |Microsoft.VisualBasic.Activities.Compiler.ni.dll!<module>::VbHostedCompiler.InitCompiler
14%      |    |     |   |      ||Microsoft.VisualBasic.Activities.Compiler.ni.dll!<module>::VbHostedCompiler.GetCompiler
14%      |    |     |   |      || ...clr.dll!DoNDirectCall__PatchGetThreadCall
 4%      |    |     |   |      ||    |Microsoft.VisualBasic.Activities.Compiler.dll!VBCreateBasicCompiler
 3%      |    |     |   |      ||    | Microsoft.VisualBasic.Activities.Compiler.dll!Compiler::CreateNewCompilerHost
 3%      |    |     |   |      ||    |  Microsoft.VisualBasic.Activities.Compiler.dll!Compiler::LoadDefaultLibraries
 3%      |    |     |   |      ||    |   Other
10%      |    |     |   |      ||    |Microsoft.VisualBasic.Activities.Compiler.dll!CompilerProject::Compile
15%      |    |     |   |      ||Microsoft.VisualBasic.Activities.Compiler.ni.dll!<module>::VbHostedCompiler.SetupCompilerProject
13%      |    |     |   |      |  clr.dll!DoNDirectCallWorker
13%      |    |     |   |      |   Microsoft.VisualBasic.Activities.Compiler.dll!Compiler::CompileToBound
16%      |    |System.ServiceModel.ni.dll!System.ServiceModel.ServiceHostBase::ApplyConfiguration
11%      |     |System.ServiceModel.ni.dll!System.ServiceModel.Description.ConfigLoader::LookupService
 4%      |     |System.ServiceModel.ni.dll!System.ServiceModel.Description.ConfigLoader::LoadServiceDescription

VB expression compilation contributes a significant amount to the cost. There are some ways to get around VB expressions. If you were writing the workflow in code, you could use a Lambda expression in its place. There is also the possibility of writing a custom expression, but these run slower than VB expressions. C# expressions will be coming soon to WF, and that will have different performance characteristics than VB. The reason VB expression compilation is heavy is because the native VB compiler is used and it needs the assemblies, variables, etc. defined in an environment before it can compile the expression into an expression tree. Luckily, this is only a latency cost on deserializing the workflow. It will not affect the throughput of workflow instances. It's better to just use VB expressions.

The next part to examine is the service host open operation, which occurs during the time period highlighted below.

The stack trace for this time period looks like this:

 
35%    || ...System.ServiceModel.ni.dll!System.ServiceModel.Channels.CommunicationObject::Open
18%    ||    |System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.WorkflowServiceHost::OnOpeningCore
16%    ||    | System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.WorkflowServiceHost::InitializeRuntime
15%    ||    |  System.ServiceModel.ni.dll!System.ServiceModel.ServiceHostBase::InitializeRuntime
15%    ||    |   System.ServiceModel.ni.dll!System.ServiceModel.Description.DispatcherBuilder::InitializeServiceHost
 5%    ||    |   |System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.Description.EtwTrackingBehavior::ApplyDispatchBehavior
 5%    ||    |   | System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.Tracking.DefaultProfileManager::GetProfile
16%    ||    |System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.WorkflowServiceHost::OnOpen
15%    ||      System.ServiceModel.Activities.ni.dll!System.ServiceModel.Activities.Dispatcher.DurableInstanceManager::Open
 3%    ||      |System.Transactions.ni.dll!System.Transactions.TransactionScope::.ctor
 3%    ||      | ...clr.dll!Module::LoadModule
 2%    ||      |     clr.dll!AppDomain::LoadDomainModule
12%    ||      |System.Activities.DurableInstancing.ni.dll!System.Activities.DurableInstancing.SqlWorkflowInstanceStore::OnNewInstanceHandle
11%    ||        System.Activities.DurableInstancing.ni.dll!System.Activities.DurableInstancing.SqlWorkflowInstanceStore::MakeReadOnly
 8%    ||        |System.Data.ni.dll!System.Data.Common.DbConnectionStringBuilder::set_ConnectionString
 8%    ||        | ...System.Data.ni.dll!System.Data.Common.DbConnectionOptions::ValidateKeyValuePair
 8%    ||        |     System.ni.dll!System.Text.RegularExpressions.Regex::IsMatch
 8%    ||        |      ...clrjit.dll!THX_dop2
 2%    ||        |         |clrjit.dll!ReaderBase::MSILToIR
 2%    ||        |System.Data.ni.dll!System.Data.Common.DbConnectionStringBuilder::get_ConnectionString
 2%    ||          ...clrjit.dll!PreJit::compileMethod
 2%    ||              clrjit.dll!THX_dop2
10%    ||ntoskrnl.exe!KiPageFault
10%    |  ntoskrnl.exe!MmAccessFault
 9%    |   ntoskrnl.exe!MiIssueHardFault
43%    |tid ( 1276)
25%    ||ntdll.dll!RtlUserThreadStart
25%    || ...clr.dll!ThreadpoolMgr::CompletionPortThreadStart
25%    ||     clr.dll!BindIoCompletionCallbackStub
25%    ||     => ...System.Runtime.DurableInstancing.ni.dll!System.Runtime.IOThreadScheduler+ScheduledOverlapped::IOCallback
24%    ||     => System.Activities.DurableInstancing.ni.dll!System.Activities.DurableInstancing.SqlWorkflowInstanceStoreAsyncResult::StartOperation
11%    ||        |System.Activities.DurableInstancing.ni.dll!System.Activities.DurableInstancing.CreateWorkflowOwnerAsyncResult::GenerateSqlCommand
10%    ||        | System.Activities.DurableInstancing.ni.dll!System.Activities.DurableInstancing.SerializationUtilities::SerializePropertyBag
 3%    ||        | |System.Activities.DurableInstancing.ni.dll!System.Activities.DurableInstancing.DefaultObjectSerializer::SerializeValue
 7%    ||        | |System.Activities.DurableInstancing.ni.dll!System.Activities.DurableInstancing.DefaultObjectSerializer::SerializePropertyBag
13%    ||        |System.Activities.DurableInstancing.ni.dll!System.Activities.DurableInstancing.SqlCommandAsyncResult::StartCommandInternal
10%    ||         |System.Data.ni.dll!System.Data.SqlClient.SqlConnection::Open
10%    ||           ...System.Data.ni.dll!System.Data.ProviderBase.DbConnectionFactory::GetConnection
 9%    ||               System.Data.ni.dll!System.Data.ProviderBase.DbConnectionPool::GetConnection
 9%    ||               => ...System.Data.ni.dll!System.Data.SqlClient.SqlInternalConnectionTds::.ctor
 9%    ||               => System.Data.ni.dll!System.Data.SqlClient.SqlInternalConnectionTds::OpenLoginEnlist
 9%    ||               => System.Data.ni.dll!System.Data.SqlClient.SqlInternalConnectionTds::LoginNoFailover
 2%    ||                  |System.Data.ni.dll!System.Data.SqlClient.TdsParser::.ctor
 2%    ||                  | ...System.Data.dll!Ssl::Initialize
 2%    ||                  |     System.Data.dll!Ssl::AcquireCredentialsForClient
 2%    ||                  |     => ...sspicli.dll!AcquireCredentialsHandleCommon
 5%    ||                  |System.Data.ni.dll!System.Data.SqlClient.SqlInternalConnectionTds::AttemptOneLogin
 2%    ||                   |System.Data.ni.dll!System.Data.SqlClient.TdsParser::Connect:1
 2%    ||                   | ...System.Data.dll!SNISecInitPackage
 6%    ||ntoskrnl.exe!KiPageFault
 5%    || ntoskrnl.exe!MmAccessFault
 5%    ||  ntoskrnl.exe!MiIssueHardFault
 3%    |tid ( 2132)
 2%    | ntoskrnl.exe!KiPageFault

Besides page faults and assembly loading, a lot of time is spent setting up a connection to SQL server and initializing the SQL workflow instance store provider. This is another one-time cost.

After this analysis, there are really only two action items that needed to be taken.

  1. Move the demo to a machine with faster disk access. The slow disk access on the virtual machine used for this demo exacerbated the sluggish cold startup. Running this on a physical machine or a native VHD boot would probably take care of it.
  2. Prep the demo machine before giving the demo. In this case, the audience was subjected to waiting for IIS, WCF, and WF to startup. By adding a dummy WCF service and hitting the WSDL, a lot of this cost could be avoided. Also, AppFabric has the ability to prep a service so that the first user to make a call to a service doesn't have to pay the penalty for all the startup time. Creating a dummy xamlx could improve things even further because it starts the workflow service host and sets up providers like persistence and tracking. RunningĀ a second xamlx was measured to only take 0.27 seconds.

If part of the demo is to write the xamlx code and drop it into the directory, the application pool will be recycled. There is not much that can be done about this, but it will still help to prep the demo machine because it will load the .Net assemblies into shared memory and cut down on the loading costs for subsequent processes using the .Net framework.