Investigating CRM V4 Performance


The purpose of this blog is to investigate in detail a benchmark run conducted with the CRM V4 Performance toolkit (see http://blogs.msdn.com/crm/archive/2008/02/29/microsoft-dynamicstm-crm-4-0-performance-toolkit.aspx or www.codeplex.com/crmperftoolkit).  My intent is to provide you with:

a. Results from a run conducted in our performance labs, including details of the environment itself; this will provide a reference point to calibrate new benchmark setups.

b. The statistics we gather during the benchmark runs.

c. Insight into how these statistics can be used to find and eliminate bottlenecks.

Hardware

Most of our performance benchmarking is done on a five-server system: one domain controller, one SQL server, one application server, one asynchronous server, and one client machine.  The benchmark runs for this discussion were conducted on the following hardware:

  • Domain Controller – Dell 2850
  • SQL server – HP ProLiant DL585 with 4 AMD Opteron 850 dual-core processors, 32G of RAM
    • using 5 internal drives for the OS (RAID 0)
    • HP StorageWorks MSA30 using a Smart Array 6400 controller, 7 drives for data, 3 drives for logs, 2 drives for temp, 2 drives for backups (all striped RAID 0)
  • Asynchronous server – HP ProLiant DL585 with 4 AMD Opteron 850 dual-core processors, 32G of RAM
  • Application server – HP ProLiant DL585 with 4 AMD Opteron 850 dual-core processors, 8G of RAM
  • Client machine – Desktop with Intel P4 3.2 GHz single-core processor with Hyper-Threading, 2G of RAM

Configuration

All machines ran Microsoft Windows Server 2003 Enterprise x64 Edition SP2, except for the Domain Controller and the Client machine, which ran Microsoft Windows Server 2003 Enterprise Edition SP2.  All database logical drives had their Windows allocation unit size set to 64K, and all machines were on a gigabit network switch with gigabit NICs.

The SQL server runs SQL Server 2005 x64 Enterprise Edition.  The Microsoft CRM database, its log file, and the temp database are all on dedicated logical and physical devices to get better throughput from SQL.  We also set Max Degree of Parallelism to 1, because SQL Server does not produce efficient parallel execution plans for the CRM workload, which can cause deadlocks and other performance degradations.  Lastly, SQL Server 2005 deadlock tracing (trace flags 1204 and 1222) is enabled to help us understand deadlocks encountered during the runs.
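The MAXDOP setting and the deadlock trace flags can be applied with a short T-SQL script; a minimal sketch (run as a sysadmin; note that trace flags enabled with DBCC TRACEON do not survive a service restart, so add them as -T startup parameters if you want them permanent):

```sql
-- Max Degree of Parallelism is an advanced option, so expose advanced options first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;

-- Turn on deadlock tracing globally (-1 = all connections);
-- 1204 and 1222 write deadlock participant detail to the SQL Server error log
DBCC TRACEON (1204, -1);
DBCC TRACEON (1222, -1);
```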

The Application (App) server has a basic install of Microsoft Dynamics CRM V4, with the Microsoft CRM Asynchronous Processing Service disabled (since we are using a dedicated Asynchronous server), as well as having CRM tracing configured to just catch errors (registry setting TraceCategories = *:Off;Platform.*:Error).

The Asynchronous (Async) server has a basic install of Microsoft Dynamics CRM V4 (pointing it to the existing Database from the install on the Application server).  Error tracing is enabled on this server just as it is on the App server.

The Client Machine had Microsoft Visual Studio 2005 Team System and the Microsoft Dynamics CRM 4.0 Performance Toolkit installed.  This is the same toolkit we built and used during the CRM 4 development cycle.  To download the toolkit and find more information about it, visit www.codeplex.com/crmperftoolkit.  We configure the load test to gather performance counters for the machines in our environment along with the standard benchmark statistics (like response time, user load, etc.).  For more information on adding perfmon counters to the load test see How to: Specify Counter Sets.  You can also get perfmon statistics from the App, Async, and SQL servers using the scripts below to add a counter set for each machine (note that you have to update the bracketed parameters for the scripts to run correctly; we usually set our sample interval to 90 seconds on an hour run).

App and Async Servers:

logman -y create counter web_perf_log -o [SystemDrive]\Performance\Perfmons\[webperfmonname] --v -si [SAMPLE] -c "\.NET CLR Exceptions(*)\*" "\.NET CLR Interop(*)\*" "\.NET CLR Jit(*)\*" "\.NET CLR Loading(*)\*" "\.NET CLR LocksAndThreads(*)\*" "\.NET CLR Memory(*)\*" "\.NET CLR Remoting(*)\*" "\.NET CLR Security(*)\*" "\Active Server Pages\*" "\ASP.NET Applications(*)\*" "\ASP.NET Apps v1.1.4322(*)\*" "\ASP.NET Apps v2.0.50727(*)\*" "\ASP.NET State Service\*" "\ASP.NET v1.1.4322\*" "\ASP.NET v2.0.50727\*" "\ASP.NET\*" "\Browser\*" "\Cache\*" "\CrmPrfCntrs\*" "\Distributed Transaction Coordinator\*" "\eTrust Antivirus Realtime Server(*)\*" "\ICMP\*" "\ICMPv6\*" "\Indexing Service Filter(*)\*" "\Indexing Service(*)\*" "\Internet Information Services Global\*" "\IPv4\*" "\IPv6\*" "\Job Object Details(*)\*" "\Job Object(*)\*" "\LogicalDisk(*)\*" "\Memory\*" "\NBT Connection(*)\*" "\Network Interface(*)\*" "\NNTP Commands(*)\*" "\NNTP Server(*)\*" "\Objects\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Print Queue(*)\*" "\Process(*)\*" "\Processor(*)\*" "\RAS Port(*)\*" "\RAS Total\*" "\Redirector\*" "\Server Work Queues(*)\*" "\Server\*" "\SMTP NTFS Store Driver(*)\*" "\SMTP Server(*)\*" "\System\*" "\TCPv4\*" "\TCPv6\*" "\Thread(*)\*" "\UDPv4\*" "\UDPv6\*" "\Web Service Cache\*" "\Web Service(*)\*" "\WMI Objects\*"

SQL Server:

logman -y create counter sql_perf_log -o [SystemDrive]\Performance\Perfmons\[sqlperfmonname] --v -si [SAMPLE] -c "\LogicalDisk(*)\*" "\Memory\*" "\MSAS 2005:Cache\*" "\MSAS 2005:Connection\*" "\MSAS 2005:Data Mining Model Processing\*" "\MSAS 2005:Locks\*" "\MSAS 2005:MDX\*" "\MSAS 2005:Memory\*" "\MSAS 2005:Proactive Caching\*" "\MSAS 2005:Proc Aggregations\*" "\MSAS 2005:Proc Indexes\*" "\MSAS 2005:Processing\*" "\MSAS 2005:Storage Engine Query\*" "\MSAS 2005:Threads\*" "\MSFTESQL:Catalogs(*)\*" "\MSFTESQL:Indexer PlugIn(*)\*" "\MSFTESQL:Service\*" "\MSRS 2005 Web Service(*)\*" "\MSRS 2005 Windows Service(*)\*" "\Objects\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Process(*)\*" "\Processor(*)\*" "\Server Work Queues(*)\*" "\Server\*" "\SQLAgent:Statistics\*" "\SQLServer:Access Methods\*" "\SQLServer:Broker Activation(*)\*" "\SQLServer:Broker Statistics\*" "\SQLServer:Broker/DBM Transport\*" "\SQLServer:Buffer Manager\*" "\SQLServer:Buffer Node(*)\*" "\SQLServer:Buffer Partition(*)\*" "\SQLServer:Catalog Metadata(*)\*" "\SQLServer:CLR\*" "\SQLServer:Cursor Manager by Type(*)\*" "\SQLServer:Cursor Manager Total\*" "\SQLServer:Databases(*)\*" "\SQLServer:Exec Statistics(*)\*" "\SQLServer:General Statistics\*" "\SQLServer:Latches\*" "\SQLServer:Locks(*)\*" "\SQLServer:Memory Manager\*" "\SQLServer:Plan Cache(*)\*" "\SQLServer:SQL Errors(*)\*" "\SQLServer:SQL Statistics\*" "\SQLServer:SSIS Pipeline\*" "\SQLServer:Transactions\*" "\SQLServer:User Settable(*)\*" "\SQLServer:Wait Statistics(*)\*" "\System\*" "\TCPv4\*" "\TCPv6\*" "\Thread(*)\*" "\Web Service Cache\*" "\Web Service(*)\*" "\WMI Objects\*"
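Once the two counter sets above exist, they still have to be started before the run and stopped afterward; a minimal sketch of the surrounding commands (the log names must match the ones used in the create commands):

```
logman start web_perf_log
logman start sql_perf_log

REM ... execute the load test ...

logman stop web_perf_log
logman stop sql_perf_log
```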

Benchmark Runs

Being on the performance team, we make numerous benchmark runs each week to gather data points for historical analysis and to identify areas for potential deep-dive investigations.  Below are results from one such run made using the LoadTestOutlookDM.loadtest workload (which is included in the performance toolkit).  This run simulates a 500-user workload with complex workflows, and was intentionally designed to stress the system.  Where relevant, I’ll mention results from the exact same load test without the complex workflows, to show how the toolkit can be used to carry out comparative analysis around a change in the workload.  The LoadTestOutlookDM load test contains 16 different scenarios that cover Reports, Outlook, Data Management, Service Management, random email generation, and basic CRUD operations.  For the remainder of this blog we will discuss the results in detail and review the process we use to determine the key performance characteristics of the system.

Step 1: Understanding the Visual Studio result windows

After the run completes you’ll be presented with a screen that should look similar to this:

clip_image001

Figure 1: Graphical view.

The key elements in this window are:

  • The graph which displays the counter values (perfmon and benchmark stats) over the course of the run. Use the Counters pane to control which stats are displayed.
  • The tabular display of the counter values which displays the Min, Max, and Average values of the stats (note that the slider allows you to zoom in on particular sections of the run)
  • The summary pane which shows a few high level stats about the overall run
  • Buttons for displaying graph or table view

Clicking on the Tables button will bring up this:

clip_image002

Figure 2: Table view.

The key element in this window is the table which can display one of seven categories.  The categories we normally look at are:

  • Tests – this displays the test and scenario name, along with various stats about the execution of the test. Tests map to webtests in the toolkit, and are the fundamental building blocks of any workload.
  • Transactions – Each test above can contain zero or more explicitly measured “transactions” (not to be confused with database transactions).  Most of the tests included with the toolkit consist of a sequence of user actions – for example, the CreateNewAccount test includes NavAccountHomepage and CreateAccount.  Note that the same transaction name can appear in multiple tests; the table shows exactly which test and scenario each transaction was recorded for.
  • Errors – this displays the type of error hit, the subtype of the error hit, the number of times that error was hit (with a link to more detail about those errors), and the last message of the last error hit.
  • Requests – this displays the request name, the scenario it was in, the test(s) it happened in, the total number of times the request was made, the number of failures on that request (with a link to more detail about those errors), the response time of that request, and the content length of that request.  Just like transactions, the same request can come from different scenarios, tests, and transactions – it can be important to pay attention to these details because different tests can access the same page in different manners (and have different performance characteristics).

Step 2: Digesting the results

To start, we pull up some counter values from the relevant machines (either through perfmon or through Visual Studio, if you added the machines to your load test’s “Counter Set Mappings” area).  Since we were running workflow on this run, we’ll look at the Async, Web, and SQL servers.  The first thing I do is check the following counters: Total Requests, Total Tests, and Failed Tests from the “Overall” counter on the client; “% Processor Time” (for all machines) under the “Computer” object; and “Batch Requests/sec” from the SQL Server’s “SQLServer:SQL Statistics” object.  Lastly, I add nine counters from the SQL Server’s “PhysicalDisk” object – the “Avg. Disk Queue Length”, “Disk Transfers/sec”, and “Disk Bytes/sec” counters, one of each per drive, because I have the SQL drives split up for better throughput (SQL data, SQL log, and the temp database).  Refer to Figure 1 as an example.

The first thing I noticed is the high number of failed tests; in this case just below 6% of tests failed (we consider anything above 1% an investigation point).  Next I look at Requests/Sec and Avg. Response Time; they aren’t that bad (9.2 and 0.5 respectively), but there was a slight increase in the response time, from 0.3 to 0.5, compared with my earlier run without complex workflows.  Since each user action in CRM typically involves at least a few requests to the server, small differences in the average request time across all requests can translate to noticeable increases in user-perceived performance.  I then check % Processor Time on all three systems to make sure something didn’t go wrong on one of my servers.  They are all fairly busy, but nothing they shouldn’t be able to handle.  If they averaged 65-70% I’d be looking at offloading some of the work onto another server.

Nothing jumped out after reviewing the high-level system stats, so I’m going to look at the SQL Server in detail.  Microsoft CRM has a very complex schema, and experience has taught us that most of CRM’s performance characteristics stem from the performance of the underlying database.  For starters I like to check “Batch Requests/sec” and a few physical disk counters to see if that is where the bottleneck resides.  Sure enough, Batch Requests/sec is close to twice as busy as in the earlier run without workflow, and Avg. Disk Queue Length and Disk Bytes/sec are both significantly higher.  I also notice that the last counter readings for Avg. Disk Queue Length and Disk Bytes/sec were still significantly high, meaning this server had a lot of work to process late in the run or even after it was over, which points to asynchronous processing, i.e. workflow.

The Benchmark Toolkit will automatically make a backup of the AsyncOperationBase table after the benchmark, so you can query the asynchronous operations that ran during your benchmark run.  The tool also creates a workflow stats file that is saved in your test results directory, but I like to keep track of the async operation type as well and get my hands dirty.  The table will be saved off under the master database as WorkfowStats_[your load test name]_[the time the load test was started]_[organization name]_MSCRM.  Edit the database name as well as the startedon and completedon values in the following query, and then run it against that database to extract the asynchronous operations that ran during your benchmark.

select name,
       OperationType,
       min(datediff(second, startedon, completedon)) as [min],
       max(datediff(second, startedon, completedon)) as [max],
       avg(datediff(second, startedon, completedon)) as [avg],
       count(*) as [count]
from dbo.WorkfowStats_LoadTestOutlookDM2_23_2008_1_10_07_AM_perfv4RTM_MSCRM
where completedon is not null
  and startedon > '2/23/2008 9:10:00 AM'
  and completedon < '2/23/2008 10:11:00 AM'
group by name, OperationType

It will return something like this:

Name                                   OperationType  min  max  avg  count
Workflow expansion task                            1    0   96   38  13761
importlead633393282086689699                       3   36   36   36      1
importcontact633393286571064699                    3  137  137  137      1
importcontact633393286571064699                    4   54   54   54      1
importlead633393282086689699                       4    6    6    6      1
importlead633393282086689699                       5  170  170  170      1
MyPropagateByExpression BulkOperation              6    2   72   20      6
cross entity contact and account rule              7   72   72   72      1
single entity complex rule                         7  128  128  128      1
Inactivity Alert – incident                       10    5  195   71     32
Create Record Alert – opportunity                 10    1  163   28     44
Lead Routing Scenario                             10    1  196  126   2014
Inactivity Alert – lead                           10    6  199  101     20
Assignment Alert – appointment                    10   12  183   74    155
Assignment Alert – fax                            10   12  183   76    155
Create Record Alert – lead                        10    0  198  127   2015
Assignment Alert – phonecall                      10   12  194   78    155
Assignment Alert – contact                        10   18  185   76    152
Assignment Alert – task                           10   15  196   77   3708
Assignment Alert – invoice                        10   12  166   74    155
Case Routing Scenario                             10    1  140   16     79
Assignment Alert – lead                           10    0  171   41    159
Inactivity Alert – opportunity                    10    5  236  107     28
Assignment Alert – letter                         10   12  185   77    155
Assignment Alert – opportunity                    10    1  198   78   2764
Assignment Alert – incident                       10    0  171   65    587
MyPropagateByExpression BulkOperation             11    2   81   17     40
Leads                                             12    1   53    7     10
Accounts                                          12    1   28    6     10
Contacts                                          12    1   74   11     10

As you can see, there were a lot of asynchronous operations running during that hour-long run; 12,377 were from the workflows we defined.  Let’s see how much data was added to the AsyncOperationBase table over the course of this run.  I run most of the queries below both before and after the benchmark runs to get a better idea of what is happening to the system.  This often lets me catch bottlenecks and degradations very easily, and once I find the root cause I can work my way back up to make system configuration changes or database changes, or to file bugs that will improve the design of a particular feature that may be causing disruptions.

This query will save off the table size information (i.e. page counts) into a database (500UserRTMStats) that you’ll have to create beforehand (make sure you run this against your organization database).  This will allow you to compare any CRM database size before and after each run.

select cast(object_name(p.object_id) as char(40)) as 'object_name',
       cast(i.name as char(40)) as 'index_name',
       p.in_row_used_page_count, p.lob_used_page_count, p.used_page_count
into [500UserRTMStats]..[OrgTableSizeAfterLoadTestOutlookDM]
from sys.dm_db_partition_stats p
join sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id
order by used_page_count desc

As mentioned before, here I’m interested in how much the AsyncOperationBase table grew in size.

From before the run:

SELECT *
FROM dbo.OrgTableSizeBeforeRuns
WHERE object_name LIKE 'AsyncOperationBase'

object_name         index_name                            in_row_used_page_count  lob_used_page_count  used_page_count
AsyncOperationBase  cndx_PrimaryKey_AsyncOperation                           381                    0              381
AsyncOperationBase  ndx_SystemManaged                                         51                    0               51
AsyncOperationBase  ndx_Cover_AsyncOperation                                  41                    0               41
AsyncOperationBase  ndx_RequestId_AsyncOperation                              38                    0               38
AsyncOperationBase  ndx_RegardingObjectId_AsyncOperation                      36                    0               36
AsyncOperationBase  ndx_StartedOn_AsyncOperation                              33                    0               33

As well as the size of the table after the run:

SELECT *
FROM dbo.OrgTableSizeAfterLoadTestOutlookDM
WHERE object_name LIKE 'AsyncOperationBase'

object_name         index_name                            in_row_used_page_count  lob_used_page_count  used_page_count
AsyncOperationBase  cndx_PrimaryKey_AsyncOperation                         44560                 4384            48944
AsyncOperationBase  ndx_SystemManaged                                        964                    0              964
AsyncOperationBase  ndx_Cover_AsyncOperation                                 505                    0              505
AsyncOperationBase  ndx_StartedOn_AsyncOperation                             382                    0              382
AsyncOperationBase  ndx_RegardingObjectId_AsyncOperation                     380                    0              380
AsyncOperationBase  ndx_RequestId_AsyncOperation                             288                    0              288
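With both snapshots in place, the growth of every index can be computed in one pass instead of eyeballing individual tables; a sketch that joins the two snapshot tables captured above (the name columns are char(40)-padded, so they compare equal):

```sql
select b.object_name,
       b.index_name,
       b.used_page_count as pages_before,
       a.used_page_count as pages_after,
       a.used_page_count - b.used_page_count as page_growth
from [500UserRTMStats]..[OrgTableSizeBeforeRuns] b
join [500UserRTMStats]..[OrgTableSizeAfterLoadTestOutlookDM] a
  on a.object_name = b.object_name
 and a.index_name  = b.index_name
order by page_growth desc
```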

As you can see, the table had a pretty large increase in size.  That makes me wonder how much the indexes associated with that table grew relative to the other indexes, so I’m going to query the index physical stats to see if any of them rise to the top.  Again, I run this query right after the run; running it before would provide some interesting perspective as well.  This query also puts the results in a pre-created database of your choosing, and it gives you the physical stats of all the indexes in your database (again, make sure you run it against your organization database).

select object_name(p.object_id) as 'table_name',
       cast(i.name as char(40)) as 'index_name',
       p.*
into [500UserRTMStats]..[OrgIndexPhysicalAfterLoadTestOutlookDM]
from sys.dm_db_index_physical_stats(DB_ID(), null, null, null, 'LIMITED') p
join sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id
order by p.page_count desc

Then I pick the top 20:

SELECT TOP (20) table_name, index_name, index_id, partition_number,
       index_depth, index_level, avg_fragmentation_in_percent,
       fragment_count, avg_fragment_size_in_pages, page_count
FROM OrgIndexPhysicalAfterLoadTestOutlookDM
ORDER BY page_count DESC

Column names are abbreviated due to lack of space:

a = table_name, b = index_name, c = index_id, d = partition_number,
e = index_depth, f = index_level, g = avg_fragmentation_in_percent,
h = fragment_count, i = avg_fragment_size_in_pages, j = page_count

a                       b                                          c   d  e  f  g           h       i           j
ActivityPointerBase     cndx_ActivityPointer                       1   1  4  0  34.4007...  184518  2.3550...   434546
QuoteBase               cndx_PrimaryKey_Quote                      1   1  4  0  0.7415...   49167   7.6301...   375154
ActivityPartyBase       cndx_PrimaryKey_ActivityParty              1   1  4  0  96.7904...  223873  1.0291...   230405
AnnotationBase          cndx_Annotation                            1   1  4  0  0.04444...  18889   7.9800...   150736
QuoteDetailBase         cndx_PrimaryKey_QuoteDetail                1   1  4  0  0.7280...   19057   7.6110...   145044
ActivityPartyBase       ndx_ActivityParty                          3   1  4  0  1.7380...   11789   9.1848...   108280
QueueItemBase           cndx_PrimaryKey_QueueItem                  1   1  4  0  19.2770...  27109   3.4086...   92405
OpportunityBase         cndx_PrimaryKey_Opportunity                1   1  4  0  21.7693...  28200   3.1697...   89387
ActivityPartyBase       ndx_for_cascaderelationship_ActivityPart   2   1  4  0  0.3507...   7724    9.8177...   75832
ActivityPointerBase     ndx_Sync_ActivityPointer                   5   1  4  0  67.8189...  51847   1.4021...   72698
OpportunityProductBase  cndx_PrimaryKey_OpportunityProduct         1   1  4  0  21.7655...  22556   3.1700...   71503
ActivityPointerBase     ndx_Auditing                               4   1  4  0  1.1862...   8946    7.8305...   70052
ActivityPartyBase       ndx_ActivityId                             6   1  4  0  10.8198...  15310   4.54990...  69659
ActivityPartyBase       ndx_Cover                                  4   1  4  0  2.1572...   6874    8.9621...   61606
ActivityPointerBase     ndx_ActivityPointerSubjectActivityId       13  1  4  0  1.3178...   6995    8.5157...   59568
ActivityPointerBase     ndx_Security                               8   1  4  0  0.8559...   6716    8.2634...   55497
DuplicateRecordBase     cndx_PrimaryKey_DuplicateRecord            1   1  3  0  99.2410...  49544   1           49544
ActivityPartyBase       ndx_Sync_VersionNumber                     5   1  3  0  10.2964...  10211   4.6520...   47502
AsyncOperationBase      cndx_PrimaryKey_AsyncOperation             1   1  3  0  40.4595...  20054   2.2069...   44259
DuplicateRecordBase     ndx_bulkdetect                             3   1  4  0  88.1995...  39392   1.1203...   44134

Some numbers have been truncated due to the width of the table.

As you can see from this, the primary key index for the AsyncOperationBase table made the top-20 list by page count, which is pretty high, and its index depth is 3 levels.  Our data is activity heavy, so quite a few ActivityPointerBase and ActivityPartyBase indexes also have large page counts and are deep as well.
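When fragmentation like the 96.8% on cndx_PrimaryKey_ActivityParty shows up between runs, rebuilding or reorganizing the worst offenders before the next run keeps results comparable.  Here is a hedged sketch using names from the table above; the thresholds are a general rule of thumb (reorganize at roughly 5-30% fragmentation, rebuild above that), not a CRM-specific recommendation:

```sql
-- Heavy fragmentation (>30%): rebuild the index
ALTER INDEX cndx_PrimaryKey_ActivityParty ON dbo.ActivityPartyBase REBUILD;

-- Moderate fragmentation (~5-30%): reorganize, which is always online and cheaper
ALTER INDEX ndx_ActivityId ON dbo.ActivityPartyBase REORGANIZE;
```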

That segues into looking at how much usage these indexes saw over the course of this run.  This query injects the index usage stats for a specified database (8 is my organization database’s id) into a pre-created database (like the scripts above).

select object_name(p.object_id) as 'table_name',
       cast(i.name as char(40)) as 'index_name',
       p.*
into [500UserRTMStats]..[OrgIndexUsageDBAfterLoadTestOutlookDM]
from sys.dm_db_index_usage_stats p
join sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id
where p.database_id = 8

Then I’ll look at the result set ordered by user_scans (in the table below I’m reducing the result set to relevant information only and returning just the top 20, due to display/readability issues):

SELECT table_name, index_name, database_id, object_id, index_id,
       user_seeks, user_scans, user_lookups, user_updates,
       last_user_seek, last_user_scan, last_user_lookup, last_user_update,
       system_seeks, system_scans, system_lookups, system_updates, last_system_scan
FROM OrgIndexUsageDBAfterLoadTestOutlookDM
ORDER BY user_scans DESC

Column names are abbreviated due to lack of space (some fields don’t show due to the width of the table):

a = table_name, b = index_name, c = object_id, d = index_id,
e = user_seeks, f = user_scans, g = user_lookups, h = user_updates,
i = system_seeks, j = system_scans

a                             b                                         c           d  e       f      g       h       i  j
WorkflowWaitSubscriptionBase  ndx_CascadeRelationship_AsyncOperation_W  892582268   2  6465    48782  0       11426   0  14
WorkflowDependencyBase        cndx_PrimaryKey_WorkflowDependency        1728061242  1  0       48304  0       0       0  0
Entity                        XPKEntity                                 359008360   1  22277   34156  0       0       0  1
PrivilegeBase                 cndx_PrimaryKey_Privilege                 770101784   1  799     30044  0       0       0  1
PrivilegeObjectTypeCodes      ndx_Unique_PrivilegeObjectTypeCodePrivil  546817010   5  799     29987  0       0       0  0
SubscriptionClients           ndx_Cover                                 1174295243  3  0       22701  0       380     0  1
OrganizationBase              cndx_PrimaryKey_Organization              210099789   1  139178  21447  0       242     0  1
ActivityMimeAttachment        cndx_PrimaryKey_ActivityMimeAttachment    165575628   1  0       19909  0       0       0  0
ReportVisibilityBase          cndx_PrimaryKey_ReportVisibility          1344059874  1  0       17582  0       0       0  1
ReportEntityBase              cndx_PrimaryKey_ReportEntity              1788585460  1  0       17569  0       0       0  1
SystemUserBase                cndx_PrimaryKey_SystemUser                1954106002  1  590965  13340  10511   0       0  2
WorkflowWaitSubscriptionBase  cndx_PrimaryKey_WorkflowWaitSubscription  892582268   1  4943    10087  102883  115007  0  18
WorkflowBase                  ndx_SystemManaged                         1004582667  6  0       7516   0       0       0  0
EquipmentBase                 cndx_PrimaryKey_Equipment                 1270295585  1  2620    5751   1144    0       0  2
SdkMessageProcessingStepBase  cndx_PrimaryKey_SdkMessageProcessingStep  1580584719  1  4929    5750   1613    0       0  0
DuplicateRecordBase           ndx_bulkdetect                            2092586543  3  1181    3584   0       3176    0  204
SdkMessageFilterBase          cndx_PrimaryKey_SdkMessageFilter          1404584092  1  3421    3461   1       0       0  0
SalesOrderDetailBase          cndx_PrimaryKey_SalesOrderDetail          1602104748  1  15      3427   15      0       0  2
QuoteDetailBase               cndx_PrimaryKey_QuoteDetail               1186103266  1  1139    3426   589     903     0  3
InvoiceDetailBase             cndx_PrimaryKey_InvoiceDetail             1493580359  1  15      3426   630     0       0  1
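The same usage snapshot can answer the opposite question as well: which indexes were maintained during the run but never read.  This is a sketch against the snapshot table created above, not something the toolkit runs for you:

```sql
-- Indexes that were only ever written during the run: pure update overhead
SELECT table_name, index_name, user_updates
FROM OrgIndexUsageDBAfterLoadTestOutlookDM
WHERE user_seeks = 0 AND user_scans = 0 AND user_lookups = 0
  AND user_updates > 0
ORDER BY user_updates DESC
```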

As you can see, the majority of scans in this database come from the workflow tables.  Let’s see if we can track down some offending queries from this run that may have caused those scans.  We do this by running a query-stats query.  The toolkit can do this automatically after the run finishes if you set the querystats node in ConfigSettings.xml to true; if you go this route you’ll notice a pause after the run finishes while it creates the table for you.  The table is created in the master database and named QueryStats_[your load test name]_[the time the load test was started].  If you don’t want the toolkit to run the query stats for you, you can run the script below to achieve the same objective; make sure to run it against the organization database, pointing it at a pre-created database as mentioned above.

select substring(qt.text, (qs.statement_start_offset/2) + 1,
         ((case qs.statement_end_offset
             when -1 then datalength(qt.text)
             else qs.statement_end_offset
           end - qs.statement_start_offset)/2) + 1) as statement_text
     , qt.text as batch_text
     , qt.dbid as 'query_text_dbid'
     , qpa.Attribute
     , qpa.value as 'plan_attribute_dbid'
     , qs.plan_generation_num
     , qs.total_worker_time / qs.execution_count as 'avg_worker_time'
     , qs.total_physical_reads / qs.execution_count as 'avg_physical_reads'
     , qs.total_logical_reads / qs.execution_count as 'avg_logical_reads'
     , qs.total_logical_writes / qs.execution_count as 'avg_logical_writes'
     , qs.total_elapsed_time / qs.execution_count as 'avg_elapsed_time'
     , qs.creation_time
     , qs.last_execution_time
     , qs.execution_count
     , qs.total_worker_time
     , qs.last_worker_time
     , qs.min_worker_time
     , qs.max_worker_time
     , qs.total_physical_reads
     , qs.last_physical_reads
     , qs.min_physical_reads
     , qs.max_physical_reads
     , qs.total_logical_reads
     , qs.last_logical_reads
     , qs.min_logical_reads
     , qs.max_logical_reads
     , qs.total_logical_writes
     , qs.last_logical_writes
     , qs.min_logical_writes
     , qs.max_logical_writes
     , qs.total_elapsed_time
     , qs.last_elapsed_time
     , qs.min_elapsed_time
     , qs.max_elapsed_time
     , qp.query_plan
into [500UserRTMStats]..[PubQueryStatsAfterLoadTestOutlookDM]
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) qt
cross apply sys.dm_exec_query_plan(qs.plan_handle) qp
cross apply sys.dm_exec_plan_attributes(qs.plan_handle) qpa
where qpa.Attribute = 'dbid'

This table will have a ton of really good data (I won’t be able to show it here other than a screenshot as there is just too much data to view).  I’ll get this by running the following script against the table created above (depending on your method of generation). 

SELECT *
FROM dbo.QueryStats_LoadTestOutlookDM_2_23_2008_1_10_07_AM
ORDER BY total_physical_reads DESC

clip_image003

Below are the top offending queries for physical reads.

select commitment0.ActivityId as 'activityid', commitment0.ActivityTypeCode as 'activitytypecode', commitment0.ServiceId as 'serviceid', commitment0.Effort as 'effort', commitment0.StatusCode as 'statuscode', commitment0.StateCode as 'statecode', commitment0.ScheduledStart as 'scheduledstart', commitment0.ScheduledEnd as 'scheduledend', commitment0.ResourceSpecId as 'resourcespecid', commitment0.PartyId as 'partyid', commitment0.PartyObjectTypeCode as 'partyobjecttypecode', commitment0.CommitmentId as 'commitmentid' from Commitment as commitment0 where ((commitment0.PartyId = @PartyId0 and commitment0.ScheduledStart < @ScheduledStart0 and commitment0.ScheduledEnd > @ScheduledEnd0))

select top 51 quote0.Name as 'name', quote0.StateCode as 'statecode', quote0.TotalAmount as 'totalamount', quote0.CustomerId as 'customerid', quote0.QuoteId as 'quoteid', quote0.TransactionCurrencyId as 'transactioncurrencyid', quote0.CustomerIdName as 'customeridname', quote0.CustomerIdType as 'customeridtype', quote0.CustomerIdDsc as 'customeriddsc', quote0.TransactionCurrencyIdName as 'transactioncurrencyidname', quote0.TransactionCurrencyIdDsc as 'transactioncurrencyiddsc', quotecustomeridcontactcontactid.EMailAddress1 as 'quotecustomeridcontactcontactid.emailaddress1' from Quote as quote0 left outer join Contact as quotecustomeridcontactcontactid on (quote0.CustomerId  =  quotecustomeridcontactcontactid.ContactId and ((quotecustomeridcontactcontactid.DeletionStateCode in (@DeletionStateCode1)))) where (quote0.OwnerId = @OwnerId0 and (quote0.DeletionStateCode in (@DeletionStateCode0)) and (quote0.StateCode = @StateCode0 or quote0.StateCode = @StateCode1)) order by quote0.Name asc, quote0.QuoteId asc

INSERT INTO DuplicateRecordBase(DuplicateId, BaseRecordId, DuplicateRecordId, DuplicateRuleId, AsyncOperationId, CreatedOn, DeletionStateCode, BaseRecordIdTypeCode, DuplicateRecordIdTypeCode) SELECT TOP 1000 NEWID(), A.ObjectId, B.ObjectId, '0c56882e-35a2-4898-b094-8575f1368160', 'a45ce699-f2e1-dc11-a906-0016357d705f', GETDATE(), 0 , 2, 2 FROM dbo.MatchCode43c5fd99976f45868dc38b6cd803fb85 As A, dbo.MatchCode43c5fd99976f45868dc38b6cd803fb85 As B WHERE A.MatchCode = B.MatchCode AND A.ObjectId <> B.ObjectId AND A.ObjectId IN (SELECT RecordId FROM BulkDetect_a45ce699f2e1dc11a9060016357d705f WHERE SN BETWEEN 2001 AND 3000) AND A.MatchCode IS NOT NULL AND NOT EXISTS (SELECT 1 FROM DuplicateRecordBase WHERE BaseRecordId = A.ObjectId AND DuplicateRecordId = B.ObjectId AND DuplicateRuleId = '0c56882e-35a2-4898-b094-8575f1368160' AND AsyncOperationId = 'a45ce699-f2e1-dc11-a906-0016357d705f' AND DeletionStateCode = 0 AND BaseRecordIdTypeCode = 2 AND DuplicateRecordIdTypeCode = 2)

From here I would take the offending queries and look at the execution plan, which is gathered at the time of the query.  If there are any scans in the execution plan, I can play the query back and improve its performance by adding an index.
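For example, if the plan for the Commitment query above showed a scan on the schedule filter, the first experiment I’d play back is a covering index on those columns.  The index name and column choice below are purely illustrative (and assume Commitment resolves to an indexable base table), not a shipped CRM index:

```sql
-- Hypothetical index to support the PartyId + schedule-range filter
CREATE NONCLUSTERED INDEX ndx_Commitment_PartyId_Schedule
    ON dbo.Commitment (PartyId, ScheduledStart, ScheduledEnd)
    INCLUDE (ActivityId, ActivityTypeCode, ServiceId, Effort);
```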

Summation

The Microsoft Dynamics CRM 4.0 Performance Toolkit is an excellent way to prepare a system for production and see whether it will perform up to your needs.  Always start by defining what you need from your system (i.e. how much data will you have? which entities will you be using most? which features will you be using most?); this is the only way to find out if performance will meet your expectations.  Once you have that, the toolkit can easily be set up to create the data and run the benchmark tests that will tell you how the system performs, as well as how to investigate any bottlenecks in the system.  I hope this blog gives you the starting pieces you need to investigate your system as you take advantage of all the great features in Microsoft Dynamics CRM 4.0.  I’ll be adding more to this blog on a monthly to bi-monthly basis as my investigations take me into different areas or as questions arise around these posts.

The toolkit is published as a project at http://www.codeplex.com/crmperftoolkit in order to develop a community that can actively use the toolkit as well as contribute enhancements down the road.  If you are interested in contributing to the project, send a request to ssiadmin@microsoft.com asking to become a contributor to the crmperftoolkit project.  For help, issues, or any questions or comments about this blog, please see the “Discussions” and “Issue Tracker” sections on the project web page.

Russ Dobbins

Comments (4)

  1. A CRM Riff says:

    Having made the leap to locally available Verizon FiOS with speeds to the family computer of 15 ghz,

  2. P.Sivakumar says:

    The platform preparation guide recommends keeping SQL and the CRM application server in LAN proximity. One of our CRM partners is suggesting that CRM 4.0 supports application servers across a WAN (i.e. a single SQL server + multiple CRM application servers over a WAN)….

    I am confused. Any comments and/or suggestions?

    Thanks

  3. Igor says:

    I would say the main problem is poor SQL code. For example, from the last INSERT in this investigation:

    why this code should be executed for each row:

     AND A.ObjectId IN (SELECT RecordId

                        FROM BulkDetect_a45ce699f2e1dc11a9060016357d705f

                        WHERE SN BETWEEN 2001 AND 3000)

    It can be easily replaced by a simple JOIN.

    And creating indexes in this case will not help; it is the wrong way to fix poor code with an index or a hardware upgrade.
