Some SharePoint Performance Recommendations

 

Introduction

First of all: no silver bullets here.

Really.

I mean, each application, and inside each application, each single feature, web part, service call, whatever, with its data, computation, input and output profiles, offers unique challenges regarding performance. Even if we fix some of these moving factors (e.g., by assuming a common platform, like SharePoint), there are still too many variables to allow a single-shot approach to performance.

But – there are some principles.

Don’t assume anything: many, many times I wanted to take the sweet shortcut of assumptions and jump into the code to fix that buggy, culprit-of-all-sins call. What a mistake. Countless times, those assumptions were wrong, or just a symptom of another root cause that was about to be fixed in the wrong place. Nor should you assume that a piece of code will be a bottleneck and start optimizing it before it proves to be one. Yes, you’ve heard about it: no premature optimization. Don’t do it. That does NOT mean you shouldn’t be considering performance early in your project or feature. Yes, you should, but at the right level, architectural or design, not at the code level. Prototype, and check that the external components you depend on meet your performance goals.

So, if you don’t assume, what’s left? Hard facts; cold, no-feelings data. Where do you get that from? It depends on your application. Mostly, performance counters, application profiling and, of course, user complaints. Which brings us to the next principle.

Put everything in context. Is that an expensive call? It depends. If it’s run once a month by a transaction that has 24 hours to finish, of course it’s not. If it’s in the middle of a web request expected to yield a result page in 2 seconds? Of course it is. You see? And what if that same call is performed as part of processing that same page, but the user actually expects to perceive that “some important work” is being done behind the scenes, as is the case with online booking sites? In that case, that expensive call may not be so much of a problem.

What gives you context, then? Goals. You have to have performance goals from the very beginning. They can translate into more concrete goals as you go, but you have to identify potential bottlenecks early in the project. And something becomes a bottleneck only if it prevents a scenario from meeting its performance goals.

So you’ve got goals and some principles. What do you need now? A process.

Delivering performance is about keeping this ball rolling:

while (performance does not meet requirements)
{
    Identify deltas vs. goals
    Find root cause
    Fix it
}

That’s it. Needless to say, each step in the process has its own details, but pretty much, this is it.

So, enough generalities. Time to go for specifics. SharePoint specifics.

 

SharePoint Specifics

Let’s start with some pieces of advice. They derive from more general concepts, but translate into specific actions in the SharePoint platform.

Avoid excessive database round-trips

SharePoint stores all user content in content databases, which means that retrieving user content through the object model causes database round-trips (that is, from the web front end to the database server).

The number of round-trips being performed should be minimized. Easier said than done: how this advice plays out in each application depends, more than anything else, on its data model.

In general, the more normalized the data model, the more database round-trips will likely be performed. That is, if building the response to a request requires collecting data from N different lists and/or databases (for example, list fields or user profile properties referencing values in taxonomies), you’ll incur at least 2*N round-trips (one to retrieve each list, one to query it). Add the retrieval of lazy-loaded item properties in a loop, and you get an idea of the impact of minimizing the number of round-trips.
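
To make that concrete, here’s a minimal sketch of a lookup resolved with one round-trip per item versus resolved with a single extra query. The list and field names ("Orders", "Customers", "Customer") are made up; the shape, not the names, is the point.

using System.Collections.Generic;
using Microsoft.SharePoint;

class RoundTripExamples
{
    // Costly: one extra database round-trip per order to resolve its lookup.
    public static void ResolvePerItem(SPWeb web)
    {
        SPList customers = web.Lists["Customers"];   // hypothetical lists
        foreach (SPListItem order in web.Lists["Orders"].Items)
        {
            SPFieldLookupValue lookup =
                new SPFieldLookupValue(order["Customer"] as string);
            SPListItem customer =
                customers.GetItemById(lookup.LookupId);  // round-trip per item
            // ... use customer ...
        }
    }

    // Cheaper: one query per list, then join in memory.
    public static void ResolveBatched(SPWeb web)
    {
        Dictionary<int, SPListItem> byId = new Dictionary<int, SPListItem>();
        foreach (SPListItem c in web.Lists["Customers"].GetItems(new SPQuery()))
        {
            byId[c.ID] = c;
        }
        foreach (SPListItem order in web.Lists["Orders"].GetItems(new SPQuery()))
        {
            SPFieldLookupValue lookup =
                new SPFieldLookupValue(order["Customer"] as string);
            SPListItem customer = byId[lookup.LookupId]; // no extra trip
            // ... use customer ...
        }
    }
}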

Another point related to the data model concerns which data is brought over when a given item is retrieved. In general, data included in the item’s content type is retrieved with the item; data in the item not included in its content type is not.

Given that there’s a limit to the degree of de-normalization you can achieve, what to do? Caching.

Consider Caching

For our purposes, let’s consider two ways of caching.

First one: caching an object to be used during the lifetime of a single request among different components. In this scenario, you can cache SharePoint SP* objects.
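
For instance, a hedged sketch using HttpContext.Current.Items, which lives exactly as long as the request (the cache key and list name are made up):

using System.Web;
using Microsoft.SharePoint;

class RequestCache
{
    // SP* objects can be shared safely among components of the same request.
    public static SPList GetOrdersList(SPWeb web)
    {
        const string key = "OrdersList";              // made-up cache key
        if (HttpContext.Current.Items[key] == null)
        {
            HttpContext.Current.Items[key] = web.Lists["Orders"];
        }
        return (SPList)HttpContext.Current.Items[key];
    }
}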

Second one: caching an object to be used across requests. Simply put, SharePoint SP* objects are not cacheable across requests. The reason is that each object points to the COM request object that brought its data, and bad things happen if you attempt to re-use it across different requests. You can use the SharePoint Publishing caching infrastructure instead. Its downside is that the first request to bring the object into the cache is relatively expensive. Depending on how likely it is that the same item will be requested while still in the cache, it may be worth paying that cost. It also ties you to the Publishing APIs for caching.
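
If you go the Publishing route, one common entry point is PortalSiteMapProvider.GetCachedListItemsByQuery. A rough sketch, assuming the Publishing feature is enabled on the site; the list name and CAML are placeholders:

using System.Web;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Publishing.Navigation;

class PublishingCacheExample
{
    public static SiteMapNodeCollection GetCachedOrders(SPWeb web)
    {
        PortalSiteMapProvider provider =
            PortalSiteMapProvider.CurrentNavSiteMapProviderNoEncode;
        PortalWebSiteMapNode webNode =
            provider.FindSiteMapNode(web.ServerRelativeUrl) as PortalWebSiteMapNode;

        SPQuery query = new SPQuery();
        query.Query = "<Where>...</Where>";  // your CAML filter here

        // Served from the object cache when possible; the first call pays the cost.
        return provider.GetCachedListItemsByQuery(webNode, "Orders", query, web);
    }
}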

One sensible suggestion is to just implement a custom cache structure. Use SharePoint objects as storage, but cache your entities independently from them.
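
A rough sketch of that idea, assuming a hypothetical OrderInfo entity and a made-up key scheme; the expiration policy is a guess you’d tune to your data turnover:

using System;
using System.Web;
using System.Web.Caching;
using Microsoft.SharePoint;

// Plain entity: values copied out of the SPListItem, with no tie to SP* internals.
class OrderInfo
{
    public int Id;
    public string Title;
}

class OrderCache
{
    public static OrderInfo Get(SPListItem item)
    {
        string key = "Order-" + item.ID;              // made-up key scheme
        OrderInfo cached = HttpRuntime.Cache[key] as OrderInfo;
        if (cached == null)
        {
            cached = new OrderInfo { Id = item.ID, Title = item.Title };
            // Absolute expiration of 5 minutes is an arbitrary choice.
            HttpRuntime.Cache.Insert(key, cached, null,
                DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
        }
        return cached;
    }
}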

Minimize cost of each round-trip

Along with minimizing the number of round-trips, it should be a goal to minimize the cost of each of them.

Some actions to consider:

  • limit the number of rows to be returned; for example, set SPQuery.RowLimit.
  • limit the number of fields to be returned; for example, set SPQuery.ViewFields (both shown in the sketch below).
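
A minimal sketch with both settings applied; the list name, field names and row cap are made up:

using Microsoft.SharePoint;

class QueryShapeExample
{
    public static SPListItemCollection GetOpenOrders(SPWeb web)
    {
        SPQuery query = new SPQuery();
        query.RowLimit = 100;                  // cap the rows coming back
        query.ViewFields =                     // ask only for the fields you use
            "<FieldRef Name='Title'/><FieldRef Name='Modified'/>";
        query.ViewFieldsOnly = true;           // SharePoint 2010 and later
        query.Query =
            "<Where><Eq><FieldRef Name='Status'/>" +
            "<Value Type='Text'>Open</Value></Eq></Where>";
        return web.Lists["Orders"].GetItems(query);
    }
}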

Never increase the default query thresholds in SharePoint. They are there for a reason. A good reason. If a query is returning more than, say, 5000 items, that may be a sign of poor filtering in the SPQuery. And if you really do need to retrieve more than 5000 items, for example from a job definition performing some bulk action, page through the results with SPListItemCollectionPosition, as in the sketch below.
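
A sketch of that paging pattern; the page size of 2000 is an arbitrary value under the threshold:

using Microsoft.SharePoint;

class PagedBulkAction
{
    public static void ProcessAll(SPList list)
    {
        SPQuery query = new SPQuery();
        query.RowLimit = 2000;                 // page size, under the threshold
        do
        {
            SPListItemCollection page = list.GetItems(query);
            foreach (SPListItem item in page)
            {
                // ... bulk action on the item ...
            }
            // Carry the cursor forward; null means no more pages.
            query.ListItemCollectionPosition = page.ListItemCollectionPosition;
        }
        while (query.ListItemCollectionPosition != null);
    }
}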

One last point: simply avoid complex queries. Multiple joins, beyond the outer joins that result from SharePoint’s CAML-to-SQL translation, hurt performance. If you really need to perform those queries, for example in a data-warehousing scenario, you may do better by designing an ETL (Extract-Transform-Load) layer that extracts data from SharePoint into a SQL database, and querying SQL directly.

List indexing

Another point related to the cost of each round-trip is SharePoint list indexing. In order for SharePoint to translate an SPQuery into a well-performing SQL query, proper indexes should be added to lists. As with plain SQL, indexes depend on the access patterns of the application. That is, indexes should include the fields on which most predicating (i.e., filtering) happens.
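
For example, a sketch that indexes a hypothetical "Status" field used in most filters:

using Microsoft.SharePoint;

class IndexStatusField
{
    public static void EnsureIndex(SPList list)
    {
        SPField status = list.Fields["Status"];  // made-up field name
        if (!status.Indexed)
        {
            status.Indexed = true;   // creates the index behind the scenes
            status.Update();
        }
    }
}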

Proper database maintenance

In the end, SharePoint performance depends on how well SQL Server performs. Over time, that depends on proper maintenance.

There are tons of references on SharePoint SQL maintenance, covering all its specific databases (config, content, search). Strongly consider:

  • re-indexing
  • defragmentation
  • running statistics update with full-scan
  • specifying max server memory: SQL could be eating up memory, leaving the OS with no memory
  • turning off auto_create_statistics and auto_update_statistics
  • disabling auto-grow and auto-shrink
  • avoiding tempdb contention: create more than one data file in tempdb

Truth be told, the default SharePoint database maintenance jobs are not enough for scenarios with a high rate of data turnover (reads/writes).

 

References

Here are some interesting references to look into:

Best Practices: Common Coding Issues When Using the SharePoint Object Model

Best Practices: Using Disposable Windows SharePoint Services Objects

Code Review: .NET Application Performance

Database maintenance for Office SharePoint Server 2007 (white paper)

Designing for Performance

Designing large lists and maximizing list performance (SharePoint Server 2010)

Find Application Bottlenecks with Visual Studio Profiler

Improving SharePoint Performance

SharePoint Dispose Checker Tool

White paper: Working with large lists in Office SharePoint Server 2007

 

Appendix 1: Quick profiling tutorial

If you want to profile your code pretty quickly, just do the following:

  1. install Visual Studio 2010 Performance Tools with Service Pack 1
  2. once on the machine where you are going to profile w3wp, run vsperfclrenv /globalsampleon
  3. run your scenario so that the w3wp process comes up
  4. attach profiler to w3wp
    1. SET PERFTOOLSDIR="c:\Program Files (x86)\Microsoft Visual Studio 10.0\Team Tools\Performance Tools\x64"
    2. %PERFTOOLSDIR%\vsperfcmd /crosssession /start:sample /attach:<w3wp’s PID> /output:c:\profile.vsp
  5. wait until you consider that enough data has been collected
  6. detach profiler from w3wp, allowing it to dump profile results into c:\profile.vsp
    1. %PERFTOOLSDIR%\vsperfcmd /detach
  7. shut down the performance monitor
    1. %PERFTOOLSDIR%\vsperfcmd /shutdown

You should end up with a .vsp profile that you can open in Visual Studio for analysis.

 

Appendix 2: Performance script

You can automate steps 4 to 7 above using the following PowerShell script.

#
# VSProfile.ps1
# Sample calls:
# .\VSProfile.ps1 -AppPoolName "SharePoint - 80" -ProfileDurationSeconds 120
# .\VSProfile.ps1 -AppPoolName "SharePoint - 80" -ProfileDurationSeconds 120 -ProfileOutputPath C:\Profile.vsp
#
param($AppPoolName, $ProfileOutputPath, $ProfileDurationSeconds)

# set perf tools dir
$env:perftoolsdir = "${env:ProgramFiles(x86)}\Microsoft Visual Studio 10.0\Team Tools\Performance Tools\x64"

# enable sampling for processes launched from this shell
# (machine-wide sampling is assumed to be already enabled per step 2 above)
& "$env:perftoolsdir\vsperfclrenv" /sampleon

# get PID of the w3wp process to profile
& "$env:windir\System32\inetsrv\appcmd.exe" list wp | %{ if($_ -match "WP `"(?<pid>\d+).*(applicationPool:$AppPoolName)") { $w3wpid = [System.Int32]::Parse($matches["pid"]) } }
write-host "Will profile process ID $w3wpid"

# format output path
if([String]::IsNullOrEmpty($ProfileOutputPath))
{
    # HH (24-hour clock) avoids AM/PM filename collisions
    $ProfileOutputPath = $env:SystemDrive + '\' + [DateTime]::Now.ToString("yyyyMMddHHmmss") + ".vsp"
}

# attach to that process
& "$env:perftoolsdir\vsperfcmd" /crosssession /start:sample /attach:$w3wpid /output:$ProfileOutputPath

# wait
[System.Threading.Thread]::Sleep([System.Int32]::Parse($ProfileDurationSeconds)*1000)

# detach
& "$env:perftoolsdir\vsperfcmd" /detach:$w3wpid
& "$env:perftoolsdir\vsperfcmd" /shutdown:5

# wait for vsp file to get totally flushed
[System.Threading.Thread]::Sleep(20000)