One of the SharePoint product manager’s (I think it was Joel Oleson) mentioned at a conference: “If you create a bad database design, don’t blame SQL Server when it performs poorly. The same applies to SharePoint.”
Coming from a developer background, the importance of getting of database design right was drilled into me, hence why this quote clicked straight away for me. SharePoint is no different; if you don’t understand the containment hierarchy of SharePoint your environment may not scale or perform as well as you expect it to.
Defining the information architecture for your SharePoint environment and ultimately for your business is a fine art. Some would say a black art! J It marries your business information schema, designed typically by your information architect or intranet portal specialist, to SharePoint’s containment hierarchy and the site structure that under pins your portal infrastructure.
The sections below provide you an overview of important containment elements that make up your information architecture in SharePoint.
SharePoint Containment Hierarchy
The containment hierarchy within SharePoint will govern how you store information, your portal’s performance, extensibility and the general management and control you have over the growth of the usage of the SharePoint infrastructure.
The farm is the collection of servers, services, databases, and other hardware infrastructure that make up the SharePoint implementation.
Servers of your farm provide the physical infrastructure for your SharePoint web applications. Various servers perform various roles within your environment such as:
- Web Front End: These servers host traffic respond to request to pages from users of the portal. Typically 1 to 8 servers are used at this layer of the topology depending the load on the environment.
- Application Servers: Index Server, Query Server, Excel Calculation Services, Forms Services add specific services to the SharePoint environment.
- SQL Server: Typically SQL Server 2005 is used as the backend database for SharePoint. Almost everything in SharePoint is stored at the database level.
The next layer down is the Application Pool level. Application pools are hosted and managed by Internet Information Services on Windows Server.
IIS application pools are typically implemented to achieve process isolation between content. Application Pools provide a way for multiple sites to run on the same server computer but still have their own worker processes and identity. This mitigates an exploit on one site that provides an opportunity for an attacker to inject code onto the server to attack other sites.
A web application is an IIS Web site that is created and used by SharePoint Products and Technologies. Each Web application is represented by a different Web site in IIS. You assign each Web application a unique domain name, which helps to prevent cross site scripting attacks.
· Use sparingly (few is better) as they consume a lot of memory.
· Required Web Applications:
o Central Admin Web Application (This one can be disabled when not in use.)
o SSP Admin Web Application
o 1-N Content Web Applications
· The most common reasons for using more web applications are :
o Namespace (assuming you are using isolated app pool)
§ E.g. Division of data into separate memory footprint for department that needs to be ultra secure.
o Customization levels (assuming you are using isolated app pool)
§ E.g. 2 SSPs are being used to isolate search content or BDC data.
o Separate anonymous content from authenticated content.
o Isolate users
o Enforce permissions: Enable you to use to policies for web applications feature.
o Optimize performance: Grouping applications that have similar data characteristics result in better performance.
o Optimize manageability: Manage different site limits, expiration and recycle bin at the web application level. You can negotiate different SLA for each Web Application. (E.g. My Sites = low priority versus Critical Content Web App.)
· By default every web application is installed on every Web Front End web server
· Web applications can share Application Pools
· Certain features, such as site self creation, and presence, can only be configured at the web application level
· Avoid more than 99 Web Applications per farm. This includes the number of Web applications on child farms consuming resources on this SSP.
Why have we included databases in your information architecture? The reason is that each Site Collections in SharePoint can only exist in one content database and cannot be split over multiple content databases. This has a direct impact on the performance of your information architecture as your portal grows in size and usage. If you place all sites in one site collection, or all site collections in one content database, your content database that hosts the site collections will take more and more strain as your portal environment grows over time.
The core databases of SharePoint are the following:
· Configuration database
· Administration database
· Shared Service Provider database(s)
· Search database(s)
· 1 – many content databases
Therefore the ability to load balance your site collections over multiple content databases, and if necessary, over multiple SQL instances, is especially useful in scenarios where you expect increased growth in the size of the total content stored and the number of users hitting your portal.
This doesn’t mean that every site should be a site collection, but rather that, for large solutions, you don’t store everything in one site collection.
Content Database Sizing
The numbers discussed in the points below are based on information from TechNet – Plan for Software boundaries –http://technet2.microsoft.com/Office/en-us/library/6a13cd9f-4b44-40d6-85aa-c70a8e5c34fe1033.mspx?mfr=true and previous experiences.
· Establish target sizes for content databases with appropriate size warning thresholds.
· Associate site collections to specific content databases. This approach enables you to place one or more site collections in a dedicated database that can be managed independently from the rest.
· Avoid more than 50 000 Site Collections per content database. (According to Technet article.) I would
· Maximum Content Databases per Web application: 100
· Number Site Collections per Content Database – Recommendation à 250 to Max (5000): Although the capacity guidelines document says 50 000, it is recommended to limit the number of sites in a database so that the content database doesn’t grow to an unmanageable size. OOB settings = 9000 site warning notification and 15000 site threshold. (don’t forget to update this setting).
· The size of the each Site Collection can be controlled via policy at the web application level
· To estimate how many site collections per content database, use the following formula:
· Estimated database size / avg. Site Collection Size = No of Site Collections. E.g.
1. Database size (50 GIG) / Avg. Site Collection Size (500mb) = 100 Site Collections
2. Database size (250 GIG) / Avg. Site Collection Size (500mb) = 500 Site Collections
· Content Database Size per Database Instance– Recommendation à 50-100 GIG: It’s all about your SLA. If something goes down, how long can you be offline for? Backup and Restore take time in two ways:
· Backup: The larger the database the longer the backup.
· Restore: The larger the database the longer the restore process.
· Databases per SQL Instance
· On 32 bit systems main bottlenecks are memory (failed uploads and ultimately failed page loads would occur if the available MTL space became lower than page size. Limit the number of content databases hosted on a SQL instance based on average and peak utilization of each database.
· On 64 bit systems no limitations due to increased amount of RAM and processor horsepower available.
· Total Databases per SQL instance : Main Considerations at the database layer are the following
· SQL Connections available (Memory)
· Network Bandwidth available
· Contentions with your backup solution
· Disk I/O depending on type of disk in use. E.g. larger disks or SATA disks could be problematic.
A site collection, which is hosted in a web application, is a set of Web sites that has the same owner and shares administration settings. Each site collection contains a top-level Web site and can contain one or more subsites. A site collection usually has a shared navigation structure.
Site collections bridge logical architecture and information architecture. The design goals for site collections in the model are to satisfy requirements for URL design and to create logical divisions of content.
Site collection sites provide features to manage itself and the subsites contained beneath it.
• Content types do not span site collections
• Quota’s only apply at the site collection level
• Roll up ownership and quota’s to the site collection level
• While a site collection can support up to 250,000 subwebs (sites), it only support 2000 subwebs (sites) per tier (parent site). This is due to the fact that the interface for enumerating subsites of a given Web site does not perform well as the number of subsites surpasses 2,000.
• Best Bets are configured at the site collection level
• OOB Webparts typically rollup to the site collection level and do not span across site collections. E.g. Recent Documents Webpart.
• Master pages and site look and feel apply only within the site collection.
• Use Managed Paths to specify that only one or more than one site collection can exist for a specified path. (Managed Paths allow you to do perform two important tasks, namely, indicate which pieces of the URL namespace are controlled by Microsoft Windows SharePoint Services and specify paths to use for Self-Service Site Creation. e.g. /sites/)
• Avoid more than 50 000 Site Collections per Web Application.
• For more information on planning for software boundaries of SharePoint, see TechNet – Plan for Software boundaries- http://technet2.microsoft.com/Office/en-us/library/6a13cd9f-4b44-40d6-85aa-c70a8e5c34fe1033.mspx?mfr=true
The sites in a site collection are usually interrelated by purpose. To maximize your solution’s usability, store all related data and content within a single site collection. Benefits of doing this include:
- Content types and columns managed in a site collection can be shared across all sites in the site collection. Conversely, there is no automatic mechanism for propagating content types and column definitions across multiple site collections.
- Information management policies managed in the site collection can be made available to content in all sites in the site collection.
- Office SharePoint Server 2007 automatically updates links to renamed or moved files within a site collection to reflect their new names or locations. Conversely, links to documents in other site collections are not updated.
- If the site collection is on a server running Windows SharePoint Services 3.0, searching can only be done over the content in that site collection. If the site collection is on a server running Office SharePoint Server 2007, content can be searched across multiple site collections. As Reuters are planning on using Office SharePoint Server 2007, they will be able to use all of the search functionality SharePoint has to offer.
- Some views in Office SharePoint Server 2007 list documents from multiple sites within a single site collection (for example, a view enumerating all tasks assigned to a user across a site collection). Also, developers can create cross-site database queries within a site collection, but cross-site queries are not supported across multiple site collections.
- Content quotas and other quotas can only be managed at the site-collection level.
Keep the following limits in mind when planning how to allocate your content across one or more site collections:
- Creating too many subsites of any site in a site collection might affect performance and usability. Limit the number of subsites of any site to 2,000 at most. See TechNet – Plan for Software Boundaries – http://technet2.microsoft.com/Office/en-us/library/6a13cd9f-4b44-40d6-85aa-c70a8e5c34fe1033.mspx?mfr=true
- All sites in a site collection share the same back-end resources. In particular, all content in a site collection must be stored in the same content database. Because of this, the performance of database operations — such as backing up and restoring content — will depend on the amount of content across the entire site collection, the size of the database, the speed of the servers hosting the database, and other factors. Depending on the amount of content and the configuration of the database, you might need to segment a site collection into multiple site collections to meet service-level agreements for backing up and restoring, throughput, or other requirements. It is beyond the scope of this article to provide prescriptive guidance about managing the size and performance of databases. For more information about capacity planning, see “Plan for performance and capacity (Office SharePoint Server)” http://technet2.microsoft.com/Office/en-us/library/8dd52916-f77d-4444-b593-1f7d6f330e5f1033.mspx
- Particularly, keep extremely active sites in separate site collections. For example, a knowledge base site on the Internet that allows anonymous browsing could generate a lot of database activity. If other sites use the same database, their performance could be impacted. By putting the knowledge base site in a separate site collection with its own database, you can free up resources for other sites that no longer have to compete with it for database resources.
Various types of sites can be provisioned in SharePoint. Sites span from the highly structured, planned top level site down to the unstructured ad hoc collaboration team sites. Sites can contain sites underneath them, for example an HR site could contain 3 team sites. Sub sites do not contain all the features a Site Collection site contains. It is largely focused on looking after itself.
Examples of common out of the box site templates are:
• Wikis Site
• Blogs Site
• Team Site
• Blank Site
What is a list in SharePoint? The simple answer is almost everything in Office SharePoint Server 2007 is stored in a list. A more detailed answer is that Office SharePoint Server 2007 is built on top of a core set of collaboration services called Windows SharePoint Services 3.0 (WSS 3.0).
WSS 3.0 lists provides a core set of functionality including content management (Create, Read, Update, Delete), check-in / checkout, versioning, security, workflow, storage management, presentation management, the ability to perform advanced list customization and column configuration, along with many other features such as RSS or Office Integration depending on the type of list. To use a developer metaphor, think of a list as base functionality that can be implemented as derived in many ways in SharePoint. The storage layer for all lists in WSS 3.0 is SQL Server.
To expand on the security aspect, WSS supplies a role-based security model that maps groups of users to preconfigured sets of permissions that specify and limit the actions users can take on sites and site content. WSS 3.0 administrators can apply security settings from the Site Level to the List level all the way to the item level (an individual document in a library, for instance).
• The common lists are Document Libraries, Pages, Events, Images, Discussions, Surveys, etc…
• Lists per site: 2000 per web site: Performance degradation occurs after that.
• Number of documents per document library: 5 million. This is only available if you nest the documents in folders (maximum of 2000 per folder before list performance degrades.
• Warning: Consider the impact on the size and performance of the site collection. Just because it says you can store 5 million documents doesn’t mean you should!
• Size Impact: Average size of each file multiplied by the number of files you will need to store in a SINGLE content database. (The document library is hosted by a single site collection which in turn is hosted by a single content database).
• Performance Impact: Consider the impact on caching and sql query plan for the rest of your sites in your site collection. Splitting out authoring environments from reader environments improves the performance of the underlying databases due to an improved SQL Query Plan.
• Document File Size: 50MB (2GB Maximum): However note that library and file save performance degrades as larger files are used.
• Fields per list: 256 per list: Performance degradation occurs the greater the number field types are used.
• Columns per List: 2000 per document library / 4096 per list: Performance degradation occurs the greater the number columns are used.
• More data has to be retrieved from database.
• More data has to be presented to user.
• Heavy use of granular item level security degrades performance: Consider that SharePoint needs to retrieve each item level permission for a list to calculate what can be rendered in your view of the list. Therefore highly customized item level permissions will degrade the rendering performance of a list view.
An interesting whitepaper called “Working with Large Lists in Office SharePoint Server 2007” is available at: http://go.microsoft.com/fwlink/?LinkId=95450&clcid=0x409 . It provides guidelines for designing “performant” libraries.
See “Microsoft TechNet – Plan enterprise content storage” which discusses performance considerations for sites and lists: http://technet2.microsoft.com/Office/en-us/library/9994b57f-fef8-44e7-9bf9-ca620ce207341033.mspx?mfr=true
The performance of views degrades if the number of items displayed exceeds 2,000 items. A useful technique for limiting the number of items to display in a view is to index a column used in the view, and then to filter the view based on that column so that 2,000 or fewer items are displayed. (An indexed column is one that Office SharePoint Server 2007 maintains a record of to make view-related queries more efficient.)
For example, if it is unlikely that more than 2,000 items in a library will be modified in any seven-day period, you could index the Modified column in a library and then filter a view so that only items changed in the last seven days are displayed. (To do this, specify that the Modified column is less than Today-7.) As another example, if it is likely that each author will create less than 2,000 items, you could index the Created By column and then filter a view so that authors only see the documents they created. (To do this, specify that the Created By column is equal to Me.)
The following types of column types can be indexed and used to filter views:
Single line of text
Multiple lines of text
Date and Time
Person or Group
Here are other considerations in creating views filtered by indexed columns:
Only one indexed column can be used in a view.
Do not create filters using “Or” to provide multiple criteria when using an indexed column to filter a view.
Using the Item Limit feature to modify a view does not improve the view’s performance.
Items such as files, calendar items, contacts, customers, images and custom list items in SharePoint use lists as storage containers.