Link Integrity Management: Processes, Tactics, and Solutions
One lengthy and complex topic I recently discussed with a customer is link integrity management – how to keep links from breaking when you move and rename things. Turns out there are several dimensions to this problem and a wide array of tactics for addressing the many scenarios that can lead to broken links. I did a bit of research around the web and found quite a few processes, tactics, and a couple of technical solutions to mitigate these issues. We’re using a combination of all of them for this intranet publishing & collaboration deployment, so I thought I’d share the generic recommendations we came up with. Here goes…
The solutions for addressing link integrity issues typically include one or more of the following types of components:
· Training to prevent and reduce the occurrence of link integrity issues
· Proactive business processes to prevent link integrity issues
· Reactive business processes to correct link integrity issues
· Tools to identify link integrity issues reactively or proactively
· Tools to avoid or mitigate the impact of link integrity issues
When you’re comparing these options, you need to examine two aspects of their effectiveness: link sources and link destinations.
Link sources are the container for a link, and I decomposed these into the following four types:
· Intra-Site Collection
· Intra-SharePoint
· External Web Sites
· User Bookmarks/Files
The link source matters because certain solutions rely on detecting a broken link and fixing it at the source – i.e. changing http://old to http://new – and certain solutions can only affect specific sources.
Link destinations are the endpoint of the link – I decomposed these into the following eight groups of SharePoint objects:
· Site Collections
· Sites
· Libraries & Lists
· Folders
· Content Pages (AKA Publishing Pages)
· Web Part & Basic Pages
· Files
· Items
Of course, effectiveness is only the “benefits” side of the equation. You also need to factor in the “costs” side, which ranges from free to hugely expensive (and the real cost isn’t always obvious). Now onto the solutions and my thoughts on each…
Site Design Practices
Site design practices should be part of standard SharePoint training for content authors and site owners. Renaming and moving content can often be avoided by using standard SharePoint navigation controls and features for organizing and viewing content. When moving and renaming content is required, a number of built-in tools for updating links, implementing redirects and analyzing the impact of these changes can mitigate the scope of link integrity issues.
Example practices for avoiding/mitigating link integrity issues include:
· Avoid renaming documents & files. Use the Title field – don’t rename the file.
· Avoid restructuring libraries & lists
o Avoid folders or keep folder structures SIMPLE so that they are less likely to change
o Use views and metadata to organize contents instead
o Don’t use document libraries simply to categorize documents – different libraries should be created to support different metadata, permissions, policies, workflows etc.
· Always implement a Redirect Page when moving Content Pages. The redirect page is a standard page layout that content authors can use to quickly implement a redirect.
· Always implement a Content Editor Web Part-based redirect when moving Web Part Pages or Basic Pages. Savvy users can insert an HTML snippet in Content Editor Web Parts to implement a redirect to another page.
· Always have a communications plan for your site – notify users well in advance of any major restructuring
· Use the site usage reports PROACTIVELY to notify users who may be impacted by site restructuring
· Review Redirect Page traffic in usage reports to determine if they are still needed
· Review 404 reports to identify users and sites with broken links. This requires a 404 reporting solution as described later.
· Review 404 reports to identify needed redirect pages. This requires a 404 reporting solution as described later.
Site Design Practices mitigate broken link issues across all link sources: intra site collection, intra-SharePoint, other web sites, and bookmarks/files. They don’t eliminate issues because they depend largely on people and processes, so their effectiveness comes down to how consistently those processes are followed.
Site Design Practices are effective for all major link destinations: Site Collections, Sites, Libraries & Lists, Folders, Content Pages, Web Part & Basic Pages, Files, and Items.
Good site design practices reduce the occurrence of link integrity issues on all scopes and components. Good site management practices – in the form of communication plans and usage report analysis – reduce the impact of all link integrity issues whenever they occur. These best practices have negligible additional cost in relation to generally recommended training deliveries for site owners and content authors.
Broken Link Crawlers
A broken link crawler is a tool that crawls one or more web sites, checks for 404 errors on all accessible pages in the site, and produces a report of the source and destination URLs for those 404 errors. They also typically produce a list of all links crawled and have mechanisms for seeking particular URL patterns.
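To make the crawl-and-report idea concrete, here is a minimal sketch in Python using only the standard library. It is not how any particular tool works. The function names, the `fetch` parameter, and the page limit are all assumptions made for illustration, and authentication (which a real intranet crawl would need) is left out entirely.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def http_fetch(url):
    """Return (status_code, html_text); status 0 means the server was unreachable."""
    try:
        request = Request(url, headers={"User-Agent": "link-check-sketch"})
        with urlopen(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, ""
    except URLError:
        return 0, ""


def find_broken_links(start_url, fetch=http_fetch, max_pages=200):
    """Crawl pages on start_url's host and return (source, destination) pairs that 404."""
    host = urlparse(start_url).netloc
    cache = {}  # url -> (status, body), so each URL is requested only once

    def get(url):
        if url not in cache:
            cache[url] = fetch(url)
        return cache[url]

    broken = []
    queue = deque([start_url])
    visited = set()
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        status, body = get(page)
        if status != 200:
            continue
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before checking
            target = urldefrag(urljoin(page, href)).url
            if urlparse(target).scheme not in ("http", "https"):
                continue
            target_status, _ = get(target)
            if target_status == 404:
                broken.append((page, target))
            elif urlparse(target).netloc == host:
                queue.append(target)  # only pages on the same host are crawled further
    return broken
```

Passing the fetch function as a parameter keeps the crawl logic separate from the network layer, so the same loop can be pointed at a stub for testing or at a fetcher that handles whatever authentication the farm requires.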
Popular broken link crawler tools include:
· Xenu's Link Sleuth
Some of these tools can be used proactively to crawl content for links to a base URL in preparation for a move. These reports let content owners whose pages link to content that is about to be moved or renamed update their links in advance, or time the update to coincide with the move itself.
All of these tools can be used reactively to identify broken links in these sites. These reports enable the content owners with broken links to identify content in need of updating.
Broken link crawlers can eliminate broken link issues across three link sources: intra site collection, intra-SharePoint, and KNOWN web sites. They are ineffective for dealing with links in user bookmarks/files, and they may encounter problems parsing some file formats and working within some security frameworks.
Broken link crawlers are effective for all major link destinations: Site Collections, Sites, Libraries & Lists, Folders, Content Pages, Web Part & Basic Pages, Files, and Items.
Proactive use of these broken link crawlers is appropriate in advance of a major site collection or site restructuring, or major structural change to a very large document library. Site owners must be trained in advance to plan to request these reports. These reports are computationally expensive to produce since they add traffic to all systems examined by the tool, so they are not for casual use.
Reactive use of broken link crawlers should be conducted on a regular basis, e.g. weekly, monthly, or quarterly reports that are submitted to all site owners of the source content site. This empowers site owners to better manage their site content.
Broken link crawlers cannot effect change for unreported sites and systems, nor can they scan user bookmarks. They can only produce reports for web sites that are accessible to the crawler.
Some of these tools are free, but the resources required to use them and integrate them into administration processes are not. Custom solutions are typically required to direct these reports to the appropriate site owners. They offer the most benefits for large migrations or restructuring efforts for sites that primarily render HTML content.
404 Reporting & Fixing
“404 reports” can be obtained directly from the IIS logs of MOSS servers. These logs show which broken links are receiving traffic and, just as importantly, who is attempting to access those URLs. Reports created from this data can easily identify the most frequently accessed 404 pages.
404 reporting & fixing can mitigate broken link issues across all link sources: intra site collection, intra-SharePoint, web sites, and user bookmarks/files.
404 reporting & fixing is effective for all major link destinations: Site Collections, Sites, Libraries & Lists, Folders, Content Pages, Web Part & Basic Pages, Files, and Items.
Using 404 reports to address broken links is a purely reactive measure, but it works for all logical architecture components. For existing sites, site owners can review the 404 reports and contact the user(s) who hit the broken links or, in the case of pages, implement a redirect. For sites that have moved, farm administrators can take the same actions, since there would be no clear owner for a site that no longer exists at a given URL.
404 reports are not readily available to site owners, and are not easily compiled by farm administrators (especially in environments with more than one web front end server). Delivering 404 reports to both sets of users would require a custom solution which could also leverage a tool like LogParser or WebTrends to compile the reports.
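As a rough illustration of what such a custom solution would compile, the Python sketch below merges logs from several web front ends and counts 404 hits per URL and per user. It assumes the logs are in the W3C extended format with a `#Fields:` header and that the `cs-uri-stem`, `cs-username`, and `sc-status` fields are being logged. The function names are made up for this example, and it is not a replacement for LogParser or WebTrends.

```python
from collections import Counter


def read_404_hits(lines):
    """Yield (url, username) for each 404 entry in W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns; it can differ from server to server
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        record = dict(zip(fields, line.split()))
        if record.get("sc-status") == "404":
            yield record.get("cs-uri-stem", "-"), record.get("cs-username", "-")


def summarize_404s(log_sources, top=10):
    """Combine logs from several servers into hit counts per URL and per (URL, user)."""
    by_url = Counter()
    by_url_user = Counter()
    for lines in log_sources:
        for url, user in read_404_hits(lines):
            key = url.lower()  # treat URLs as case-insensitive when counting
            by_url[key] += 1
            by_url_user[(key, user)] += 1
    return by_url.most_common(top), by_url_user
```

Each entry in `log_sources` can be any iterable of lines, such as an open file handle for one server's log, so logs from every web front end can be passed in together. The per-user counts are what would let a site owner contact the people who keep hitting a broken link.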
“Smart” 404 Page
The standard 404 page simply presents a “404-page not found” message to the user. A “smart” 404 page would improve the user experience by providing richer features for finding the content the user is seeking, although it would not address the root cause of the missing object.
A Smart 404 Page can mitigate broken link issues across all link sources: intra site collection, intra-SharePoint, web sites, and user bookmarks/files.
A Smart 404 Page is effective for all major link destinations except Site Collections: Sites, Libraries & Lists, Folders, Content Pages, Web Part & Basic Pages, Files, and Items. Attempts to access a moved/missing site collection root will result in an ordinary 404 page.
A smart 404 page would, at a minimum, include SharePoint search results based on the terms in the URL that produced the 404 error. It could also include basic information, such as whether the site, list, or other container object currently exists or has been moved; or contact information for an existing site. Further, a smart 404 page could be used to trigger events for reactive measures, like notifying site owners or farm administrators.
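The “terms in the URL” step could look something like the Python sketch below. It is only an illustration of the idea, not the logic of the CodePlex project or of SharePoint search. The noise-word list and the function name are assumptions, and the actual search query would still have to be issued by the page itself.

```python
import re
from urllib.parse import unquote, urlparse

# Path segments and file extensions that rarely help a search (an assumed list)
NOISE_WORDS = {
    "sites", "pages", "forms", "lists", "default", "allitems",
    "aspx", "htm", "html", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "pdf",
}


def search_terms_from_url(url):
    """Turn the path of a missing URL into a list of candidate search keywords."""
    path = unquote(urlparse(url).path)
    tokens = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", path) if t]
    # dict.fromkeys removes duplicates while keeping first-seen order
    return [t for t in dict.fromkeys(tokens) if t not in NOISE_WORDS]
```

For a hypothetical request to http://intranet/sites/HR/Shared%20Documents/Benefits-Guide_2009.docx, this returns hr, shared, documents, benefits, guide, and 2009, which could then be passed to search as keywords.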
A smart 404 page is a relatively simple customization to implement, as it could be based on an existing CodePlex project (http://www.codeplex.com/sharepointsmart404). Each component added to the Smart 404 page beyond the search results may require additional work.
Content Page Redirects
The Redirect Page is a standard page layout included in the MOSS Publishing Features that content authors can use to quickly implement a redirect. Authors use it to capture the new URL for a content page, whereupon the redirect mechanism on the page causes the user’s request to be redirected to the new URL if they hit the Redirect Page.
Though they can be implemented manually, one customization that could simplify the task of moving a single content page would be offering a new custom action that collapses the tasks of moving a content page and constructing a Redirect Page into a single step.
Content Page Redirects can eliminate broken link issues across all link sources: intra site collection, intra-SharePoint, web sites, and user bookmarks/files.
Content Page Redirects are only effective for Content Page objects.
Content Page Redirects can always be created manually. A “move and create redirect page” custom action would be an alternative available to content authors in addition to the standard tools they could use for moving content pages in SharePoint, such as Explorer View and SharePoint Designer. This action would provide link integrity across any scope for any content page moved in this manner.
Generation of redirect pages for content page moves should NOT be automated, as that could generate massive numbers of redirect pages, thereby slowing down requests, increasing storage requirements, and greatly increasing content complexity. Nearly all redirects are intended to be used for a limited period of time until communications have caught up with restructuring actions, so automated generation should be optional, not automatic.
Content authors on publishing sites must be trained in the use of Redirect Pages. Developing a custom action to simplify the process of cross site-collection moves of content pages would require a relatively simple customization – adding a custom Page Editing Menu item and a custom action associated with it.
Web Part Page Redirects
Web part pages and basic pages don’t have an equivalent to the Redirect page layout, but users can insert an HTML snippet in Content Editor Web Parts on web part pages to implement a redirect to another page. Content authors of content pages that contain web part zones can use the same approach for implementing a redirect.
Implementing this type of redirect requires some HTML knowledge, though a relatively simple custom “Redirect” web part could simplify this task and make it readily available for all users.
Web Part Page Redirects can eliminate broken link issues across all link sources: intra site collection, intra-SharePoint, web sites, and user bookmarks/files.
Web Part Page redirects are only effective for Content Pages, Web Part Pages, and Basic Pages.
These types of redirects work for all scopes for all content pages, web part pages, and basic pages to which they are applied. A custom redirect web part would include, at a minimum, a redirect message, a redirect URL, and a time period for pausing before the redirect action occurs. These parameters would be simple configuration settings for the web part.
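The three settings described above (message, URL, and delay) map directly onto the kind of HTML snippet a user would otherwise paste into a Content Editor Web Part by hand. The Python sketch below generates such a snippet. It is a hypothetical helper rather than the web part itself, and it assumes the hosting page leaves an inline script block intact.

```python
import html
import json


def build_redirect_snippet(message, target_url, delay_seconds=5):
    """Return an HTML fragment that shows a message, then redirects after a pause."""
    if delay_seconds < 0:
        raise ValueError("delay_seconds must not be negative")
    safe_message = html.escape(message)
    safe_href = html.escape(target_url, quote=True)
    # json.dumps yields a quoted JavaScript string; the replace keeps a stray
    # "</script>" inside the URL from closing the script block early
    js_url = json.dumps(target_url).replace("</", "<\\/")
    delay_ms = int(delay_seconds * 1000)
    return (
        f'<p>{safe_message} <a href="{safe_href}">{safe_href}</a></p>\n'
        f'<script type="text/javascript">\n'
        f'setTimeout(function () {{ window.location.href = {js_url}; }}, {delay_ms});\n'
        f'</script>'
    )
```

The visible link is there so that users with scripting disabled still have a way to reach the new page.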
Implementing a custom web part with these specifications would be a simple customization that would be considerably more cost-effective than supporting training issues relating to authoring HTML redirects or correcting issues with HTML redirects.
Site Collection/Site Redirects – IIS-based
An IIS-based redirect is an IIS virtual directory that uses a pattern match of arbitrary complexity to redirect URLs matching that pattern to another URL. For example, all traffic to URLs under http://sharepoint.ing.com/sites/oldsite could be redirected to http://sharepoint.ing.com/sites/newsite.
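The mapping such a redirect performs can be modeled in a few lines of Python. This sketch shows only the URL arithmetic, not IIS configuration. It assumes a simple prefix match, and it assumes the rest of the path and the query string are carried over to the new location. The function name and the redirect table are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_redirect(url, redirects):
    """Map a URL under an old base path to the new base path, or return None."""
    parts = urlsplit(url)
    path_lower = parts.path.lower()
    # Check the longest prefixes first so the most specific rule wins
    for old in sorted(redirects, key=len, reverse=True):
        old_base = old.rstrip("/").lower()
        if path_lower == old_base or path_lower.startswith(old_base + "/"):
            remainder = parts.path[len(old_base):]
            new_path = redirects[old].rstrip("/") + remainder
            return urlunsplit(parts._replace(path=new_path))
    return None


rules = {"/sites/oldsite": "/sites/newsite"}
print(resolve_redirect("http://sharepoint.ing.com/sites/oldsite/Docs/plan.docx", rules))
```

A request for /sites/oldsite2 is left alone because the match stops at a path boundary. Note that the rule only rewrites the base: if Docs/plan.docx is later renamed inside newsite, the redirected request still ends in a 404.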
IIS-based redirects initially eliminate broken link issues across all link sources: intra site collection, intra-SharePoint, web sites, and user bookmarks/files, but their quality may degrade over time.
IIS-based redirects are effective for all major link destinations within their associated container (a site collection or site): Site Collections, Sites, Libraries & Lists, Folders, Content Pages, Web Part & Basic Pages, Files, and Items.
IIS-based redirects work for all scopes and object types for the base URL used in the redirect, whether that is a site collection, site, or lower logical architecture component. This is an incomplete solution: if any of the contents in the redirect destination are moved or renamed after the redirect is put in place, links to those contents may break.
Since IIS-based redirects require creating new virtual directories on each MOSS server to implement the redirect, this approach does not scale to large numbers of redirects. It should only be implemented for critical sites or for very limited periods of time for other sites, and should only be used at the site or site collection level.
This approach requires no customizations, but does require training and operations documentation for server administrators.
Hope these help… happy content authoring!