Cleaning Up Messy HTML

Updating your site with standards-compliant HTML

>Pad Gallagher's faceFrontPage helped bring WYSIWYG publishing to the World Wide Web. Since those early beginnings, however, the web has changed; countless new technologies have emerged, evolved, and slipped into obscurity. The World Wide Web Consortium (W3C) has led the way in defining the web’s technological common ground, referred to as web standards. Websites that conform to W3C web standards are referred to as standards-compliant.

Over the next several articles, I’ll be helping you update your website; migrating away from legacy code and components to newer solutions, based on web standards.

In this article, we’ll start the migration process by cleaning up the code so that it’s more in alignment with web standards and start moving style and formatting out of the web pages and into a cascading style sheet (CSS).

Before we begin, I strongly encourage you to make a backup copy of your website.

DOCTYPE

Browsers read the DOCTYPE declaration on a page to determine which version of HTML (or XHTML) the page uses. Expression Web uses the DOCTYPE declaration for validation and IntelliSense code completion. Adding a DOCTYPE declaration to your website’s pages makes it easier to create and maintain valid code throughout your project.

If your project contains no FrontPage web components, set the DOCTYPE to XHTML Transitional. If your project does contain FrontPage web components, you’ll experience fewer problems if you set the DOCTYPE to HTML 4.01 Transitional.

To set the DOCTYPE for a single page, from the Expression Web Code view:

  1. Press CTRL+HOME – the cursor moves to the head of the page (Line 1, Column 1)
  2. Press CTRL+ENTER – the DOCTYPE menu displays.
  3. Select the appropriate DOCTYPE from the drop-down list.

Repeat this procedure to set the DOCTYPE for every page in your project.

Now that the existing pages all have DOCTYPE declarations, this would be an excellent time to set the DOCTYPE for the entire project so that any new documents you add to the site are created with the correct DOCTYPE declaration. Click Tools, and then click Page Editor Options. Select the Authoring tab and set the Document Type Declaration.

Code cleanup

Once you've set the DOCTYPE declaration on a page and while you're still in Code view, right-click on the page and select Reformat HTML. This will tidy up the code formatting and attempt to validate the page against the declared DOCTYPE. If there are any code errors, Expression Web will highlight them in yellow. DOCTYPE validation errors are displayed with a squiggly red underline.

Go through the code and clear up what you can. When you hover your mouse cursor over an error, a tooltip displays with details about the error to help you find a solution.

Moving the presentation out of the HTML page

Create an external CSS

External style sheets are separate files from content pages that will be linked from each of your project’s web pages. This allows you to create a single style definition that controls the appearance of every instance of a style throughout your website.

Overall, cascading style sheets provide greater flexibility, easier maintenance, and true separation of content from presentation.

Each page must include a link to your CSS within the <head> section of the page. You can add this link manually in Code view: <link rel="stylesheet" type="text/css" href="mycss.css">; or you can add it from the Format menu by clicking CSS Styles, and then clicking Attach CSS Style Sheet in Design view.

Create CSS style definitions

Here’s a quick refresher for CSS style definitions. A CSS style definition has three parts:

  • A selector – what HTML element this style will apply to.
  • One or more properties – what aspect of the element's appearance will be styled.
  • A value for each property – how the property will style the element.

The properties and values are contained within curly braces, and properties are separated by semi-colons.

selector {

     property1: value;

     property2: value;

}

A selector can be:

  • An HTML element (p, h1, ol, etc.), which defines properties for that tag everywhere it appears.

  • A class, which defines properties for only those elements to which you assign that class name.

    Class selectors in a CSS begin with a period (for example, .codeview, .caption, and .uitext). Classes can be used with several different elements on a page. For example, the .uitext class can be applied to single or multiple words, entire paragraphs, headings, or specific sections on a page.

  • An ID, which defines properties for a single, unique element in a web page.

ID selectors in a CSS begin with a pound sign (#). IDs should be unique, and should be applied only once on a web page (this is especially important if you use JavaScript in your project). For example, you might choose to create an ID selector called #header and apply it to a DIV (<div id=”header”>). Remember, the #header ID—or any other ID—should only be used once on a page (though you can use the same ID on other pages). There is, of course, quite a bit more to cascading style sheets than this quick overview. See Expression Web Help for more information. Our objective right now is to create one style definition for each unique style in the website. For example, if every page has a heading in maroon 18pt bold, italic, Tahoma, we would create a style definition in the CSS that looks something like this:

h1 {

      color: #7F0000;

      font-size: 18pt;

      font-weight: bold;

      font-style: italic;

      font-family: Tahoma;

}

Apply CSS styles in Design view

Once you’ve created the CSS, you can quickly apply styles in Design view using the Apply Styles task pane. (If the Apply Styles task pane isn’t visible, click Task Panes, and then click Apply Styles.) The Apply Styles task pane shows all the available styles in a document. To apply the formatting to different areas of the page, select the text you want to format and then click a style in the Apply Styles task pane.

In Code view, you’ll see that CSS class and ID attributes have been added to the HTML tags for many formatted sections of text. If you prefer to code by hand in Code view, note that the “.” and “#” characters are omitted from the HTML attributes.

<p class="uitext">…</p>

<div id="header">…</div>

Scrub any leftover inline formatting

Once you’ve applied CSS styles to each content element in your site, it’s a good idea to go through each of the pages in Code view, clearing out any leftover presentation formatting, such as <font> tags (especially those with attributes, such as <font face=”Tahoma”> or <font color=”#007F00”>).

A special note about MsoNormal.

Copying and pasting content directly from Microsoft Word into FrontPage or Expression Web generates Office-specific markup, characteristically set off with an MsoNormal attribute. That markup may be valid (depending on the DOCTYPE declaration) but it adds unnecessary clutter and maintenance headaches.

Tip: Copy the affected text directly from Word into Notepad and then paste it into the web page. This will strip all inline, Office-specific formatting out of the content, to which you can then apply CSS styles.

Semantic content markup

Now that we’ve shifted the appearance of page elements out of the HTML page and into the CSS, the tags have semantic value—they actually describe what’s contained within them. Semantic content markup offers tremendous benefits in accessibility alone. For example, because the formatting and appearance of a standards-compliant web page are handled by the CSS, instead of by inline formatting, visually impaired users no longer have to sit through the screen reader’s tedious and time-consuming itemization of every table element and transparent GIF used in the non-compliant page’s layout tables.

Another great benefit to semantic content markup is that it can significantly improve search engine optimization. Standards-compliant web pages contain a bare minimum of markup, so when a search engine crawls your site, it encounters a much higher content-to-markup ratio, resulting in more accurate search engine results and higher page ranking.

One final note about semantic markup: <i> and <b> tags are considered display tags; they change only the appearance of the enclosed content. Use style definitions in your CSS for elements such as code examples or UI text that should be displayed in italics or bold.

In cases where your intent is to change the meaning of an element, use the standards-compliant <em> (meaning “emphasis”) and <strong> tags instead. Though these tags also correctly apply the desired formatting, their purpose is to provide semantic value, which—among other things—helps screen readers and other accessibility tools correctly interpret and present content enclosed in these tags.

Conclusion

DOCTYPE declarations and the separation of content from presentation are fundamental aspects of a standards-compliant website. More importantly, making these changes provides several important benefits: your website is much easier to maintain and extend in a standards-compliant editor such as Expression Web; you can add and expand accessibility features; and your site’s search engines results should improve.

In my next article, we’ll examine FrontPage web components and ways to replace them with standards-compliant alternatives.

Until next time!

Pad Galalgher, Technical Writer