Google has a great document that explains in extremely clear terms what site owners should do to help major search engines index their content. The guide can be found at http://googlewebmastercentral.blogspot.com/2008/11/googles-seo-starter-guide.html. I strongly recommend that every site owner read it.
Here are some important highlights taken from the above document.
Create unique, accurate page titles
• Accurately describe the page’s content – Choose a title that effectively communicates the topic of the page’s content.
• Avoid choosing a title that has no relation to the content on the page
• Avoid using default or vague titles like “Untitled” or “New Page 1”
• Create unique title tags for each page – Each of your pages should ideally have a unique title tag, which helps Google know how the page is distinct from the others on your site.
• Avoid using a single title tag across all of your site’s pages or a large group of pages
• Use brief, but descriptive titles – Titles can be both short and informative. If the title is too long, Google will show only a portion of it in the search result.
• Avoid using extremely lengthy titles that are unhelpful to users
• Avoid stuffing unneeded keywords into your title tags
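The title advice can be checked mechanically. The Python sketch below audits a hypothetical `{url: title}` mapping for the problems listed above: placeholder titles, titles shared by several pages, and overly long titles. The `audit_titles` helper and the 60-character limit are assumptions for illustration. The guide itself gives no exact length.

```python
from collections import defaultdict

# Assumed limit: the guide only says long titles get cut off in results,
# so 60 characters is a common rule of thumb, not a Google figure.
MAX_TITLE_LENGTH = 60
# Placeholder titles the guide calls out, plus an empty title.
VAGUE_TITLES = {"", "untitled", "new page 1"}


def audit_titles(pages):
    """Return warnings for a {url: title} mapping of a site's pages."""
    warnings = []
    urls_by_title = defaultdict(list)
    for url, title in pages.items():
        cleaned = " ".join(title.split())  # normalise whitespace
        urls_by_title[cleaned.lower()].append(url)
        if cleaned.lower() in VAGUE_TITLES:
            warnings.append(f"{url}: vague title {cleaned!r}")
        elif len(cleaned) > MAX_TITLE_LENGTH:
            warnings.append(f"{url}: title is {len(cleaned)} characters long")
    # The same title on several pages hides how those pages differ.
    for title, urls in urls_by_title.items():
        if len(urls) > 1:
            warnings.append(f"{len(urls)} pages share the title {title!r}: {', '.join(sorted(urls))}")
    return warnings


print(audit_titles({"/": "Brandon's Baseball Cards", "/news/": "Untitled", "/shows/": "Untitled"}))
```

Comparing titles case-insensitively treats “Untitled” and “untitled” as the same duplicate. Search results would make the same impression on a user.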
Make use of the “description” meta tag
• Accurately summarize the page’s content – Write a description that would both inform and interest users if they saw your description meta tag as a snippet in a search result.
• Avoid writing a description meta tag that has no relation to the content on the page
• Avoid using generic descriptions like “This is a webpage” or “Page about baseball cards”
• Avoid filling the description with only keywords
• Avoid copying and pasting the entire content of the document into the description meta tag
• Use unique descriptions for each page – Having a different description meta tag for each page helps both users and Google, especially in searches where users may bring up multiple pages on your domain (e.g. searches using the site: operator). If your site has thousands or even millions of pages, hand-crafting description meta tags probably isn’t feasible. In this case, you could automatically generate description meta tags based on each page’s content.
• Avoid using a single description meta tag across all of your site’s pages or a large group of pages
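When hand-writing descriptions isn't feasible, the guide suggests generating them from each page's content. Here is a minimal sketch of that idea. It rests on two assumptions that don't come from the guide: the useful text lives in `<p>` elements, and the budget is 155 characters. `make_description_tag` is a hypothetical helper, not a Google API.

```python
import html
from html.parser import HTMLParser

# Assumed snippet budget: the guide gives no number, this is only a rule of thumb.
MAX_DESCRIPTION_LENGTH = 155


class ParagraphText(HTMLParser):
    """Collects the text inside <p> elements, ignoring scripts, menus, etc."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1
            self.chunks.append(" ")  # keep separate paragraphs from running together

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def make_description_tag(page_html):
    parser = ParagraphText()
    parser.feed(page_html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())  # collapse whitespace
    if len(text) > MAX_DESCRIPTION_LENGTH:
        # Cut at the last whole word that fits and mark the truncation.
        text = text[:MAX_DESCRIPTION_LENGTH].rsplit(" ", 1)[0].rstrip(",;:") + "..."
    # html.escape also escapes quotes, so the text is safe inside the attribute.
    return f'<meta name="description" content="{html.escape(text)}">'


print(make_description_tag("<p>Upcoming <b>baseball card</b> shows in 2008.</p>"))
```

Generated descriptions are only a fallback. For the most important pages, a hand-written summary will usually read better as a search snippet.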
Improve the structure of your URLs
• Use words in URLs – URLs with words that are relevant to your site’s content and structure are friendlier for visitors navigating your site. Visitors remember them better and might be more willing to link to them.
• Avoid using lengthy URLs with unnecessary parameters and session IDs
• Avoid choosing generic page names like “page1.html”
• Avoid using excessive keywords like “baseball-cards-baseball-cards-baseballcards.htm”
• Create a simple directory structure – Use a directory structure that organizes your content well and makes it easy for visitors to know where they are on your site. Try using your directory structure to indicate the type of content found at that URL.
• Avoid having deep nesting of subdirectories like “…/dir1/dir2/dir3/dir4/dir5/dir6/page.html”
• Avoid using directory names that have no relation to the content in them
• Provide one version of a URL to reach a document – To prevent some users from linking to one version of a URL and others from linking to a different version (which could split the reputation of that content between the URLs), use and refer to a single URL in the structure and internal linking of your pages. If you find that people are accessing the same content through multiple URLs, setting up a 301 redirect from the non-preferred URLs to the dominant URL is a good solution.
• Avoid having pages from subdomains and the root directory (e.g. “domain.com/page.htm” and “sub.domain.com/page.htm”) serve the same content
• Avoid mixing www. and non-www. versions of URLs in your internal linking structure
• Avoid using odd capitalization of URLs (many users expect lower-case URLs and remember them better)
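The URL advice comes down to two jobs:

• Build readable, word-based URLs.
• Send every variant of a URL to one preferred version with a 301.

Here is a sketch of both under assumed settings. The preferred host, the session-parameter names, and the choice of lower-case paths are all hypothetical. So are the helper names `slugify`, `canonical_url`, and `redirect_for`.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical settings for a site like the guide's brandonsbaseballcards.com example.
PREFERRED_HOST = "www.brandonsbaseballcards.com"
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}  # assumed session-ID parameter names


def slugify(title):
    """Turn a page title into a short, lower-case, hyphen-separated URL segment."""
    words = re.findall(r"[a-z0-9]+", title.lower().replace("'", ""))
    return "-".join(words)


def canonical_url(url):
    """Map any variant of a URL to the one preferred form used in internal links."""
    parts = urlsplit(url)
    query = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key.lower() not in SESSION_PARAMS]
    # Assumes every host alias (bare domain, www, subdomains) serves the same site
    # and that the server treats paths case-insensitively.
    return urlunsplit((parts.scheme, PREFERRED_HOST, parts.path.lower() or "/", urlencode(query), ""))


def redirect_for(url):
    """Return (301, location) if the request should be redirected, else None."""
    target = canonical_url(url)
    return (301, target) if target != url else None


print(slugify("Upcoming Baseball Card Shows"))
print(redirect_for("http://brandonsbaseballcards.com/News/2008/?sid=abc123"))
```

A web framework would call something like `redirect_for` on each request. When it returns a pair, the server answers with that status and a `Location` header. Requests that are already canonical pass through unchanged, so the redirect cannot loop.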
Make your site easier to navigate
• Create a naturally flowing hierarchy – Make it as easy as possible for users to go from general content to the more specific content they want on your site. Add navigation pages when it makes sense and effectively work these into your internal link structure.
• Avoid creating complex webs of navigation links, e.g. linking every page on your site to every other page
• Avoid going overboard with slicing and dicing your content (so that it takes twenty clicks to get to deep content)
• Avoid having a navigation based entirely on drop-down menus, images, or animations (many, but not all, search engines can discover such links on a site, but if a user can reach all pages on a site via normal text links, this will improve the accessibility of your site; Google’s help documentation explains how it deals with non-text files)
• Use “breadcrumb” navigation – A breadcrumb is a row of internal links at the top or bottom of the page that allows visitors to quickly navigate back to a previous section or the root page. Many breadcrumbs have the most general page (usually the root page) as the first, left-most link and list the more specific sections out to the right.
• Put an HTML sitemap page on your site, and use an XML Sitemap file – A simple sitemap page with links to all of the pages or the most important pages (if you have hundreds or thousands) on your site can be useful. Creating an XML Sitemap file for your site helps ensure that search engines discover the pages on your site.
• Avoid letting your HTML sitemap page become out of date with broken links
• Avoid creating an HTML sitemap that simply lists pages without organizing them, for example by subject
• Consider what happens when a user removes part of your URL – Some users might navigate your site in odd ways, and you should anticipate this. For example, instead of using the breadcrumb links on the page, a user might drop off a part of the URL in the hopes of finding more general content. He or she might be visiting http://www.brandonsbaseballcards.com/news/2008/upcoming-baseball-card-shows.htm, but then enter http://www.brandonsbaseballcards.com/news/2008/ into the browser’s address bar, believing that this will show all news from 2008. Is your site prepared to show content in this situation, or will it give the user a 404 (“page not found” error)? What about moving up a directory level to http://www.brandonsbaseballcards.com/news/?
• Have a useful 404 page – Users will occasionally come to a page that doesn’t exist on your site, either by following a broken link or typing in the wrong URL. Having a custom 404 page that kindly guides users back to a working page on your site can greatly improve a user’s experience. Your 404 page should probably have a link back to your root page and could also provide links to popular or related content on your site. Google provides a 404 widget that you can embed in your 404 page to automatically populate it with many useful features. You can also use Google Webmaster Tools to find the sources of URLs causing “not found” errors.
• Avoid allowing your 404 pages to be indexed in search engines (make sure that your web server is configured to give a 404 HTTP status code when non-existent pages are requested)
• Avoid providing only a vague message like “Not found” or “404”, or having no 404 page at all
• Avoid using a design for your 404 pages that isn’t consistent with the rest of your site
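Three of the navigation points above lend themselves to code:

• breadcrumbs that mirror the directory structure
• an XML Sitemap
• a friendly 404 page that still returns a real 404 status

The Python sketch below shows all three under simplifying assumptions. Breadcrumb labels are derived from URL segments. The sitemap contains only the required `<loc>` element for each URL. `PAGES` is a hypothetical stand-in for a real site's routing.

```python
from urllib.parse import urlsplit
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def breadcrumbs(url):
    """Return (label, href) pairs from the root page down to the current page."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    crumbs = [("Home", base + "/")]
    path = ""
    for i, segment in enumerate(segments):
        path += "/" + segment
        # Every segment except a trailing file name is a directory the site should serve.
        is_directory = i < len(segments) - 1 or parts.path.endswith("/")
        label = segment.rsplit(".", 1)[0].replace("-", " ").title()
        crumbs.append((label, base + path + ("/" if is_directory else "")))
    return crumbs


def sitemap_xml(urls):
    """Build a minimal XML Sitemap containing only the required <loc> entries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url  # ElementTree escapes "&"
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


# Hypothetical page table standing in for a real router.
PAGES = {"/": "<h1>Brandon's Baseball Cards</h1>", "/news/2008/": "<h1>News from 2008</h1>"}
NOT_FOUND_PAGE = (
    "<h1>Sorry, we couldn't find that page</h1>"
    '<p><a href="/">Return to the home page</a> or read <a href="/news/2008/">news from 2008</a>.</p>'
)


def app(environ, start_response):
    """Minimal WSGI app: the friendly 404 page still carries a real 404 status code."""
    page = PAGES.get(environ.get("PATH_INFO") or "/")
    status = "200 OK" if page is not None else "404 Not Found"
    body = (page if page is not None else NOT_FOUND_PAGE).encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


for label, href in breadcrumbs("http://www.brandonsbaseballcards.com/news/2008/upcoming-baseball-card-shows.htm"):
    print(label, href)
```

Each breadcrumb link after "Home" that ends in a slash is a directory URL such as `/news/2008/`. That is exactly what a visitor gets by trimming the address bar, so the site should serve a real page at each of them rather than a 404.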
Use heading tags appropriately
• Imagine you’re writing an outline – Similar to writing an outline for a large paper, put some thought into what the main points and sub-points of the content on the page will be and decide where to use heading tags appropriately.
• Avoid placing text in heading tags that wouldn’t be helpful in defining the structure of the page
• Avoid using heading tags where other tags like <em> and <strong> may be more appropriate
• Avoid erratically moving from one heading tag size to another
• Use headings sparingly across the page – Use heading tags where it makes sense. Too many heading tags on a page can make it hard for users to scan the content and determine where one topic ends and another begins.
• Avoid using heading tags excessively throughout the page
• Avoid putting all of the page’s text into a heading tag
• Avoid using heading tags only for styling text rather than presenting structure
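The advice against “erratically moving from one heading tag size to another” can be checked automatically. The sketch below uses Python's standard `html.parser`. It assumes that only downward jumps of more than one level count as a problem. The guide does not spell that rule out, and `skipped_heading_levels` is a hypothetical helper.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {f"h{n}": n for n in range(1, 7)}


class HeadingOutline(HTMLParser):
    """Records the level of every <h1>..<h6> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:  # HTMLParser passes tag names in lower case
            self.levels.append(HEADING_LEVELS[tag])


def skipped_heading_levels(page_html):
    """Flag places where the outline jumps down more than one level, e.g. <h1> to <h4>."""
    parser = HeadingOutline()
    parser.feed(page_html)
    parser.close()
    problems = []
    for previous, current in zip(parser.levels, parser.levels[1:]):
        # Moving back up (h3 -> h2) just ends a sub-section, so only jumps down count.
        if current > previous + 1:
            problems.append(f"<h{previous}> followed by <h{current}>")
    return problems


print(skipped_heading_levels("<h1>Baseball Cards</h1><h4>Rookie Cards</h4>"))
```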
Optimize your use of images
• Use brief, but descriptive filenames and alt text – Like many of the other parts of the page targeted for optimization, filenames and alt text (for ASCII languages) are best when they’re short, but descriptive.
• Avoid using generic filenames like “image1.jpg”, “pic.gif”, “1.jpg” when possible (some sites with thousands of images might consider automating the naming of images)
• Avoid writing extremely lengthy filenames
• Avoid stuffing keywords into alt text or copying and pasting entire sentences
• Supply alt text when using images as links – If you do decide to use an image as a link, filling out its alt text helps Google understand more about the page you’re linking to. Imagine that you’re writing anchor text for a text link.
• Avoid writing excessively long alt text that would be considered spammy
• Avoid using only image links for your site’s navigation
• Store images in a directory of their own – Instead of having image files spread out in numerous directories and subdirectories across your domain, consider consolidating your images into a single directory (e.g. brandonsbaseballcards.com/images/). This simplifies the path to your images.
• Use commonly supported filetypes – Most browsers support JPEG, GIF, PNG, and BMP image formats. It’s also a good idea to have the extension of your filename match the filetype.
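The filename and filetype advice can be checked by comparing an image's extension with its leading signature ("magic") bytes. The sketch below recognizes only the four formats the guide lists. The `GENERIC_NAME` pattern and the `image_warnings` helper are hypothetical. The two-byte BMP signature is loose enough that it could occasionally misfire on other data.

```python
import re
from pathlib import PurePosixPath

# Leading signature bytes of the four formats the guide lists.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}
EXTENSIONS = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png", ".gif": "gif", ".bmp": "bmp"}
# Assumed pattern for names like "image1", "pic", or "1".
GENERIC_NAME = re.compile(r"(image|img|pic|photo)?\d*")


def sniff_format(data):
    """Return the format named by the file's first bytes, or None if unrecognized."""
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    return None


def image_warnings(filename, data):
    path = PurePosixPath(filename)
    warnings = []
    if GENERIC_NAME.fullmatch(path.stem.lower()):
        warnings.append(f"{filename}: generic filename")
    actual = sniff_format(data)
    claimed = EXTENSIONS.get(path.suffix.lower())
    if actual is None:
        warnings.append(f"{filename}: not a JPEG, GIF, PNG, or BMP file")
    elif claimed != actual:
        warnings.append(f"{filename}: extension {path.suffix!r} but content is {actual.upper()}")
    return warnings


png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(image_warnings("image1.jpg", png_bytes))
```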
Make effective use of robots.txt
• Use more secure methods for sensitive content – You shouldn’t feel comfortable using robots.txt to block sensitive or confidential material. One reason is that search engines could still reference the URLs you block (showing just the URL, no title or snippet) if there happen to be links to those URLs somewhere on the Internet (like referrer logs). Also, non-compliant or rogue search engines that don’t acknowledge the Robots Exclusion Standard could disobey the instructions of your robots.txt. Finally, a curious user could examine the directories or subdirectories in your robots.txt file and guess the URL of the content that you don’t want seen. Encrypting the content or password-protecting it with .htaccess are more secure alternatives.
• Avoid allowing search result-like pages to be crawled (users dislike leaving one search result page and landing on another search result page that doesn’t add significant value for them)
• Avoid allowing a large number of auto-generated pages with the same or only slightly different content to be crawled: “Should these 100,000 near-duplicate pages really be in a search engine’s index?”
• Avoid allowing URLs created as a result of proxy services to be crawled
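A robots.txt that follows the advice above might keep internal search results and proxy-generated URLs away from compliant crawlers. The rules and paths below are hypothetical. Python's standard `urllib.robotparser` is used only to show how a compliant crawler would read them. As the guide stresses, none of this stops a crawler that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks internal search results and proxy-generated URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /proxy/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://www.brandonsbaseballcards.com/search?q=mantle",
    "http://www.brandonsbaseballcards.com/proxy/cards.htm",
    "http://www.brandonsbaseballcards.com/news/2008/",
):
    # can_fetch() only reports what a *compliant* crawler would do; it enforces nothing.
    print(parser.can_fetch("Googlebot", url), url)
```

Under these rules the first two URLs are reported as blocked and the news page as allowed. Python's parser does plain prefix matching on paths, so it may not reproduce every extension that a particular search engine supports.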