Hi everyone, my name is George Perantatos, and I’m a Program Manager on the Web Content Management (WCM) features of Microsoft Office SharePoint Server (MOSS) 2007. I work on some of the web page authoring features in WCM that help you create, author, approve, and publish web pages in your web site. I want to describe a feature we’ve been working on that helps you create web pages from existing documents in your web site. We call this feature Smart Client Authoring.
What is Smart Client Authoring?
A lot of times when I show people the features we’ve built into MOSS that help you author web pages on a web site, they say, “Great! But…not all of my web pages start their lives out getting created in-place like this.” Web pages often begin life as a document. Someone writes a document, other people revise it, and the document ultimately gets published, printed, faxed, and archived. It’s then someone’s job to take that document and somehow make it into a web page. This potentially frustrating job involves manually copying and pasting content out of the document, into some intermediary tool to remove styling, and finally into the web page where styling is re-applied to match the web site. Whew! That’s a lot of steps, right? We thought there could be a better way.
Enter Smart Client Authoring (SCA). SCA allows you to take a document stored in a SharePoint library and convert it into a web page. Authors can easily do this on any document in SharePoint, provided it’s saved in a supported format (we’ll talk about that a bit later). Site managers can control how documents get converted into pages, including things like how the page is styled relative to the document.
It’s important to note the relationship between the document and the page in this model. A new page is created in the system when you opt to create a page from a document, and that page is a separate and distinct item from the source document (for example, it has its own versioning and workflow). However, this page is “tied” to the source document. What this means is that you can update the source document, add some new paragraphs, check it back in, and publish it, and then you can update the resulting page to match that new content. This “loose coupling” allows you to keep the document as the starting point for the content of your page, while still benefiting from the fact that your web page can be styled and published independently of your document.
Let’s start with an example. Here, I have a new press release I’ve been working on in Microsoft Word 2007.
When I save this press release into a document library in my MOSS site and I open up the drop-down menu on the document, I see an option called “Convert Document”. Hovering over this menu option, I see a flyout menu with “From Word Document to Web Page”.
Selecting this option takes me to a form that lets me pick where this page will be created, what its title, description, and URL name will be, and whether I want to wait for the page to be created now or in the background. When I click “Create” and wait a few moments, out comes a web page!
So, what just happened?
1. When I clicked “Create”, the document was sent down to the converter, along with some configuration information.
2. The converter took the document and transformed it into HTML.
3. The resulting HTML was handed back to the server.
4. The body and styles of the converted HTML were copied into a new page.
5. Finally, we were taken to the web page.
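To make those steps concrete, here is a minimal sketch in Python. It is not how MOSS implements the feature: the `Page` class, the `convert_to_html` stand-in converter, and the field names are all hypothetical, and the real converter runs as a separate process on the server. The sketch only illustrates the flow of converting a document to HTML and then copying the body and styles into a new page.

```python
import re
from dataclasses import dataclass
from html import escape
from typing import Tuple


@dataclass
class Page:
    """Hypothetical stand-in for a new page with two content fields."""
    title: str
    body_field: str
    styles_field: str


def convert_to_html(document_text: str) -> str:
    """Stand-in converter: turns plain-text paragraphs into a full HTML document."""
    paragraphs = [p.strip() for p in document_text.split("\n\n") if p.strip()]
    body = "".join(f'<p class="Normal">{escape(p)}</p>' for p in paragraphs)
    return (
        "<html><head><style>.Normal { font-family: Calibri; }</style></head>"
        f"<body>{body}</body></html>"
    )


def split_converted_html(converted: str) -> Tuple[str, str]:
    """Return (body contents, style contents) pulled out of the converted HTML."""
    styles = "\n".join(
        re.findall(r"<style\b[^>]*>(.*?)</style>", converted, re.S | re.I)
    )
    match = re.search(r"<body\b[^>]*>(.*?)</body>", converted, re.S | re.I)
    body = match.group(1) if match else ""
    return body, styles


def create_page_from_document(title: str, document_text: str) -> Page:
    converted = convert_to_html(document_text)      # steps 1-3: convert, get HTML back
    body, styles = split_converted_html(converted)  # step 4: copy body and styles
    return Page(title=title, body_field=body, styles_field=styles)


page = create_page_from_document(
    "Press Release", "First paragraph.\n\nSecond paragraph."
)
print(page.body_field)
print(page.styles_field)
```

Keeping the body and the styles in separate fields is what later lets a site manager keep one and discard the other.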
So what we have is a new web page that is distinct from the document. But how do you keep track of the two? Well, the system takes care of that for you by recording which document is associated with this page, and vice versa. To show that, let’s see what happens if you put this page in edit mode. Once you do that, you’ll see a handful of new options in the HTML editor:
· Open Source Document allows you to open the document that was used to create this page.
· Update Contents from Source allows you to refresh the contents of this page by reconverting the source document.
· Edit HTML lets you edit the contents of this page directly.
This shows that the page is aware that it was created from a document, and it helps you find, open, and edit the document, as well as refresh the contents of the page from the document.
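One way to picture this “loose coupling” is as a pair of references, one in each direction, plus a refresh operation. The sketch below is an illustration only, assuming hypothetical `Document` and `Page` classes and a trivial `to_html` converter. It says nothing about how MOSS actually stores the association.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Document:
    url: str
    content: str
    page_url: Optional[str] = None  # back-reference to the page made from this document


@dataclass
class Page:
    url: str
    source_url: str   # forward reference to the source document
    body: str
    version: int = 1  # the page is versioned independently of the document


def create_page(doc: Document, page_url: str, convert: Callable[[str], str]) -> Page:
    """Create a separate page and record the association in both directions."""
    page = Page(url=page_url, source_url=doc.url, body=convert(doc.content))
    doc.page_url = page_url
    return page


def update_from_source(
    page: Page, library: Dict[str, Document], convert: Callable[[str], str]
) -> None:
    """Reconvert the source document and refresh the page body."""
    doc = library[page.source_url]
    page.body = convert(doc.content)
    page.version += 1


def to_html(text: str) -> str:
    """Trivial stand-in for a real converter."""
    return f"<p>{text}</p>"


doc = Document(url="/Docs/release.docx", content="Original text")
library = {doc.url: doc}
page = create_page(doc, "/Pages/release.aspx", to_html)

# The author revises the document, then refreshes the page from it.
doc.content = "Original text, plus a new paragraph"
update_from_source(page, library, to_html)
print(page.version, page.body)
```

The page never changes on its own when the document does. Someone has to ask for the update, which is what keeps the two items independent.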
So that’s how you use the feature. But how do you configure it? Well, SCA allows you to configure how documents are converted into web pages for a given content type. For example, if my document above was of the content type Press Release, I could configure how Press Release documents were converted into web pages separately from other documents. This is powerful, as you can configure one type of document in your web site to be converted in a specific place and use a specific layout, and have completely different settings for another type of document.
When you go to a Site Content Type in MOSS, you’ll see a link at the bottom entitled Manage document conversions for this content type.
If you click on that link, and then click Configure next to one of the enabled converters (like the one for .docx documents that we used above), you’re taken to a form that lets you configure how documents of this content type are converted into a web page. You can also get to this form by clicking on the Configure Converter Settings link at the top of the Create Page from Document form above, when you select a document to be converted.
There are a couple of important questions a site manager must answer on this form. First, what page layout will be used to create pages from documents of this content type? This drives how pages will look when they’re converted from documents. Second, which fields will the converted document’s <body> and <styles> sections be placed in? This is important, as you’ll need to place the content in fields that ultimately get rendered by the page layout.
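If it helps to think of these answers as data, the sketch below models one set of conversion settings per content type. Everything in it is hypothetical (the `ConversionSettings` class, the layout file names, the field names, and the fallback to a generic "Document" entry). It is only meant to show why keying the settings by content type lets Press Release documents convert differently from everything else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionSettings:
    """Hypothetical per-content-type answers to the questions on the form."""
    page_layout: str    # which layout new pages are created with
    body_field: str     # page field that receives the converted <body>
    styles_field: str   # page field that receives the converted styles


# Hypothetical configuration: one entry per content type.
SETTINGS_BY_CONTENT_TYPE = {
    "Press Release": ConversionSettings(
        page_layout="PressReleaseLayout.aspx",
        body_field="PageContent",
        styles_field="PageStyles",
    ),
    "Document": ConversionSettings(
        page_layout="ArticleLayout.aspx",
        body_field="PageContent",
        styles_field="PageStyles",
    ),
}


def settings_for(content_type: str) -> ConversionSettings:
    """Look up settings for a content type, falling back to the generic entry."""
    return SETTINGS_BY_CONTENT_TYPE.get(
        content_type, SETTINGS_BY_CONTENT_TYPE["Document"]
    )


print(settings_for("Press Release").page_layout)
print(settings_for("Memo").page_layout)
```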
For the styles section, you have the option of keeping or throwing away inline CSS style definitions that we find in the converted HTML. Throwing them away will remove any <styles> blocks from the converted HTML, which then allows you to define your own CSS file that specifies CSS classes that are used in the converted HTML. In the Word case, CSS classes are used to represent Word styles, such as Heading 1 and Heading 2. If you choose to “Remove CSS <styles>…”, then the CSS definitions are gone but the HTML still has those CSS class references. If your page layout or master page links in a CSS file that redefines those CSS classes, then boom! Your web page looks different than your document does. That’s how I was able to make the web page above use different styles than my Word document.
Unless you have documents that match the branding, fonts, and colors of your web site, chances are you want to remove these inline CSS definitions and redefine them using CSS classes that match your web site’s look and feel. This lets you keep your documents with a look and feel appropriate for printing, but make the web pages created from those documents fit with your web site’s overall styling.
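Here is a small sketch of the idea in Python, assuming the converted HTML carries its definitions in standard HTML `<style>` blocks. The class name `Heading1`, the sample markup, and the `site_css` rule are made up for illustration; the class names the Word converter actually emits may differ.

```python
import re

# Matches a whole <style>...</style> block, including its contents.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)


def remove_style_blocks(converted_html: str) -> str:
    """Drop the embedded CSS definitions but leave class attributes untouched."""
    return STYLE_BLOCK.sub("", converted_html)


converted = (
    "<html><head>"
    "<style>.Heading1 { font: 16pt Cambria; color: navy; }</style>"
    "</head><body>"
    '<h1 class="Heading1">Contoso ships new product</h1>'
    "</body></html>"
)

cleaned = remove_style_blocks(converted)
print(cleaned)

# Hypothetical rule in the web site's own stylesheet, linked from the page
# layout or master page. Because the class reference survives in the HTML,
# this rule now controls how the heading looks.
site_css = ".Heading1 { font: 20px Verdana; color: #333; }"
```

The document’s own definition is gone from `cleaned`, but `class="Heading1"` is still there for the site stylesheet to pick up.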
Converters and extensibility
Out of the box, we offer four converters for converting documents to web pages:
· A Word .docx to Web Page converter
· A Word .docm to Web Page converter
· An InfoPath Form to Web Page converter
· An XML file to Web Page converter
The Word converters take a Word .docx or .docm file and transform it into HTML using an XSLT transform. The InfoPath converter relies on an InfoPath Form Template (.xsn) view to do the conversion (a view is just an XSLT file); this lets you create a “web view” of an InfoPath form, and then use that view to create web pages from your InfoPath forms. The XML converter asks for a corresponding XSL file to use when converting documents to HTML. Any app that can output XML and has a corresponding XSLT transform can be used with this converter.
While we think these are valuable converters, we don’t claim to have built all of the ones you’ll ever need to use. That’s why we made this system pluggable. If you have an executable that can run on the server, take a document of a given format, and output HTML from it, you can easily plug it into our system. We even give you the ability to host your own UI in our “Configure” form, if you need additional information to be asked of the user configuring your converter. You can read more about how to plug your own converter into SCA in our SDK documentation.
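To give a feel for the shape of such a converter, here is a minimal sketch of a program that reads a plain-text document and writes HTML. The two positional arguments (input path, output path) are an assumption made for this example. The SDK documentation defines the actual command-line contract a converter must follow, and a real converter would typically be a compiled executable rather than a Python script.

```python
import sys
from html import escape
from pathlib import Path
from typing import List


def text_to_html(text: str) -> str:
    """Convert blank-line-separated paragraphs of plain text into simple HTML."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<html><head></head><body>\n{body}\n</body></html>"


def main(argv: List[str]) -> int:
    """Hypothetical entry point: argv is [program, input path, output path]."""
    if len(argv) != 3:
        print("usage: text_to_html.py INPUT OUTPUT", file=sys.stderr)
        return 2
    source, target = Path(argv[1]), Path(argv[2])
    html_out = text_to_html(source.read_text(encoding="utf-8"))
    target.write_text(html_out, encoding="utf-8")
    return 0
```

To run this as a script, you would call `sys.exit(main(sys.argv))` at the bottom of the file. The conversion logic lives in `text_to_html`, so it can be tested without touching the file system.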
SCA is an easy way to take existing documents and convert them into web pages, manage how that conversion happens, and plug in your own converters for most any file type imaginable. I encourage folks who haven’t played with the feature to try it out on their MOSS Beta2 installations and give us feedback. As a side note, to get the feature working on MOSS Beta2:
· If you have a single-box installation, all you need to do is enable Document Conversions (Central Administration -> Application Management -> Document Conversions). Make sure to pick the web application that corresponds to where your documents are stored.
· If you have a farm installation, you’ll first need to enable a couple of services (Central Administration -> Operations -> Services on Server). First, enable the Document Conversion Load Balancer Service, and then enable the Document Conversion Launcher Service. For Load Balancer, pick the current server.
I hope this information was useful to you, and I welcome comments and questions from you as you experiment with using SCA for your specific needs.
George Perantatos, Program Manager for WCM