OpenXML and XSL-FO

I’m currently working on a project generating OpenXML documents. OpenXML (Office Open XML) is the new default file format in Microsoft Office 2007. The format, in general, is a great step forward. For those who haven’t looked at it, it’s a combination of a packaging standard and several application-specific schemas: WordprocessingML, SpreadsheetML, etc. So a “docx” file (the new Word extension) is actually a ZIP file with several XML files inside it. It can also contain other embedded content, such as images, in its original form. This makes it easy to generate Office documents programmatically without interacting with Office at all.
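To make the "ZIP file with XML inside" point concrete, here is a minimal Python sketch that builds a one-paragraph docx with nothing but the standard library. The three parts it writes are, as far as I know, the smallest set Word will open, but I haven't checked that against every Office version. The function names `build_docx` and `list_parts` are just mine for illustration.

```python
import io
import zipfile

# Tells the package consumer which content type each part has.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Package-level relationship pointing at the main document part.
RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

# The WordprocessingML body: one paragraph, one run, one text element.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Hello from Python</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def build_docx() -> bytes:
    """Return the bytes of a minimal docx package (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", RELS)
        z.writestr("word/document.xml", DOCUMENT)
    return buf.getvalue()


def list_parts(docx_bytes: bytes) -> list:
    """List the part names inside a docx, treating it as a plain ZIP."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        return z.namelist()
```

Writing the result of `build_docx()` to a file named something like `hello.docx` should give you a document Word can open. Running `list_parts` on any existing docx is a quick way to see what's inside one.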

As I work with WordprocessingML, however, I wonder why they had to make it so odd. I’ve done a lot of work in the past with XSL-FO, the page layout markup language. I like it. It accomplishes the job well and intelligently leverages other standards, particularly in its use of essentially the same property set as CSS.

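CSS-style properties, written as attributes, are easy to use. WordprocessingML’s approach is element-based rather than attribute-based: each formatting property is its own child element inside a properties container. So, to get bold text you end up with something like this: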

<w:p>
  <w:r>
    <w:rPr><w:b/></w:rPr>
    <w:t>Bold Text</w:t>
  </w:r>
</w:p>

The equivalent in XSL-FO would be:

<fo:block font-weight="bold">Bold Text</fo:block>

Both are entirely workable solutions, but add up all that extra markup over a long document, then try to write the XPath expressions to data mine it, and you’ll find the latter much simpler. I’m sure there are logical explanations for MS’s approach that haven’t occurred to me, but those are my impressions.
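Here is a rough Python sketch of the data mining difference, pulling the bold text out of a small fragment in each format. It assumes bold in WordprocessingML is marked by a `w:b` element inside the run’s `w:rPr`, and it ignores cases such as bold inherited from a style or switched off with a `w:val` attribute. The sample fragments and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "fo": "http://www.w3.org/1999/XSL/Format",
}

# Illustrative fragments only, not complete documents.
WORDML = """<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:r><w:rPr><w:b/></w:rPr><w:t>Bold Text</w:t></w:r>
    <w:r><w:t> plain text</w:t></w:r>
  </w:p>
</w:body>"""

FO = """<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block font-weight="bold">Bold Text</fo:block>
  <fo:block>plain text</fo:block>
</fo:flow>"""


def bold_text_wordml(xml: str) -> list:
    """Find runs whose properties element contains w:b, then join their text."""
    root = ET.fromstring(xml)
    found = []
    for run in root.iterfind(".//w:r", NS):
        # The formatting lives two levels down, in a sibling of the text.
        if run.find("w:rPr/w:b", NS) is not None:
            found.append("".join(t.text or "" for t in run.iterfind("w:t", NS)))
    return found


def bold_text_fo(xml: str) -> list:
    """One attribute predicate on the block itself is enough."""
    root = ET.fromstring(xml)
    return [b.text for b in root.iterfind(".//fo:block[@font-weight='bold']", NS)]
```

ElementTree only supports a subset of XPath, which is why the WordprocessingML version needs a loop. With a full XPath 1.0 engine the two queries would be roughly `//w:r[w:rPr/w:b]/w:t` and `//fo:block[@font-weight='bold']`, and the second is still the easier one to read.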
