Joel on the (old) Office File Formats
Joel on Software: Why are the Microsoft Office file formats so complicated? (And some workarounds)
I'll show you how those file formats got so unbelievably complicated, why it doesn't reflect bad programming on Microsoft's part, and what you can do to work around it.
The first thing to understand is that the binary file formats were designed with very different design goals than, say, HTML.
They were designed to be fast on very old computers. For the early versions of Excel for Windows, 2 MB of RAM was a reasonable amount of memory, and an 80386 at 20 MHz had to be able to run Excel comfortably. There are a lot of optimizations in the file formats that are intended to make opening and saving files much faster.
They were designed to use libraries. If you wanted to write a from-scratch binary importer, you'd have to support things like the Windows Metafile Format (for drawing things) and OLE Compound Storage. If you're running on Windows, there's library support for these that makes it trivial… using these features was a shortcut for the Microsoft team. But if you're writing everything on your own from scratch, you have to do all that work yourself.
They were not designed with interoperability in mind. The assumption, and a fairly reasonable one at the time, was that the Word file format only had to be read and written by Word. That means that whenever a programmer on the Word team had to make a decision about how to change the file format, the only thing they cared about was (a) what was fast and (b) what took the fewest lines of code in the Word code base.
They have to reflect all the complexity of the applications. Every checkbox, every formatting option, and every feature in Microsoft Office has to be represented in file formats somewhere.
They have to reflect the history of the applications. A lot of the complexities in these file formats reflect features that are old, complicated, unloved, and rarely used. They're still in the file format for backwards compatibility, and because it doesn't cost anything for Microsoft to leave the code around. But if you really want to do a thorough and complete job of parsing and writing these file formats, you have to redo all that work that some intern did at Microsoft 15 years ago. The bottom line is that there are thousands of developer years of work that went into the current versions of Word and Excel, and if you really want to clone those applications completely, you're going to have to do thousands of years of work. A file format is just a concise summary of all the features an application supports.
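To give a feel for what "doing all that work yourself" means for OLE Compound Storage, here is a minimal sketch (not taken from Joel's article) of the very first step a from-scratch reader would need: recognizing the container and reading its sector size from the header. The function names are hypothetical; the 8-byte signature and the header offset of the sector shift are from the published compound file format. Everything after this step (FAT chains, directory entries, the mini stream) is where the real effort starts.

```python
import struct

# First 8 bytes of every OLE compound file, the container behind the
# old binary .doc/.xls/.ppt formats.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole_compound_file(data: bytes) -> bool:
    """Return True if the data starts with the compound file signature."""
    return data[:8] == OLE_MAGIC


def sector_size(data: bytes) -> int:
    """Read the sector size from a compound file header.

    The header stores a 16-bit little-endian "sector shift" at byte
    offset 30; the sector size is 2 raised to that value.
    """
    if not looks_like_ole_compound_file(data) or len(data) < 32:
        raise ValueError("not an OLE compound file header")
    (shift,) = struct.unpack_from("<H", data, 30)
    return 1 << shift
```

On Windows, the system libraries hide all of this behind a storage API, which is exactly the shortcut the quote describes.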
Follow the link and read the whole piece.
Tagged as: file format, knowledge, microsoft, office | Author: Martin Leyrer
Wednesday, 20080220, 01:02 | permanent link | 0 comment(s)