Joel on the (old) Office File Formats

Joel on Software: Why are the Microsoft Office file formats so complicated? (And some workarounds)

I’ll show you how those file formats got so unbelievably complicated, why it doesn’t reflect bad programming on Microsoft’s part, and what you can do to work around it.
 
The first thing to understand is that the binary file formats were designed with very different design goals than, say, HTML.
 
They were designed to be fast on very old computers. For the early versions of Excel for Windows, 2 MB of RAM was a reasonable amount of memory, and an 80386 at 20 MHz had to be able to run Excel comfortably. There are a lot of optimizations in the file formats that are intended to make opening and saving files much faster

They were designed to use libraries. If you wanted to write a from-scratch binary importer, you’d have to support things like the Windows Metafile Format (for drawing things) and OLE Compound Storage. If you’re running on Windows, there’s library support for these that makes it trivial… using these features was a shortcut for the Microsoft team. But if you’re writing everything on your own from scratch, you have to do all that work yourself.

They were not designed with interoperability in mind. The assumption, and a fairly reasonable one at the time, was that the Word file format only had to be read and written by Word. That means that whenever a programmer on the Word team had to make a decision about how to change the file format, the only thing they cared about was (a) what was fast and (b) what took the fewest lines of code in the Word code base.

They have to reflect all the complexity of the applications. Every checkbox, every formatting option, and every feature in Microsoft Office has to be represented in file formats somewhere.

They have to reflect the history of the applications. A lot of the complexities in these file formats reflect features that are old, complicated, unloved, and rarely used. They’re still in the file format for backwards compatibility, and because it doesn’t cost anything for Microsoft to leave the code around. But if you really want to do a thorough and complete job of parsing and writing these file formats, you have to redo all that work that some intern did at Microsoft 15 years ago. The bottom line is that there are thousands of developer years of work that went into the current versions of Word and Excel, and if you really want to clone those applications completely, you’re going to have to do thousands of years of work. A file format is just a concise summary of all the features an application supports.

Follow the link and read the whole pice.

Tagged as: , , , | Author:
[Mittwoch, 20080220, 01:02 | permanent link | 0 Kommentar(e)

Comments are closed for this story.


Disclaimer

„Leyrers Online Pamphlet“ ist die persönliche Website von mir, Martin Leyrer. Die hier veröffentlichten Beiträge spiegeln meine Ideen, Interessen, meinen Humor und fallweise auch mein Leben wider.
The postings on this site are my own and do not represent the positions, strategies or opinions of any former, current or future employer of mine.

Search

RSS Feed RSS Feed

Tag Cloud

2007, a-trust, a.trust, a1, accessability, acta, advent, age, amazon, ankündigung, apache, apple, audio, austria, backup, barcamp, bba, big brother awards, birthday, blog, blogging, book, books, browser, Browser_-_Firefox, bruce sterling, buch, bürgerkarte, cars, cartoon, ccc, cfp, christmas, cloud, collection, computer, computing, concert, conference, copyright, database, date, datenschutz, debian, delicious, demokratie, design, desktop, deutsch, deutschland, dev, developer, digitalks, dilbert, disobay, dna, dns, Doctor Who, documentation, domino, Domino, Douglas Adams, download, drm, dsk, dvd, e-card, e-government, e-mail, e-voting, E71, Ein_Tag_im_Leben, email, eu, event, exchange, Extensions, fail, feedback, film, firefox, flightexpress, food, foto, fsfe, fun, future, games, gaming, geek, geld, gleichberechtigung, google, graz, grüne, grüninnen, hack, hacker, handtuch, handy, hardware, HHGTTG, history, how-to, howto, hp, html, humor, ibm, IBM, ical, image, innovation, intel, internet, internet explorer, iphone, ipod, isp, it, IT, java, javascript, job, journalismus, keyboard, knowledge, konzert, language, laptop, law, lego, lenovo, life, links, Linux, linux, linuxwochen, linuxwochenende, living, lol, london, lost+found, lotus, Lotus, lotus notes, Lotus Notes, LotusNotes, lotusnotes, lotusphere, Lotusphere, Lotusphere2006, lotusphere2007, lotusphere2008, Lotusphere2008, lustig, m3_bei_der_Arbeit, mac, mail, marketing, mathematik, media, medien, metalab, Microsoft, microsoft, mITtendrin, mobile, mood, movie, mp3, multimedia, music, musik, männer, nasa, netwatcher, network, netzpolitik, news, nokia, Notes, notes, Notes+Domino, office, online, OOXML, open source, openoffice, opensource, orf, orlando, os, outlook, patents, pc, pdf, perl, personal, php, picture, pictures, podcast, politics, politik, pr, press, presse, privacy, privatsphäre, productivity, programming, protest, qtalk, quintessenz, quote, quotes, radio, rant, recherche, recht, release, review, rezension, rss, science, search, security, server, sf, shaarli, Show-n-tell thursday, sicherheit, silverlight, SnTT, social media, software, sony, sound, space, spam, sprache, spö, ssh, ssl, standards, storage, story, stupid, summerspecial, sun, sysadmin, talk, talks, technology, The Hitchhikers Guide to the Galaxy, theme, think, thinkpad, tip, tipp, tools, topgear, torrent, towel, Towel Day, TowelDay, travel, truth, tv, twitter, ubuntu, uk, unix, update, usa, vds, video, videoüberwachung, vienna, Vim, vim, vista, vorratsdatenspeicherung, vortrag, wahl, wcm, web, web 2.0, web2.0, web20, Web20, webdesign, werbung, wien, wiener linien, wikileaks, windows, windows 7, wired, wishlist, wissen, Wissen_ist_Macht, wlan, work, wow, wtf, Wunschzettel, wunschzettel, www, xbox, xml, xp, zensur, zukunft, zune, österreich, övp, übersetzung, überwachung

AFK Readinglist