Monthly Archives: August 2009

So this morning, waiting to start work, I cruised through the sites I can actually visit at work, and came upon this article on cNet.

The headline is alarming…ly funny. Back in 1998, this tiny company out of Toronto called i4i patented a most interesting method of data storage that, it seems, Microsoft uses (allegedly) in Word 2003 and later for its XML export.

The news reports I’ve seen so far make a big deal about this, but I must point out, this has nothing to do with XML. Pure and simple. It’s also nothing like XML. At all. Two separate concepts. XML is a language, this is a mechanism for more efficient parsing.

The patent describes a “method and system for manipulating the architecture and the content of a document separately from each other.”

XML (eXtensible Markup Language) describes the structure of a document. It’s a human-readable language for rendering text-based data. The actual tagging that describes this structure can be whatever you want, but it has to follow a few very strict but simple rules in order to be parsed (read and interpreted) correctly by a machine. It’s a subset of SGML (Standard Generalized Markup Language), which was developed years ago to provide a general way of noting structural data in the text stream of a document.

Which brings me to the need to explain why this patent is so full of awesome. To me, anyway. And maybe also to give an idea why I think Microsoft might have exploited this for more than just XML exporting, for which it’s perfect.

Consider an average document containing, say, colored text. What you see in the previous sentence is exactly what the colored words in the sentence describe: Text that is of a certain color. In order for that color information to survive quitting the program running the document, the information about the color has to be preserved somehow. How that happens in HTML is that there are tags placed before and after the words in question. So, <span style="color: #ff6600;">colored text.</span> is how the information is saved in an HTML-formatted text file.

When you want to open the document, the program doing the opening has to parse (read and interpret) the text in order to find any instances of, say, colored text, in order to render it properly. So, it has to literally pick up each letter and space in the document and check each letter and space to find tagging. In this case it’s looking for text that begins with a less-than character, <, and ends with a greater-than character, >. Whatever falls in between those characters is considered “tagging,” and is then dealt with by the program. Incidentally, if it so happens that you want to use a < or a > in your document, the program has to store those as &lt; or &gt;, so the heft of the file gets greater and greater. This is why plain-text is so much smaller than its styled counterparts: All the extra heft is to save this kind of extra information. It gets even more involved when you start considering styles.
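For the curious, here is a rough sketch in Python of that letter-by-letter scan. It is only an illustration of the idea, not how any real browser or word processor does it: the function name scan is made up for this example, and it assumes no > ever shows up inside a tag's attributes.

```python
import html

def scan(markup):
    """Walk the markup one character at a time, separating tags from text."""
    pieces = []      # collected ("tag", ...) and ("text", ...) pairs, in order
    buffer = []      # characters gathered since the last boundary
    in_tag = False
    for ch in markup:
        if ch == "<" and not in_tag:
            # A less-than sign closes off any pending text and opens a tag.
            if buffer:
                pieces.append(("text", html.unescape("".join(buffer))))
                buffer = []
            in_tag = True
            buffer.append(ch)
        elif ch == ">" and in_tag:
            # A greater-than sign ends the tag we were collecting.
            buffer.append(ch)
            pieces.append(("tag", "".join(buffer)))
            buffer = []
            in_tag = False
        else:
            buffer.append(ch)
    if buffer:
        # Whatever is left over is treated as trailing text.
        pieces.append(("text", html.unescape("".join(buffer))))
    return pieces

sample = '<span style="color: #ff6600;">colored text.</span>'
pieces = scan(sample)

# Escaping is where the extra heft comes from: one character becomes four.
escaped = html.escape("1 < 2", quote=False)
```

Every single character goes through that loop, whether it is part of a tag or not, which is the cost the rest of this post is about.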

This is me getting real elementary, kids, so forgive me. I’m not sure how much you know.

Right now, we’re talking about a document with colored text. If that colored text is supposed to mean something–like it’s supposed to denote the beginning of a chapter–then other tagging exists that would need to be parsed for in order to denote structure. That’s why XML came into being: A simple, robust (mostly) way of rendering structure. But that’s all that XML does.

So what does all this have to do with the patent, and where pray tell is the awesome?

The bitch of opening a document with style and/or structure is the parsing. When you open a document, whatever the format, the parsing step happens first. For DOC files and other such proprietary formats, the mechanism and the file structure are optimized as much as possible (hence the proprietary part) to make that parse happen quick. For open standards like HTML or XML (or semi-open standards like Rich Text Format (RTF)), there are really two operations that happen. The first one is to chug through the file character by character to find the tagging and the text–which takes forever from a programming standpoint–then to load that information into whatever internal model you’re using for navigation and manipulation.

This patent describes a way of structuring a file so that the tagging does not have to lie in the text stream with the text. Literally, instead of having to chug-chug-chug through a file to find the tags and then create your internal map to keep track of where the tagging is in the text, the map is kept in a separate location in the file from the text. For example, in a document that reads…


The text portion of the document would just be


A nine-character chunk of data like you’d find in a plain text file. The map would basically say that at character zero (before the first letter in this case) the tag <span style="color: #ff6600;"> would exist, and at character ten (after the exclamation point), the tag </span> would exist.

The beauty of this is that, without having to add any special characters to the text stream itself, or any extra characters at all, you can map out all kinds of descriptive information related to this chunk of plain text. You could, conceivably, put mapping in the document that tells us that at character zero would be a structure tag like <para> and at character ten another structure tag like </para>.
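Here is a minimal sketch in Python of what keeping the map apart from the text might look like. Everything in it is my own stand-in: the sample text "Hi there!", the render function, and the list-of-pairs layout for the map are assumptions for illustration, not the format the patent or Word actually uses. Offsets here count from zero, so the closing tag lands at offset 9, just past the exclamation point.

```python
import html

def render(text, tag_map):
    """Weave the tags from a map back into plain text to produce inline markup."""
    out = []
    last = 0
    for offset, tag in sorted(tag_map, key=lambda entry: entry[0]):
        # Copy the plain text up to this tag, escaping only on the way out.
        out.append(html.escape(text[last:offset], quote=False))
        out.append(tag)
        last = offset
    out.append(html.escape(text[last:], quote=False))
    return "".join(out)

# The text stream: nine plain characters, no tags, no escapes.
text = "Hi there!"

# Two independent maps over the same text, each a list of (offset, tag) pairs.
color_map = [(0, '<span style="color: #ff6600;">'), (9, "</span>")]
structure_map = [(0, "<para>"), (9, "</para>")]

as_html = render(text, color_map)
as_xml = render(text, structure_map)
```

Notice that the text itself never changes. Swapping one map for another gives you HTML or XML out of the same nine characters, and nothing has to be scanned to find out where the tags go. Merging both maps into one output would need extra care to keep the tags properly nested, which this sketch does not attempt.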

That’s where the XML wrinkle comes in. What the suit alleges is that, when Microsoft (fucking finally) added an XML export to Word in 2003, they basically used either i4i’s engine or their concept in making that export process happen. As this cat states, this method of data storage is a “logical thing to do.”

Which brings me further into the awesome. This mapping method is pretty ingenious, and frankly, it’s how I thought Word’s file structure actually worked in the first place starting back in Word 2000. A lot of stuff changed at that time, you see, and the Document Object Model (part of the apparatus to programmatically navigate Word and its documents) changed especially. Scanning through a DOC file using a hex editor (a tool to actually look at a file as it exists as raw data), I remember noticing that the text part of the DOC seemed to be a block of essentially plain, unformatted text.

Now I remember thinking that it was odd that there should be this block of plain text in the middle of the DOC file. There’s really no good reason why such a thing should be there, because here I was thinking the text content of the DOC was embedded somewhere in all the other mess that is a DOC file in its true form.

Reading the patent presentation this morning, it clicked. That parsing mechanism would work nice for XML, sure. Or HTML or SGML or whatever, really. If I can tell you one type of tag is somewhere in a document, why not another type of tag? That’s nothing.

But why can’t I use that same mapping method to tell me where activedocument.paragraph(1) begins? Or to otherwise make the job of populating the Word Document Object Model easier and faster?
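To make that thought concrete, here is a small Python sketch. It is pure speculation on my part and says nothing about how Word really stores anything: the sample text, the paragraph_starts list, and both function names are invented for the example.

```python
import bisect

# One plain block of text, with no paragraph marks stored in it.
text = "First paragraph.Second one.Third."

# The map: the offset where each paragraph begins, kept in ascending order.
paragraph_starts = [0, 16, 27]

def paragraph(n):
    """Return the n-th paragraph, counting from 1 like paragraph(1) above."""
    start = paragraph_starts[n - 1]
    # The paragraph ends where the next one starts, or at the end of the text.
    end = paragraph_starts[n] if n < len(paragraph_starts) else len(text)
    return text[start:end]

def paragraph_at(offset):
    """Return the 1-based number of the paragraph containing a character offset."""
    return bisect.bisect_right(paragraph_starts, offset)

first = paragraph(1)
where = paragraph_at(20)
```

Finding a paragraph becomes a list lookup instead of a scan through the whole document, which is the sort of speedup I have in mind.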

Just a thought. One that for me contained a lot of awesome.

Note: Two relics of my past life still fascinate me to this day: XML (and relating structure in a document), and automating Photoshop. One day, maybe I’ll share more about the other one.


A few weeks back, I saw a link on the wonderful Pharyngula site about an outfit called Good News Magazine. Apparently they are advertisers on the equally wonderful, which is good only if they pay their bills promptly.

So I ordered a copy of the free booklet they offered, Creation or Evolution, and it just arrived today. And it’s been no disappointment thus far. It begins with how the Bible was once “commonly accepted as true and as a reliable account of our origins.” Whatever. Not really so, but whatever. It follows with a lengthy quote from Wernher von Braun about his views on the origin of the universe. Basically he’s there because he believes “one cannot be exposed to the law and order of the universe without concluding that there must be design and purpose behind it all.” He’s introduced as one “who has been called the father of the American space program.” Which immediately qualifies him as having a viewpoint in this discussion, apparently. We leave out the other less savory aspects of his curriculum vitae (look him up, please, if you don’t know who he is. I’ll wait.) and go straight to the end of this generous quote: “What strange rationale makes some physicists accept the inconceivable electron as real while refusing to accept the reality of a Designer on the ground that they cannot conceive of Him?”

Static electricity when you pull on a wool sweater would be my first guess.

Even better than that is the picture of the cute little baby nestled beside the von Braun quote. Below this little darling playing with his/her feet like a blue-eyed monkey is a fascinating quote, “If we are the pinnacle of the evolutionary process, why is a human infant so helpless, and for so long, compared to the newborn of other species?” Well, who said we were the pinnacle of the evolutionary process? Science sure doesn’t. Nor does a cursory examination of human anatomy, really. Humans are admittedly complex, but hardly the pinnacle of evolution.

Aaah. Anyway, we get to another cute section, “Human reproduction argues against evolution.” It starts, “Curiously enough, our existence as human beings is one of the best arguments against it [evolution]. According to evolutionary theory, the traits that offer the best advantage for survival are passed from generation to generation. Yet human reproduction itself argues powerfully against this fundamental premise of evolution.

“If human beings are the pinnacle of the evolutionary process [there it is again!] how is it that we have the disadvantage of requiring a member of the opposite sex to reproduce, when lower forms of life–such as bacteria, viruses and protozoa–are sexless and far more prolific? If they can reproduce by far simpler methods, why can’t we? If evolution is true, what went wrong?”

Oh that’s too funny, that last sentence. Again, they proceed from a false assumption. Well, several. One, there is the assumption that humans are the pinnacle of evolution. As above, a cursory examination of our anatomy is a good place to start. We’re not the pinnacle of evolution. There’s no ladder up from lowest to highest. We’re complex, and like most all complex forms of life, we have sexual reproduction. It’s a much quicker way to create diversity in an environment, and thus perpetuate the species and the evolutionary process.

It goes on to blame evolution (of course) for all the ills of the world, and how “court decisions have interpreted constitutional guarantees of freedom of religion as freedom from religion–effectively banning public expression of religious beliefs and denying the country’s rich religious heritage.” (their emphasis, by the way…) Now here, of course, you can talk about how the courts have kept creation out of science class, or tried to keep religious symbols out of public places (with unfortunately varying success). I submit, by the way, that what these people are talking about is not just those fights, but something a little more subtle. Hate speech.

They talk about the notion that “the world languishes in the sorrow and suffering that results from rejecting absolute moral standards.” Absolute anything is never a good thing. It’s that all-or-nothing thinking that twelve-step programs describe as unhealthy. “We might as well seek only our personal gain regardless of the cost to others–acting exactly as evolutionary theory suggests.” Does it really suggest that? Or, more accurately, does it describe how you would act without such rules? If so, let me know so I can keep kids and old ladies from coming near you.

There’s other stuff, all wonderful. A gold mine of lunacy dressed in serious clothing. It would perhaps be convincing to someone who never spent a day in the classroom. Or out in nature.

I just wanted to pass this on in any case. I’ll do more, but I wait for the view of others who will gaze upon this goofy thing and pass on their viewpoints. A lot of what I’ve seen so far is disproven elsewhere, or betrays a tremendous lack of understanding of science, natural selection and evolution.

So I had some time off today, and I figured I’d go get my eyes checked. My vision has been getting blurry a bit in the last twenty years. My left eye was always a bit blurry at a distance, and my right eye has compensated.

At work however, I find I can’t make out fine detail at a distance like I should. Also, I can’t read things written on the whiteboard from, say, fifteen feet back like I used to. Doesn’t help when the jerk writing on the board uses red and writes small. You know who you are, sir!

Anyway, I can’t read road signs either like I used to, and that concerns me more than anything else. All these years I tried to compensate, but I cannot do so anymore. I really need to get this looked at.

So, I went to the eye place here at the mall. Free wireless here, so I’m writing. While I waited to talk to someone, I looked over the glasses selection. I found styles I liked, as well as the Michael Caine-looking glasses that are just ridiculous. Fuck!

What hit me were two things. One, I look more and more like my dad, especially with glasses on. And he looks like a little Dutch shopkeeper. At least he did when last I saw him. I put on some of the samples, and Jesus there he was staring back at me, only balder and rounder in the phiz.

Two, I look old! Oh god, there’s no way around it, is there? I shave my head because, well, it’s really easy and my head looks fine without the hair, but also, a leetle bit, because I am not reminded of age. Oh I can ignore the wrinkles ’round the eyes and the white in the vandyke and that I have a pelican neck, but the fucking glasses! Christ! I stopped really thinking about it at thirty, except I look better than I did when I was thirty. But the fucking glasses! Oy!

Oh I know it’s inevitable and to everything there’s a season (Ecclesiastes is one of the two books of the Bible I actually liked. That and Mark), but fuck you and fuck that. I think I was looking for something that gives character. All these frames did just that. Only I didn’t like the character they suggested.

I’m considering contacts, but the optical at work doesn’t cover them and I want to use that while I can. Soooo, it’s frames.

I am almost forty-one years old. I have never worn corrective lenses in my life. I’ve needed to, technically, for the last twenty years I think, but I’ve made it work. Looking up from the keyboard, I see the XXI of the Forever twenty-one store, the latest extension of our fetish with youth (another tangent to hit when the spirit comes to me, I’ll have to remember). I can make out the big XXI some forty feet away, and it’s not blurry but it’s not clear either.


Well, tomorrow is the appointment for the eye exam. We’ll see what happens.