How many attributes?

[9 March 2011]

Beginning programmers are often taught to avoid arbitrary limits in their software, but in reality it’s not unusual for programmers to write capacity limits into their software: fixed-length fields are often easier to deal with than variable-length fields. So we have fixed-length IP numbers, fixed-length addresses in most machines, and who knows what all else.

So I wasn’t surprised today when I found a hard limit to the number of attributes allowed on an XML element in an XQuery implementation I use and think highly of.

But every now and then, the fixed limit turns out to be too small. That’s one reason beginners are instructed to treat such limits with suspicion.

When I first learned about IP numbers, for example, I remember being told there were enough IP numbers for everyone in the world to have one, including all the people in China. That seemed an extravagant plenitude of IP numbers, partly because we didn’t really expect everyone in China to want one. We didn’t expect everyone in the U.S. to want one, either: at the time, the only computer anyone in the room used for network activities was a shared mainframe, so we needed far fewer than one IP number per person then. Looking ahead to a time when people would want their microcomputers to be on the internet felt like being farsighted. We did not foresee (at least, I certainly didn’t) that individual machines without a human in attendance might also want IP numbers, so that the world would need more than one per human. So IPv6 became necessary. (If you don’t know what I’m talking about, don’t worry: sometimes limits that look reasonable at the outset turn out to be too low, especially if the system they are built into becomes highly popular.)

Memory addresses of 32 bits also seemed extravagant for a long time (that mainframe I worked on supported several hundred simultaneous users in a 24-bit address space; the actual hardware supported 31-bit addresses, but only specialized parts of the operating system used more than 24 bits). But nowadays more and more machines are moving to 64-bit addresses.

So I also wasn’t surprised that the reason I learned about the hard-coded limit in the XQuery engine was that it turned out to be set too low. The software declined to handle some XML I need to work with for a client’s project. It turns out that current versions of the Unicode Character Database are available in XML, which makes my life a lot easier. But most of the many character properties defined in the Unicode Database are represented as attributes, which blew well past the limit of (are you sitting down? ready?) 32.
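For anyone who wants to check their own data against a limit like this before loading it, here is a minimal sketch in Python. It is not how BaseX does anything; it just walks a document and reports the element with the most attributes. The sample document, the function name max_attribute_count, and the attribute names p0, p1, ... are made up for illustration; a real Unicode Character Database file would be parsed from disk instead.

```python
import xml.etree.ElementTree as ET

LIMIT = 32  # the per-element attribute limit described in this post


def max_attribute_count(xml_text):
    """Return (count, tag) for the element that carries the most attributes."""
    root = ET.fromstring(xml_text)
    return max(
        ((len(el.attrib), el.tag) for el in root.iter()),
        key=lambda pair: pair[0],
    )


# Hypothetical stand-in for a UCD-style record: one element per character,
# with every property expressed as an attribute (invented names p0..p39).
attrs = " ".join(f'p{i}="N"' for i in range(40))
sample = f'<repertoire><char cp="0041" {attrs}/></repertoire>'

count, tag = max_attribute_count(sample)
status = "over the limit" if count > LIMIT else "within the limit"
print(f"<{tag}> has {count} attributes: {status}")
```

For a large file, the same count could be taken with ET.iterparse so the whole document need not sit in memory at once.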

I was a little surprised by the actual value of the limit: 32 seems like a very low number for such a limit. Who in their right mind (I found myself thinking) would expect a limit like that to suffice for real work? Who, indeed? But upon reflection, I remembered that I’ve been using this XQuery engine without incident for at least two or three years, doing a good deal of real work. And all the XML I’d dealt with in that time came in under the limit. So while 32 seems like a low limit, it seems to have been fairly well chosen, at least for the work I’ve been doing.

The happy ending came when I wrote to the support list asking if there were some way to change the configuration to lift the limit. Less than ten minutes later, a reply was in my inbox (from the indefatigable Christian Grün, who must work very late hours to have been at his desk at that time of the night) saying that in current versions of the software (BaseX, for those who are curious, one of a number of excellent XQuery implementations available today), the limit has been changed to 2^31, so now it can handle elements with a little over two billion attributes. (Hmm. Will that do? Well, let’s put it this way: if I wanted to experiment with a restructuring of the Unicode database that had one element per character property or property value, and a boolean attribute for each character indicating whether that character had that property [or that value for the property], the software could handle that many attributes. Actually, it could handle about a thousand times that many.)
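The back-of-the-envelope arithmetic behind that parenthesis can be sketched as follows. The assumption here (mine, for illustration) is one attribute per code point in the full Unicode code space, U+0000 through U+10FFFF; counting only assigned characters would make the headroom larger still.

```python
# New per-element attribute limit reported for BaseX in this post.
NEW_LIMIT = 2 ** 31

# Assumption: one boolean attribute per code point in the whole Unicode
# code space (17 planes of 65,536 code points each).
CODE_POINTS = 0x10FFFF + 1

headroom = NEW_LIMIT / CODE_POINTS

print(f"limit:       {NEW_LIMIT:,}")
print(f"code points: {CODE_POINTS:,}")
print(f"headroom:    about {headroom:,.0f} times")
```

Under that assumption the headroom comes out at roughly 1,900 times, so "about a thousand times" is the right order of magnitude and, if anything, a cautious estimate.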

Moral: it’s not necessarily an error when software has a fixed capacity limit. But as a user, you normally need to take care that the limits are appropriate to your needs.

Moral 2: when you do bump your head on a limit of this kind, it’s very handy if those responsible for the software are responsive to user queries. Even better, of course, if they turn out to have fixed the problem before you asked about it.