Balisage 2012 – just a week to go

[31 July 2012]

Balisage 2012 is just a week away. Next Monday there is the pre-conference symposium on quality assurance and quality control in XML systems, and a week from today the conference proper starts.

I’m looking forward to pretty much all of the papers on the program, so it’s kind of hard to pick any out for particular mention. And yet, unless I want to just reproduce the program for the conference, I’m going to have to.

Several papers this year deal, one way or another, with the relation of XML and JSON. Some talk about JSON support in XML tools, some about simplifying XML so it has more appeal to the kind of person who finds JSON attractive. Hans-Jürgen Rennau has a different take: he proposes a modest generalization of the XDM data model (which underlies XPath, XSLT, and XQuery and is as close as anyone is likely to come to being the consensus model for XML) which makes the existing XDM and JSON models each a specialization of the more general model. Since XPath, XQuery, and XSLT work on XDM instances, not on serialized data, they then apply without contortions both to XML and to JSON. (Of course, they need a few modest extensions to cover the new data model, too.)

Changing the underlying data model for a technology is hard, of course, but it’s not impossible (SQL has done so, at least in some ways, and that’s one reason for its longevity). I think Rennau’s proposal merits serious discussion. It’s certainly one of the most far-reaching papers at this year’s conference.

Several talks address the relation of XML and non-XML notations for languages, and I’m looking forward to the discussions that that thread of the conference elicits. David Lee, now with MarkLogic, considers what life would be like if we marked up structure in programming languages the way we mark it up in documents. Norm Walsh continues the thread with a discussion of the general issue with particular reference to possible designs for a ‘compact syntax’ for XProc. Mark D. Flood, Matthew McCormick, and Nathan Palmer approach the problem complex from a different and enlightening angle, that of literate programming, in their case literate programming for the development of test cases for scientific function libraries. Mario Blažević offers the latest entry in the ongoing series of papers exploring how to do things with XML that were (in some form or other) part of SGML but were dropped when XML was designed. His paper shows how we might do SHORTREF in an XML context in a more general and more reliable way than was achieved when SHORTREF was bundled into SGML. And finally, Sam Wilmott opens the entire series of talks with a case study and general reflections on literate programming. I look forward to Wednesday at Balisage!

As is customary at Balisage, a few papers approach resolutely theoretical topics, either with or without overt practical applications. I’ll mention just a few: Hervé Ruellan of Canon discusses a long series of careful measurements of entropy in various data structures for XML; his paper feels in some ways like the theoretical underpinnings I wish the Efficient XML Interchange working group had had at the beginning of its work. Abel Braaksma describes the use of higher-order functions as a way to simplify XSLT stylesheet development. And Claus Huitfeldt, Fabio Vitali, and Silvio Peroni have produced a response to the paper presented in 2010 by Allen Renear and Karen Wickett of the University of Illinois claiming that documents (as we conventionally try to formalize them) do not exist. Huitfeldt and his co-authors explore the possibility of viewing documents as ‘timed abstract objects’.

Theory, practice, practice, and theory. I look forward to seeing you at Balisage.

Balisage 2012 – T minus 21 days

[16 July 2012]

Hard to believe, but Balisage 2012 is only three weeks away.

On Monday 6 August there is a pre-conference symposium on quality assurance and quality control in XML. I won’t list all the scheduled talks here, but the symposium program has a good balance of theory and practice, abstract rule and concrete application, and there are several case studies from organizations with major XML publishing programs (Ontario Scholars Portal, the U.S. National Library of Medicine’s National Center for Biotechnology Information, the American Chemical Society, and Portico).

Tuesday through Friday, the conference proper will take place. Among the many talks I am looking forward to, today I’ll mention just a few.

Mary Holstege opens the conference with a talk about type introspection in XQuery; as a principal engineer at MarkLogic, she has both a deep background in the technology of XQuery and related specifications and a good understanding of how real customers with large amounts of textual data actually use XML.

Later the same day, Steven Pemberton of W3C will speak on the relation between data abstractions and their serializations, with (passing) reference to work on XForms 2.0. Steven gives dynamite talks, and I want to hear how he describes the interplay of general design problems with the concrete work of spec development.

And at the other end of the week, on Friday morning, Liam Quin (also of W3C) will talk about work he has been doing to characterize the body of material served as XML on the Web, in particular that part of it which is not actually well-formed XML (and thus, in the strict sense, not XML at all). Since people sometimes use the existence of non-well-formed data on the Web to support arguments that XML’s well-formedness rules are too strict for practical use, I look forward to hearing Liam’s analysis.

Of course, there is a lot more to look forward to. I hope, dear reader, that I will see you in Montreal next month!

XSLTForms 1.0RC, subforms, and a 50% speedup

[9 July 2012]

A couple of weeks ago, I took some time to explore the use of sub-forms in XSLTForms, as a possible way to speed up an XForm I had written that was a little slower than I would have liked.

The short version of the story is: WOW! Well worth learning to use.

To understand the longer version, dear Reader, you should know that one of the most common performance issues in serious uses of XForms is that forms sometimes slow down when the instance documents they are working on get big. I assume this is because browsers are profligate with resources, perhaps because some aspects of the XML DOM force them to be, perhaps because profligacy pays off most of the time. But I can’t say I really know for sure.

So one of the things that sophisticated users of XForms spend a lot of time on is finding ways to avoid loading all the instance documents at once. (This is a lot easier when you’re using an XML database as a back end, of course.) Another is finding ways to avoid loading all of the form at once; that is where sub-forms come in. The word doesn’t occur in the XForms 1.1 spec, but a number of implementations provide experimental support for sub-forms as an extension. The basic idea is that whenever certain events occur in a form, the XForms implementation will load some appropriate resource specifying some XForms widgets and bind them into the current form. When other events occur, those widgets will be unloaded again. I first saw this in a demo on the BetterFORM site a few years ago, but I see that Mark Birbeck was talking about this as long ago as 2006. And more recently, Alain Couthures has added sub-form support to XSLTForms.

Making my form use a sub-form turned out to be simpler than I had feared. I already had a full working version of the form, and it was clear which part of it I wanted to load and unload dynamically. What I had to do was just:

  1. Move the part of the form that should load dynamically (which I’ll now call the sub-form) into a separate XHTML + XForms document. Give it a simple XForms model, and check to make sure that it works by itself. (It doesn’t actually have to work by itself, but it’s helpful to know the sub-form has no fatal errors of its own.)
  2. In the main form, put an XForms xf:group where the sub-form used to be; give that group an ID.
  3. Associate a load action with the appropriate event. (In my form, I had a trigger that toggled a switch, exposing the read/write view of some material. The sub-form now has that read/write view, and the trigger now fires a load action.)
  4. Associate an unload action with another appropriate event. (In my form, this was the trigger that formerly toggled the switch back to the read-only view.)

In principle, that’s it, though I had to fiddle a bit to make everything work right. In particular I ended up adding a ref="." attribute to the outermost xf:group in the sub-form. I’m not yet sure just when this is necessary and when it’s not.
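
For concreteness, here is a minimal sketch of what the four steps might look like in markup. It is not my actual form. The file name edit-subform.xhtml, the ID subform-target, the labels, and the instance data are all invented for illustration, and the show="embed" and targetid attributes and the xf:unload element are the extension syntax as I understand XSLTForms to accept it; sub-forms are not in XForms 1.1, so check the details against the example on the XSLTForms site. The main form gets a placeholder group and two triggers:

```xml
<!-- Main form (fragment). Assumed namespace bindings:
     xf = http://www.w3.org/2002/xforms
     ev = http://www.w3.org/2001/xml-events -->

<!-- Step 2: an empty group with an ID, where the sub-form used to be -->
<xf:group id="subform-target"/>

<!-- Step 3: a trigger whose activation loads the sub-form into that group
     (show="embed" and targetid are assumed extension attributes) -->
<xf:trigger>
  <xf:label>Edit</xf:label>
  <xf:load ev:event="DOMActivate" show="embed"
           targetid="subform-target" resource="edit-subform.xhtml"/>
</xf:trigger>

<!-- Step 4: a trigger whose activation unloads the sub-form again
     (xf:unload is an assumed extension element) -->
<xf:trigger>
  <xf:label>Done</xf:label>
  <xf:unload ev:event="DOMActivate" targetid="subform-target"/>
</xf:trigger>
```

And the sub-form lives in its own document (step 1), with a small model of its own and the ref="." on the outermost group:

```xml
<!-- edit-subform.xhtml (hypothetical file name) -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>Edit sub-form</title>
    <!-- A simple model, so the sub-form can be tested by itself -->
    <xf:model>
      <xf:instance>
        <data xmlns=""><note/></data>
      </xf:instance>
    </xf:model>
  </head>
  <body>
    <!-- The ref="." on the outermost group is what I ended up needing -->
    <xf:group ref=".">
      <xf:input ref="note">
        <xf:label>Note</xf:label>
      </xf:input>
    </xf:group>
  </body>
</html>
```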

The example of sub-form loading on the XSLTForms web site is very helpful here: it is small and illustrates all the moving parts clearly. (But you will need to read the source and think about what is going on; there isn’t a lot of commentary or documentation around.)

What really impressed me were the effects of this change on the performance of the form.

Since sub-form support was added fairly recently to XSLTForms, I had to upgrade from an older release of XSLTForms (Beta3) to the recent release 1.0RC. I did some fairly tedious timings before and after the upgrade, and I can say with some evidence that the upgrade alone cut the time my form took by about 25%. Then I made the changes mentioned above, to use sub-forms. That cut a further 25% or so of the original time, so that on almost all actions version 1.0RC using sub-forms was about twice as fast as the older version Beta3 using a monolithic form.

Moral 1: If you are having performance issues with an XForm, and you can see how you might use a sub-form, then try it.

Moral 2: If you are having performance issues with an XForm, and you are using XSLTForms, then try moving to 1.0RC even if you can’t see how to use a sub-form in your context. Alain Couthures has done a lot of work on performance, and it clearly helps.

Bear in mind that the precise syntax and semantics of sub-forms are a topic of discussion in the XForms working group, so (a) they are subject to change, and (b) the working group is open to suggestions for making sub-forms (or any other part of XForms) work better.