Modularity in Invisible XML
Saxonica, Limited
Introduction
Why modularity?
Reuse.
Reuse is a well-established software engineering practice.
Invisible XML grammars are software engineering.
Therefore, Invisible XML needs a reuse mechanism.
There are lots (and lots) of grammars that only need to be defined once and then reused: ISBN numbers, IPv6 addresses, URIs, credit card numbers, etc., etc., etc.
Background
Steven Pemberton introduced a modularity proposal at MarkupUK 2025. Some of the syntax choices in this proposal are inspired by Steven’s choices.
Sheila Thomson reminded me that (a) I opened the original issue about reuse and that (b) Michael Sperberg-McQueen commented on it.
Michael’s comments point to RELAX NG as a potential model and I think he was absolutely correct.
Requirements
But first, a principle
The author writing the grammar that’s doing the including is in control.
Requirements
- A grammar author must be able to include rules from one or more other grammars.
- A grammar author must be able to define a public interface which specifies the nonterminals they expect to share.
- It should be convenient to include all of the rules that are in the public interface and no others.
- It should be possible to include any rule from a grammar. Ultimately, the including grammar is in control.
- Including a nonterminal from another grammar does not change its definition.
- It must, however, also be possible to redefine any nonterminal in the included grammar.
- It must be possible to disambiguate the names of included nonterminals.
(It must be possible to include the “digit” nonterminal from bodyparts.ixml and the “digit” nonterminal from numbers.ixml into the same including grammar.)
A (conceptual) example
sentence.ixml
sentence = "The", animal, verb, pphrase, punct .
animal = -" ", ("cat" | "bat" | "rat") .
verb = -" ", ("sat" | "spat") .
pphrase = prep, " the mat" .
prep = -" ", ("in" | "on" | "at") .
punct = "." | "!" .
“The cat sat on the mat.”
“The bat spat on the mat!”
etc.
book.ixml
book = chapter++(-#a, -#a), -#a* .
chapter = number, -". ", para++-#a .
para = sentence++" " .
@number = [N]+ .
This grammar parses books like this one:
1. The cat sat on the mat.
The rat spat at the mat.

2. The bat sat in the mat.
Conceptually…
Modularity operates on the grammar models, not on their surface syntax.
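To make "operating on the grammar model" concrete, here is a minimal Python sketch. It is not how NineML or any other processor implements the proposal. The dictionary representation, the `lit`, `reachable` and `include` helpers, and the stand-in for `[N]+` are all assumptions made for illustration, and marks and repetition operators are left out.

```python
# Hypothetical grammar model: each nonterminal maps to a list of
# alternatives, and each alternative is a list of symbols. Plain strings
# are nonterminal references; ("lit", text) tuples are literals.

def lit(text):
    return ("lit", text)

SENTENCE = {
    "sentence": [[lit("The"), "animal", "verb", "pphrase", "punct"]],
    "animal": [[lit(" cat")], [lit(" bat")], [lit(" rat")]],
    "verb": [[lit(" sat")], [lit(" spat")]],
    "pphrase": [["prep", lit(" the mat")]],
    "prep": [[lit(" in")], [lit(" on")], [lit(" at")]],
    "punct": [[lit(".")], [lit("!")]],
}

BOOK = {
    "book": [["chapter"]],                       # repetition omitted
    "chapter": [["number", lit(". "), "para"]],
    "para": [["sentence"]],                      # "sentence" is not defined here
    "number": [[lit("1")]],                      # stand-in for [N]+
}

def reachable(grammar, names):
    """Return the given names plus every nonterminal they lead to."""
    seen = set()
    todo = list(names)
    while todo:
        name = todo.pop()
        if name in seen or name not in grammar:
            continue
        seen.add(name)
        for alternative in grammar[name]:
            todo.extend(s for s in alternative if isinstance(s, str))
    return seen

def include(target, source, names):
    """Copy the named rules, and the rules they depend on, into target."""
    merged = dict(target)
    for name in reachable(source, names):
        if name in merged:
            raise ValueError(f"name clash on {name!r}")
        merged[name] = source[name]
    return merged

book_with_sentences = include(BOOK, SENTENCE, ["sentence"])
print(sorted(book_with_sentences))
```

Because the merge works on rule tables rather than text, it does not matter how either grammar was written down, only which rules it contains.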
Some syntax
Syntax examples
Declare your public interface with +share:
+share foo, bar, baz
If you don’t declare one, it defaults to the whole grammar.
Include all the public nonterminals with +include:
+include "sentence.ixml"
The included grammar must be a valid iXML grammar in its own right.
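The defaulting rule for the public interface can be sketched in a few lines of Python. The function name `public_interface` is hypothetical, and treating a `+share` of an undefined nonterminal as an error is an assumption, not something stated on the slides.

```python
def public_interface(rule_names, shared=None):
    """Return the nonterminals a plain +include "file" would bring in.

    rule_names: every nonterminal defined in the included grammar.
    shared: the names listed in +share, or None if there is no +share.
    """
    names = list(rule_names)
    if shared is None:
        # No +share declaration: the interface is the whole grammar.
        return names
    unknown = [name for name in shared if name not in names]
    if unknown:
        # Assumption: sharing an undefined nonterminal is an error.
        raise ValueError(f"+share names undefined nonterminals: {unknown}")
    return list(shared)

SENTENCE_RULES = ["sentence", "animal", "verb", "pphrase", "prep", "punct"]
print(public_interface(SENTENCE_RULES))
print(public_interface(SENTENCE_RULES, shared=["sentence"]))
```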
Selectively include nonterminals:
+include animal, verb from "sentence.ixml"
Rename included nonterminals:
+include animal from "sentence.ixml"
+include animal as muppet from "muppets.ixml"
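A rough Python sketch of what renaming has to do at the model level: rename the rule and every reference to it inside the included rules. The `rename` helper and the contents of `MUPPETS` are invented for illustration; the slides do not show muppets.ixml.

```python
def lit(text):
    return ("lit", text)

def rename(rules, old, new):
    """Rename nonterminal `old` to `new`, including references to it."""
    def fix(symbol):
        # Literals are tuples, so only nonterminal names can match.
        return new if symbol == old else symbol
    return {
        fix(name): [[fix(symbol) for symbol in alt] for alt in alternatives]
        for name, alternatives in rules.items()
    }

# Rule taken from sentence.ixml.
SENTENCE_ANIMAL = {"animal": [[lit(" cat")], [lit(" bat")], [lit(" rat")]]}

# Hypothetical content for muppets.ixml.
MUPPETS = {
    "animal": [[lit("frog")], [lit("pig")], ["monster"]],
    "monster": [[lit("blue "), "animal"]],
}

# Both "animal" rules can now live in the same including grammar.
combined = {**SENTENCE_ANIMAL, **rename(MUPPETS, "animal", "muppet")}
print(sorted(combined))
```

Note that the reference to `animal` inside `monster` follows the rename, which is why this cannot be done by relabeling the rule's left-hand side alone.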
Redefine included nonterminals:
+include sentence from "sentence.ixml" (
  animal |= -" ", "gnat" .
  pphrase = prep, object .
  object = -" ", "the", (animal | thing) .
)
-S = sentence.
thing = -" ", ("mat" | "hat" | "vat") .
“The gnat sat in the vat!”
(There are some additional constraints about reachable and unreachable nonterminals.)
sentence = "The", animal, verb, pphrase, punct .
animal = -" ", ("cat" | "bat" | "rat") .
verb = -" ", ("sat" | "spat") .
pphrase = prep, " the mat" .
prep = -" ", ("in" | "on" | "at") .
punct = "." | "!" .
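The effect of the redefinitions can be checked with a small Python sketch: `|=` adds alternatives, `=` replaces a rule or adds a new one, and the original grammar is left untouched. The `redefine`, `ends` and `accepts` helpers are hypothetical, marks are ignored, the nested choice in `object` is flattened into two alternatives, and the naive recognizer only works because this grammar has no left recursion.

```python
def lit(text):
    return ("lit", text)

SENTENCE = {
    "sentence": [[lit("The"), "animal", "verb", "pphrase", "punct"]],
    "animal": [[lit(" cat")], [lit(" bat")], [lit(" rat")]],
    "verb": [[lit(" sat")], [lit(" spat")]],
    "pphrase": [["prep", lit(" the mat")]],
    "prep": [[lit(" in")], [lit(" on")], [lit(" at")]],
    "punct": [[lit(".")], [lit("!")]],
}

def redefine(source, extend=None, replace=None):
    """Return a copy of source with |= and = redefinitions applied."""
    result = {name: list(alts) for name, alts in source.items()}
    for name, alts in (extend or {}).items():
        result[name] = result[name] + alts      # |= adds alternatives
    for name, alts in (replace or {}).items():
        result[name] = alts                     # = replaces (or adds) a rule
    return result

def ends(grammar, symbol, text, pos):
    """Return every position where symbol can end when it starts at pos."""
    if isinstance(symbol, tuple):
        literal = symbol[1]
        return {pos + len(literal)} if text.startswith(literal, pos) else set()
    found = set()
    for alternative in grammar[symbol]:
        positions = {pos}
        for part in alternative:
            positions = {e for p in positions for e in ends(grammar, part, text, p)}
        found |= positions
    return found

def accepts(grammar, start, text):
    return len(text) in ends(grammar, start, text, 0)

included = redefine(
    SENTENCE,
    extend={"animal": [[lit(" gnat")]]},
    replace={
        "pphrase": [["prep", "object"]],
        "object": [[lit(" the"), "animal"], [lit(" the"), "thing"]],
    },
)
GRAMMAR = {
    **included,
    "S": [["sentence"]],
    "thing": [[lit(" mat")], [lit(" hat")], [lit(" vat")]],
}

print(accepts(GRAMMAR, "S", "The gnat sat in the vat!"))
print(accepts(SENTENCE, "sentence", "The gnat sat in the vat!"))
```

The first call reports a match and the second does not, since the unmodified sentence.ixml knows nothing about gnats or vats.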
Open questions
Open question: template grammars
Consider this grammar:
+share table
table = row+ .
row = (-'|', cell)+, -'|', nl .
cell = cell-content .
-nl = -#a .
If cell-content is ["x"|"o"], then this grammar parses tables like this one:
|x|o|x|
|o|o|x|
|x|o|x|
But it doesn’t work: cell-content is never defined, so this is not a valid iXML grammar in its own right.
Allow grammars to declare which nonterminals must be redefined?
+require cell-content
+share table
table = row+ .
row = (-'|', cell)+, -'|', nl .
cell = cell-content .
-nl = -#a .
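One way to picture what `+require` might mean, as a hedged Python sketch: the template on its own has an undefined nonterminal, and an inclusion is only accepted if it supplies every required name. Since this is an open question, the behavior shown is only one possible reading. The `undefined` and `instantiate` helpers are hypothetical, and repetition and marks are left out of the model.

```python
def lit(text):
    return ("lit", text)

# Structural stand-in for the table grammar (repetition and marks omitted).
TABLE = {
    "table": [["row"]],
    "row": [[lit("|"), "cell", lit("|"), "nl"]],
    "cell": [["cell-content"]],
    "nl": [[lit("\n")]],
}

def undefined(grammar):
    """Return nonterminals that are referenced but never defined."""
    used = {
        symbol
        for alternatives in grammar.values()
        for alternative in alternatives
        for symbol in alternative
        if isinstance(symbol, str)
    }
    return used - set(grammar)

def instantiate(template, required, definitions):
    """Fill in a template grammar, insisting on every required name."""
    missing = sorted(set(required) - set(definitions))
    if missing:
        raise ValueError(f"including grammar must define: {missing}")
    return {**template, **definitions}

print(undefined(TABLE))   # why the template is not valid on its own

noughts_and_crosses = instantiate(
    TABLE,
    required=["cell-content"],
    definitions={"cell-content": [[lit("x")], [lit("o")]]},
)
print(undefined(noughts_and_crosses))
```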
Open question: multiple includes
Can you include the same grammar more than once? What if you redefine parts of it?
+include table as word-table from "table.ixml" (
  cell-content = [L]+ .
)
+include table as number-table from "table.ixml" (
  cell-content = ["0"-"9"]+ .
)
page = (para | word-table | number-table) ++ nl, nl? .
para = [L | " "]+, nl .
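This is an open question, so the following Python sketch shows only one conceivable answer, not the proposal's: give each inclusion its own private copy of the included rules so that two differently redefined copies can coexist. The `include_as` helper and the `alias.name` naming scheme are invented for illustration, and character sets are modeled as opaque `("set", ...)` tuples.

```python
def lit(text):
    return ("lit", text)

# Structural stand-in for table.ixml (repetition and marks omitted).
TABLE = {
    "table": [["row"]],
    "row": [[lit("|"), "cell", lit("|"), "nl"]],
    "cell": [["cell-content"]],
    "nl": [[lit("\n")]],
}

def include_as(source, start, alias, overrides):
    """Return a private, renamed copy of source with overrides applied."""
    instance = {**source, **overrides}

    def rename(name):
        if name == start:
            return alias
        # Hypothetical scheme: prefix every other included rule name.
        return f"{alias}.{name}" if name in instance else name

    return {
        rename(name): [
            [rename(s) if isinstance(s, str) else s for s in alternative]
            for alternative in alternatives
        ]
        for name, alternatives in instance.items()
    }

page = {
    **include_as(TABLE, "table", "word-table",
                 {"cell-content": [[("set", "[L]")]]}),
    **include_as(TABLE, "table", "number-table",
                 {"cell-content": [[("set", '["0"-"9"]')]]}),
    "page": [["para"], ["word-table"], ["number-table"]],
}
print(sorted(page))
```

Under this scheme the two `cell-content` definitions never collide, at the cost of duplicating every rule of the included grammar once per inclusion.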
Concluding remarks
The full proposal is available, with test cases:
This presentation is also available:
The proposal is implemented in NineML version 3.3.7. (Which was released on Tuesday 🙂)
Steven’s proposal is also implemented.
(And so is my proposal for the repetition issue, but we’re not talking about that today.)
Thank you very much. Questions or discussion?
How does my proposal differ from Steven’s?
In brief:
Any nonterminal can be included, even ones not explicitly shared.
Nonterminals can be renamed when they’re included.
Nonterminals can be extended and redefined when they’re included.
Template grammars.
Some syntactic changes.






