Modularity in Invisible XML

Norm Tovey-Walsh

Saxonica, Limited

The First International Symposium on Invisible XML

Introduction

Why modularity?

Reuse.

  • Reuse is a well-established software engineering practice.

  • Invisible XML grammars are software engineering.

  • Therefore, Invisible XML needs a reuse mechanism.

There are lots (and lots) of grammars that only need to be defined once and can then be reused: ISBNs, IPv6 addresses, URIs, credit card numbers, etc., etc., etc.

Background

  • Steven Pemberton introduced a modularity proposal at Markup UK 2025. Some of the syntax choices in this proposal are inspired by Steven’s choices.

  • Sheila Thomson reminded me that (a) I opened the original issue about reuse and that (b) Michael Sperberg-McQueen commented on it.

  • Michael’s comments point to RELAX NG as a potential model and I think he was absolutely correct.

Requirements

But first, a principle

The author of the including grammar is in control.

Requirements

  1. A grammar author must be able to include rules from one or more other grammars.

Requirements (cont.)

  2. A grammar author must be able to define a public interface which specifies the nonterminals they expect to share.

Requirements (cont.)

  3. It should be convenient to include all of the rules that are in the public interface and no others.

Requirements (cont.)

  4. It should be possible to include any rule from a grammar. Ultimately, the including grammar is in control.

Requirements (cont.)

  5. Including a nonterminal from another grammar does not change its definition.

Requirements (cont.)

  6. It must, however, also be possible to redefine any nonterminal in the included grammar.

Requirements (cont.)

  7. It must be possible to disambiguate the names of included nonterminals.

(It must be possible to include the “digit” nonterminal from bodyparts.ixml and the “digit” nonterminal from numbers.ixml into the same including grammar.)

A (conceptual) example

sentence.ixml

sentence = "The", animal, verb, pphrase, punct .
  animal = -" ", ("cat" | "bat" | "rat") .
    verb = -" ", ("sat" | "spat") .
 pphrase = prep, " the mat" .
    prep = -" ", ("in" | "on" | "at") .
   punct = "." | "!" .

“The cat sat on the mat.”

“The bat spat on the mat!”

etc.

book.ixml

   book = chapter++(-#a, -#a), -#a* .
chapter = number, -". ", para++-#a .
   para = sentence++" " .
@number = [N]+ .

This grammar parses books like this one:

1. The cat sat on the mat.
The rat spat at the mat.
 
2. The bat sat in the mat.

Conceptually…

Modularity operates on the grammar models, not on their surface syntax.
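
One way to picture this, as a hand-merged sketch rather than the output of any processor: after the include, the result behaves as though the rules of book.ixml and sentence.ixml lived in a single grammar. The listing below is plain iXML and assumes that the two grammars have no nonterminal names in common.

```ixml
{ Hand-merged sketch: book.ixml followed by sentence.ixml. }
    book = chapter++(-#a, -#a), -#a* .
 chapter = number, -". ", para++-#a .
    para = sentence++" " .
 @number = [N]+ .

sentence = "The", animal, verb, pphrase, punct .
  animal = -" ", ("cat" | "bat" | "rat") .
    verb = -" ", ("sat" | "spat") .
 pphrase = prep, " the mat" .
    prep = -" ", ("in" | "on" | "at") .
   punct = "." | "!" .
```

The point is that nothing is pasted textually. The merge happens on the rules themselves, so the layout and comments of the included file do not matter.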

Some syntax

Syntax examples

Declare your public interface with +share:

+share foo, bar, baz

If you don’t declare one, it defaults to the whole grammar.

Include all the public nonterminals with +include:

+include "sentence.ixml"

The included grammar must be a valid iXML grammar in its own right.

Syntax examples (cont.)

Selectively include nonterminals:

+include animal, verb from "sentence.ixml"
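
Applied to the earlier conceptual example, book.ixml might begin like the sketch below. It assumes that including sentence also brings along the nonterminals it depends on (animal, verb, pphrase, prep, punct). The slides only hint at the rules for reachable nonterminals, so treat that part as a guess.

```ixml
{ Sketch of book.ixml using the proposed syntax. }
+include sentence from "sentence.ixml"

   book = chapter++(-#a, -#a), -#a* .
chapter = number, -". ", para++-#a .
   para = sentence++" " .
@number = [N]+ .
```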

Syntax examples (cont.)

Rename included nonterminals:

+include animal from "sentence.ixml"
+include animal as muppet from "muppets.ixml"
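
The same mechanism would cover the “digit” case from the requirements. In this sketch the file names come from that requirement, while the new name finger is purely hypothetical:

```ixml
{ "finger" is a made-up name chosen to avoid the clash. }
+include digit as finger from "bodyparts.ixml"
+include digit from "numbers.ixml"
```

After these two lines, the including grammar could refer to finger and digit as distinct nonterminals.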

Syntax examples (cont.)

Redefine included nonterminals:

+include sentence from "sentence.ixml" (
  animal |= -" ", "gnat" .
  pphrase = prep, object .
  object = -" ", "the", (animal | thing) .
)
-S = sentence.
thing = -" ", ("mat" | "hat" | "vat") .

“The gnat sat in the vat!”

(There are some additional constraints about reachable and unreachable nonterminals.)

For reference, the original sentence.ixml:

sentence = "The", animal, verb, pphrase, punct .
  animal = -" ", ("cat" | "bat" | "rat") .
    verb = -" ", ("sat" | "spat") .
 pphrase = prep, " the mat" .
    prep = -" ", ("in" | "on" | "at") .
   punct = "." | "!" .
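
Combining the redefinitions with the original rules, the effective grammar might look like the plain iXML below. This is a sketch under two assumptions: |= adds an alternative to the existing rule (much like combining definitions by choice in RELAX NG), and = inside the parentheses replaces or adds a rule outright.

```ixml
{ Assumed result of the include; not the output of any actual processor. }
      -S = sentence .
sentence = "The", animal, verb, pphrase, punct .
  animal = -" ", ("cat" | "bat" | "rat")
         | -" ", "gnat" .                      { alternative added by |= }
    verb = -" ", ("sat" | "spat") .
 pphrase = prep, object .                      { replaced }
  object = -" ", "the", (animal | thing) .     { new }
    prep = -" ", ("in" | "on" | "at") .
   punct = "." | "!" .
   thing = -" ", ("mat" | "hat" | "vat") .
```

Under this reading “The gnat sat in the vat!” parses, and so does “The cat sat on the rat.”, because object also accepts an animal.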

Open questions

Open question: template grammars

Consider this grammar:

+share table
 
table = row+ .
 
row = (-'|', cell)+, -'|', nl .
cell = cell-content .
-nl = -#a .

If cell-content is ["x"|"o"], then this grammar parses tables like this one:

|x|o|x|
|o|o|x|
|x|o|x|

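But it doesn’t work: cell-content is never defined, so this is not a valid iXML grammar in its own right and can’t be included.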

Open question: template grammars (cont.)

Allow grammars to declare nonterminals that the including grammar must define?

+require cell-content
+share table
 
table = row+ .
 
row = (-'|', cell)+, -'|', nl .
cell = cell-content .
-nl = -#a .
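
An including grammar could then supply the missing rule using the redefinition syntax shown earlier. This sketch assumes the template is saved as table.ixml and that defining a required nonterminal uses the same parenthesized form as a redefinition.

```ixml
{ Hypothetical including grammar for the x/o tables. }
+include table from "table.ixml" (
  cell-content = ["x" | "o"] .
)
-S = table .
```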

Open question: multiple includes

Can you include the same grammar more than once? What if you redefine parts of it differently each time?

+include table as word-table from "table.ixml" (
  cell-content = [L]+ .
)
+include table as number-table from "table.ixml" (
  cell-content = ["0"-"9"]+ .
)
 
page = (para | word-table | number-table) ++ nl, nl? .
para = [L | " "]+, nl .
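
To see why this is an open question, here is one possible expansion in plain iXML. It is only a sketch: each include gets its own copy of the template’s private rules, with hypothetical word- and number- prefixes, and nl is assumed to be shared between the copies and the including grammar.

```ixml
{ One hypothetical reading; the prefixed names are invented for illustration. }
page = (para | word-table | number-table) ++ nl, nl? .
para = [L | " "]+, nl .

word-table = word-row+ .
word-row = (-'|', word-cell)+, -'|', nl .
word-cell = word-cell-content .
word-cell-content = [L]+ .

number-table = number-row+ .
number-row = (-'|', number-cell)+, -'|', nl .
number-cell = number-cell-content .
number-cell-content = ["0"-"9"]+ .

-nl = -#a .
```

The cost of this reading is visible in the output: rows and cells would serialize under the prefixed names rather than as row and cell, which may not be what the including author wants.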

How does my proposal differ from Steven’s?

In brief:

  • Any nonterminal can be included, even ones not explicitly shared.

  • Nonterminals can be renamed when they’re included.

  • Nonterminals can be extended and redefined when they’re included.

  • Template grammars.

  • Some syntactic changes.