Repetition

Norm Tovey-Walsh

Invisible XML currently has two styles of repetition, * meaning “zero or more” and + meaning “one or more”. These are extended to ** and ++ where a separator is introduced.

"a"*

Zero or more “a” characters.

"a"+

One or more “a” characters.

"a" ** ","

Zero or more “a” characters separated by a “,” character.

"a" ++ ","

One or more “a” characters separated by a “,” character.

This proposal adds the ability to have bounded repetition: at least “m” (m≥0) occurrences, and at most “n” (n≥m, n>0) occurrences. Stipulate that it is an error if “n” is less than “m” or if “n” is zero.

Simple bounded repetition is introduced with a pair of angle brackets: <m,n>. Bounded repetition with a separator uses doubled angle brackets: <<m,n>>.

"a"<3> (equivalently, "a"<3,3>)

Exactly three “a” characters.

"a"<3,5>

At least three “a” characters and at most five.

"a"<3,*>

Three or more “a” characters.

"a" <<3>> "," (equivalently, "a" <<3,3>> ",")

Exactly three “a” characters separated by a “,” character.

"a" <<3,5>> ","

At least three “a” characters and at most five, separated by the “,” character.

"a" <<3,*>> ","

Three or more “a” characters separated by the “,” character.

It is trivially the case that <0,*> is the same as *, <1,*> is the same as +, <<0,*>> is the same as **, and <<1,*>> is the same as ++ but there doesn’t seem to be a compelling reason to attempt to forbid these expressions.

Grammar changes

The following new grammar rules are added:

1 |      repeatn: factor, (-"<", s, min, s, (-",", s, max, s)?, -">", s;
  |                        -"<<", s, min, s, (-",", s, max, s)?, -">>", s, sep).
  |      -number: ["0"-"9"]+ .
  |         @min: number .
5 |         @max: number | "*" .

(It would be possible to constrain max to be greater than zero in the grammar, ("0"*, ["1"-"9"], ["0"-"9"]*) | "*", but it’s impractical to express the n≥m constraint, so I don’t think it’s worth the added complexity.)

The rule for factor is extended to include repeatn:

1 |        -term: factor;
  |               option;
  |               repeat0;
  |               repeat1;
5 |               repeatn.

Hints for implementors

A specific repeat:

  |f<m> ⇒ f₁, f₂, f₃ , ..., fₘ

A specific range:

  |f<m,n> ⇒ f₁, f₂, f₃ , ..., fₘ, (fₘ₊₁, (fₘ₊₂, ..., (fₙ)?, ... )? )?

That is, f<3,7> ⇒ f, f, f, (f, (f, (f, (f)?)?)?)?.

An unbounded range:

  |f<m,*> ⇒ f₁, f₂, f₃ , ..., fₘ, f*

A specific repeat:

  |f <<m>> sep ⇒ f, (sep, f)<m-1>

A specific range:

  |f <<m,n>> sep ⇒ f, (sep, f)<m-1,n-1>

An unbounded repeat:

  |f <<m,*>> sep ⇒ f, (sep, f)<m-1,*>