XPattern can handle nodes that are optional, repeating, or both.
A question mark indicates the node is optional.
A $title : I ? $author : B $year : P
would find the first two hits (but not the last one) in something like this:
<a href="...">First title</a>
by <i> First author </i>
<a href="...">Second title</a>
<a href="...">Third title</a>
by <i> Author </i>
and <i> Another Author </i>
A plus indicates a repeating node. There has to be at least one of them.
A $title : I + $author : B $year : P
would find the first and third hits in the HTML example above, but not the second, as it has no author. The third hit would have two separate authors.
Optional repeating: *
An asterisk indicates that a node is both optional and repeating. That is, there can be zero or more of them.
A $title : I * $author : B $year : P
This would find all three hits in the HTML example above.
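The three quantifiers behave much like their regular-expression counterparts, so a rough analogy can be sketched with Python's re module. This is not XPattern itself. The one-letter node encoding, the NODES string, and the hits helper are hypothetical, and the sketch assumes every hit ends with a B node holding the year, as the patterns imply. The trailing P node is left out for brevity.

```python
import re

# Hypothetical encoding of the sample HTML, one letter per node:
# A = title link, I = author, B = year (assumed to close each hit).
NODES = "AIB" "AB" "AIIB"  # first, second, and third hit


def hits(pattern: str) -> list[str]:
    """Return the node runs that a regex matches in the encoded document."""
    return [m.group(0) for m in re.finditer(pattern, NODES)]


optional = hits(r"AI?B")            # like "I ?": zero or one author
repeating = hits(r"AI+B")           # like "I +": one or more authors
optional_repeating = hits(r"AI*B")  # like "I *": zero or more authors

print(optional)
print(repeating)
print(optional_repeating)
```

In this analogy the optional form finds the first two hits, the repeating form finds the first and third, and the optional repeating form finds all three.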
Greediness: +? and *?
By default, all repeated patterns are greedy, meaning that they match as much as possible. Sometimes it is desirable to match as little as possible instead. This can also be much more efficient, especially with ANY, which may otherwise try to match the rest of the document before backtracking to only a few nodes.
As an example
A $title : ANY * : B $year
would match one hit from the HTML example above, namely the first title and the last year. This is probably not what you want. A non-greedy match solves this problem:
A $title : ANY *? : B $year
Now the ANY matches a minimal set of nodes, that is, the author(s), and the B matches the first year that follows. This way we get three hits from the same HTML example, each with a title and year that belong together.
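The difference between greedy and non-greedy matching can be illustrated with Python's re module, whose `*` and `*?` quantifiers follow the same idea. This is only an analogy and does not show how XPattern is implemented. The NODES string is a hypothetical one-letter encoding of the sample HTML (A = title, I = author, B = year), and it assumes each hit ends with a year node.

```python
import re

# Hypothetical encoding of the sample HTML, one letter per node:
# A = title link, I = author, B = year (assumed to close each hit).
NODES = "AIBABAIIB"

# Greedy: "." plays the role of ANY and swallows everything up to the last B.
greedy = [m.group(0) for m in re.finditer(r"A.*B", NODES)]

# Non-greedy: stops at the first B after each title.
lazy = [m.group(0) for m in re.finditer(r"A.*?B", NODES)]

print(greedy)
print(lazy)
```

The greedy version yields a single match that spans the whole document, pairing the first title with the last year. The non-greedy version yields one match per hit.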