Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Thursday 19 February 2004 Arabica: The XPath grammar is, modulo whitespace, complete.

I can't claim to have an exhaustive test suite, but it eats all the arbitrarily complex XPaths I've thrown at it today.

What I find amazing is the speed with which I've been able to do this. I've spent less than two working days on it - two hours transcribing EBNF from the XPath rec, half a day footling around with abstract syntax trees and what not, and the rest of the time eliminating left-recursion. Now I have a grammar that I'm extremely confident is correct.

That development speed and confidence is entirely due to the mind-boggling power of the Spirit library. By allowing you to transcribe EBNF more or less directly into code, I can take the grammar in the recommendation and codify it. Literally codify it. The rec text is there in my code, so the code must be right. Here's snippet

// [1]
LocationPath = RelativeLocationPath | AbsoluteLocationPath; 
// [2]
AbsoluteLocationPath = AbbreviatedAbsoluteLocationPath
                     | ('/' >> !RelativeLocationPath);
// [3]
RelativeLocationPath = Step >> *((boost::spirit::str_p("//") | boost::spirit::ch_p('/')) >> Step);

// [4], [5]
Step = AxisSpecifier >> NodeTest >> *Predicate | AbbreviatedStep;
AxisSpecifier = AxisName >> "::" | AbbreviatedAxisSpecifier;
The numbers in square brackets refer to the rules in the recommendation -
[1] LocationPath ::= RelativeLocationPath | AbsoluteLocationPath	
[2] AbsoluteLocationPath ::= '/' RelativeLocationPath?
		| AbbreviatedAbsoluteLocationPath	
[3] RelativeLocationPath ::=   	Step	
		| RelativeLocationPath '/' Step	
		| AbbreviatedRelativeLocationPath	
[4] Step ::= AxisSpecifier NodeTest Predicate*	| AbbreviatedStep	
[5] AxisSpecifier ::= AxisName '::' | AbbreviatedAxisSpecifier
Even without knowing the Spirit syntax, it's easy to see the two match very closely. You can see I've had to reorder some rules slightly. RelativeLocationPath is an example of left-recursion, which I've had to refactor. But differences are minor and pale into nothing compared with a hand-coded parser. (I was going to link to Xalan's XPath grammar and I can't actually find it, which kind of demonstrates the point.)
Tagged code, arabica, xml, and c++


Jez Higgins

Freelance software generalist
software created
extended or repaired

Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About