java - Match a single scenario with ANTLR and skip everything else as noise
I defined a simple grammar using the ANTLR v4 Eclipse plugin. I want to parse a file that contains ColdFusion cfscript code and find every instance of a property definition. For example:
property name="producttypeid" ormtype="string" length="32" fieldtype="id" generator="uuid" unsavedvalue="" default="";
That is, the property keyword followed by any number of attributes, with the line terminated by a semicolon.
.g4 file
grammar cfproperty;

property : 'property ' (ATR'='STRING)+EOL; // match keyword property followed by an attribute definition

ATR : [a-zA-Z]+; // match lower and upper-case identifiers name
STRING: '"' .*? '"'; // match any string
WS : [ \t\r\n]+ -> skip; // skip spaces, tabs, newlines
EOL : ';'; // end of a property line
I put together a simple Java project that uses the generated parser, tree walker, etc. to print out the occurrences of those matches.
The input I'm testing with is:
"property id=\"actionid\" name=\"actionname\" attr=\"actionattr\" hbmethod=\"hbmethod\"; public function funtion {//some text} property name=\"actionid\" name=\"actionname\" attr=\"actionattr\" hbmethod=\"hbmethod\"; \n more noise "
My issue is that I'm only matching:
property id="actionid" name="actionname" attr="actionattr" hbmethod="hbmethod";
and because it doesn't understand that everything else is noise, it doesn't match the second instance of the property definition.
How can I match multiple instances of the property definition, with everything in between treated as noise and skipped?
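You can use lexer modes to do what you want: one mode for the property and its contents, and one mode for noise. The idea behind modes is to move from one mode (a state) to another depending on the tokens found during lexing.
To do this, you have to split your grammar into two files, with the parser in one file and the lexer in the other.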
Here is the lexer part (named TestLexer.g4 in my case):
lexer grammar TestLexer;

// Normal mode
PROPERTY : 'property';
EQUALS : '=';
ATR : [a-zA-Z]+; // match lower and upper-case identifiers name
STRING: '"' .*? '"'; // match any string
WS : [ \t\r\n]+ -> skip; // skip spaces, tabs, newlines
EOL : ';' -> pushMode(NOISE); // when ';' is found, go to noise mode, where everything is skipped

mode NOISE;
NOISE_PROPERTY : 'property' -> type(PROPERTY), popMode; // when 'property' is found, emit it as a PROPERTY token and go back to normal mode
ALL : .+? -> skip; // skip all other stuff
Here is the parser part (named Test.g4 in my case):
grammar Test;

options { tokenVocab=TestLexer; }

root : property+;
property : PROPERTY (ATR EQUALS STRING)+ EOL; // match keyword property followed by an attribute definition
This should work :)
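To make the mode-switching idea more concrete, here is a rough plain-Java sketch of the same two-state logic, written without ANTLR. It is only an illustration of how a scanner can flip between a "noise" state and a "property" state, not what the generated lexer actually does. The class name PropertyScanner and the method findProperties are made up for this example, and, unlike the grammar above, the sketch starts in the noise state so that leading noise is skipped as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR: a hand-written two-state scanner.
class PropertyScanner {
    // The two states, loosely mirroring the NOISE mode and the default mode.
    private enum Mode { NOISE, PROPERTY }

    private static final String KEYWORD = "property";

    /** Returns the text of every "property ... ;" statement found in the input. */
    static List<String> findProperties(String input) {
        List<String> found = new ArrayList<>();
        Mode mode = Mode.NOISE;      // assumption: start by skipping, unlike the grammar
        StringBuilder current = new StringBuilder();
        boolean inString = false;    // true while inside a double-quoted value
        int i = 0;
        while (i < input.length()) {
            if (mode == Mode.NOISE) {
                if (input.startsWith(KEYWORD, i)) {
                    // Keyword seen: leave the noise state and start collecting.
                    mode = Mode.PROPERTY;
                    current.setLength(0);
                    current.append(KEYWORD);
                    i += KEYWORD.length();
                } else {
                    i++;             // anything else is noise, skip one character
                }
            } else {
                char c = input.charAt(i++);
                current.append(c);
                if (c == '"') {
                    inString = !inString;
                } else if (c == ';' && !inString) {
                    // Terminator seen: emit the statement, go back to noise.
                    found.add(current.toString());
                    mode = Mode.NOISE;
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "property id=\"actionid\" name=\"actionname\"; "
                + "public function funtion {//some text} "
                + "property name=\"actionid\" attr=\"actionattr\"; \n more noise ";
        for (String p : findProperties(input)) {
            System.out.println(p);
        }
    }
}
```

Like the NOISE_PROPERTY rule, this sketch treats any occurrence of the text property inside the noise as the start of a definition, so an identifier such as myproperty would trigger a false start. A real grammar would probably need a stricter rule for that case.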